Google DeepMind has built an artificial intelligence (AI) agent that fixed 72 security flaws in open-source software over six months of testing, including in codebases with millions of lines of code.
CodeMender finds vulnerabilities and writes patches to fix them. According to the researchers who announced the tool Monday, the safeguards it adds would have blocked a vulnerability that was exploited in a 2023 iPhone attack.
Some projects the agent patched contain up to 4.5 million lines of code. The system handled everything from simple bugs to complex security flaws.
“Software vulnerabilities are notoriously difficult and time-consuming for developers to find and fix, even with traditional, automated methods like fuzzing,” Raluca Ada Popa, senior staff research scientist at Google DeepMind, and Four Flynn, VP of Security and Privacy, wrote in a blog post.
The agent works in two ways. Reactively, it patches newly discovered vulnerabilities as they appear. Proactively, it rewrites existing code to prevent future security problems.
DeepMind tested CodeMender on libwebp, an image compression library used by millions. The agent added security features that would have blocked CVE-2023-4863, a vulnerability hackers used to attack iPhones in 2023. That attack let criminals take control of phones without users having to click anything.
Google’s Gemini Deep Think models power the system. CodeMender uses debugging tools to understand problems and create solutions.
Every proposed fix goes through multiple checks. A built-in validation system acts like a quality control inspector. This “LLM judge,” as DeepMind calls it, examines the differences between original and modified code to verify changes won’t break anything else in the software.
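To make the idea of a multi-check gate concrete, here is a minimal Python sketch of how a proposed patch might be diffed against the original and accepted only if every check passes. It is an illustration under stated assumptions, not DeepMind's implementation: the function names and the two simple rule-based checks are hypothetical stand-ins for whatever CodeMender's validators and LLM judge actually do.

```python
import difflib
from typing import Callable, List, Tuple

# Hypothetical check type: takes diff lines, returns (passed, reason).
Check = Callable[[List[str]], Tuple[bool, str]]


def unified_diff(original: str, patched: str) -> List[str]:
    """Return the unified diff between two versions of a source file."""
    return list(difflib.unified_diff(
        original.splitlines(), patched.splitlines(),
        fromfile="original", tofile="patched", lineterm=""))


def changed_lines(diff: List[str]) -> List[str]:
    """Keep only added/removed lines, skipping the ---/+++ file headers."""
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]


def non_empty(diff: List[str]) -> Tuple[bool, str]:
    """Illustrative check: a patch must actually change something."""
    n = len(changed_lines(diff))
    return n > 0, f"{n} changed line(s)"


def max_changed(limit: int) -> Check:
    """Illustrative check: reject patches that touch too many lines."""
    def check(diff: List[str]) -> Tuple[bool, str]:
        n = len(changed_lines(diff))
        return n <= limit, f"{n} changed line(s), limit {limit}"
    return check


def validate(original: str, patched: str,
             checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run every check on the diff; accept only if none fails."""
    diff = unified_diff(original, patched)
    failures = []
    for check in checks:
        passed, reason = check(diff)
        if not passed:
            failures.append(reason)
    return (not failures), failures


# Made-up example: wrapping an unchecked write in a bounds check.
ORIGINAL = "buf[i] = v;\n"
PATCHED = "if (i < len) {\n    buf[i] = v;\n}\n"

accepted, failures = validate(ORIGINAL, PATCHED, [non_empty, max_changed(10)])
print("accepted" if accepted else f"rejected: {failures}")
```

In a real system each check would be far richer (test suites, fuzzing, a model-based review of the diff), and, as the article notes, a human would still review the result before submission.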
“As we achieve more breakthroughs in AI-powered vulnerability discovery, it will become increasingly difficult for humans alone to keep up,” the researchers wrote.
Human researchers review all patches before submission. Google maintains this review process to ensure reliability during development.
“While large language models are rapidly improving, mistakes in code security could be costly,” the researchers noted.
Google’s other AI-based security efforts, such as Big Sleep and OSS-Fuzz, find vulnerabilities faster than humans can fix them. CodeMender addresses this gap by automating the repair process.
Several critical open-source libraries have already accepted CodeMender’s patches. These fixes now protect software that millions use every day.
The company says it hopes to eventually release CodeMender as a public tool for developers. Technical papers explaining the system will follow in the coming months.
DeepMind continues testing the agent with open-source maintainers. Feedback from these partnerships will shape the final public release.