
Artificial intelligence agents are getting better at writing code, and at hacking it as well

The latest artificial intelligence models are not only remarkably good at software engineering; new research shows they are also getting ever better at finding bugs in software.

AI researchers at the University of California, Berkeley tested how well the latest AI models and agents could find vulnerabilities in 188 large open source code bases. Using a new benchmark called CyberGym, the AI models identified 17 new bugs, including 15 previously unknown, or “zero-day,” vulnerabilities. “Many of these vulnerabilities are critical,” said Dawn Song, a professor at the University of California, Berkeley who led the work.

Many experts expect AI models to become powerful cybersecurity weapons. An AI tool from the startup Xbow has climbed the ranks of HackerOne’s bug-hunting leaderboard and currently sits in the top spot. The company recently announced $75 million in new funding.

Song said the coding skills of the latest AI models, combined with their improved reasoning capabilities, are beginning to change the cybersecurity landscape. “It’s a pivotal moment,” she said. “It actually exceeded our general expectations.”

As the models continue to improve, they will automate the process of both discovering and exploiting security flaws. This could help companies keep their software safe, but it could also help hackers break into systems. “We didn’t even try that hard,” Song said. “If we raised the budget and allowed the agents to run for longer, they could do even better.”

The UC Berkeley team tested conventional frontier AI models from OpenAI, Google, and Anthropic, as well as open source offerings from Meta, DeepSeek, and Alibaba, combined with several agents for finding bugs, including OpenHands, Cybench, and EnIGMA.

The researchers used descriptions of known software vulnerabilities from the 188 software projects. They then fed the descriptions to cybersecurity agents powered by frontier AI models to see if they could identify the same flaws for themselves by analyzing new code bases, running tests, and crafting proof-of-concept exploits. The team also asked the agents to hunt for new vulnerabilities in the code bases on their own.

Through this process, the AI tools generated hundreds of proof-of-concept exploits, and from these the researchers identified 15 previously unseen vulnerabilities and two vulnerabilities that had previously been disclosed and patched. The work adds to growing evidence that AI can automate the discovery of zero-day vulnerabilities, which are potentially dangerous (and valuable) because they may provide a way to hack live systems.

Still, AI seems destined to become an important part of the cybersecurity industry. Security expert Sean Heelan recently discovered a zero-day flaw in the widely used Linux kernel with the help of OpenAI’s reasoning model o3. Last November, Google announced that it had discovered a previously unknown software vulnerability using AI through a program called Project Zero.

Like the rest of the software industry, many cybersecurity companies are enamored with the potential of AI. The new work does show that AI can routinely spot new flaws, but it also highlights the technology’s remaining limitations. The AI systems were unable to find most flaws and struggled with especially complex ones.
