OpenAI Launches EVMbench to Evaluate AI’s Ability to Secure Smart Contracts
OpenAI, in collaboration with Paradigm and Ottersec, has unveiled EVMbench — a groundbreaking benchmark designed to test AI agents on their ability to detect, patch, and exploit vulnerabilities in Ethereum smart contracts. The public release marks a pivotal step in the race to harness AI for cybersecurity defense.

OpenAI, in partnership with crypto investment firm Paradigm and blockchain security firm Ottersec, has introduced EVMbench, the first comprehensive benchmark to measure how well artificial intelligence agents can secure Ethereum Virtual Machine (EVM)-based smart contracts. Announced on February 18, 2026, the initiative represents a strategic pivot from AI-driven code generation to AI-assisted cybersecurity, aiming to counter the rising tide of smart contract exploits that have cost the crypto industry over $3 billion in 2025 alone.
EVMbench evaluates AI models across three modes: Detect, Patch, and Exploit. Using a sandboxed blockchain environment, the benchmark tests AI agents against 120 real-world vulnerabilities sourced from 40 independent smart contract audits. These include high-severity flaws such as reentrancy attacks, integer overflows, and logic errors, classes of bug that have historically led to catastrophic losses, including the infamous DAO hack and the Poly Network breach.
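To make the reentrancy class concrete, here is a minimal Python sketch of the bug pattern and its standard fix. It is a toy model, not EVM code and not part of EVMbench: the names VulnerableVault, PatchedVault, run_attack, and demo, along with the deposit amounts, are hypothetical and chosen only for illustration.

```python
class VulnerableVault:
    """Toy stand-in for a contract that pays out before updating its ledger."""

    def __init__(self):
        self.balances = {}   # per-account ledger
        self.total = 0       # funds actually held by the vault

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.total += amount

    def withdraw(self, who, on_receive):
        amount = self.balances.get(who, 0)
        if amount == 0 or self.total < amount:
            return
        self.total -= amount
        on_receive(amount)        # external call runs first (the bug)
        self.balances[who] = 0    # ledger is cleared too late


class PatchedVault(VulnerableVault):
    """Same vault, but the ledger is cleared before the external call."""

    def withdraw(self, who, on_receive):
        amount = self.balances.get(who, 0)
        if amount == 0 or self.total < amount:
            return
        self.balances[who] = 0    # effect first
        self.total -= amount
        on_receive(amount)        # interaction last


def run_attack(vault, attacker="mallory", stake=10, max_reentries=5):
    """Deposit `stake`, then re-enter withdraw() from the payout callback."""
    received = []

    def on_receive(amount):
        received.append(amount)
        if len(received) < max_reentries:
            vault.withdraw(attacker, on_receive)

    vault.deposit(attacker, stake)
    vault.withdraw(attacker, on_receive)
    return sum(received)


def demo(vault_cls):
    """Return (amount taken by the attacker, funds left in the vault)."""
    vault = vault_cls()
    vault.deposit("victim", 40)   # an honest user's funds
    taken = run_attack(vault)
    return taken, vault.total


if __name__ == "__main__":
    print("vulnerable:", demo(VulnerableVault))   # (50, 0): vault drained
    print("patched:   ", demo(PatchedVault))      # (10, 40): only own stake
```

The only difference between the two vaults is the order of operations. The patched version clears the ledger entry before handing control to the caller, an ordering usually called checks-effects-interactions.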
According to OpenAI’s technical paper, the latest iteration of its coding model, GPT-5.3-Codex, achieved a 72.2% success rate in exploit mode, up sharply from the 31.9% scored by its predecessor, GPT-5, released just six months prior. The jump suggests that AI agents are rapidly becoming potent offensive tools, capable of autonomously identifying and weaponizing subtle flaws in complex smart contract code. Performance in detection and patching lags behind, however: current models struggle to contextualize intent, assess risk severity accurately, or generate production-ready fixes without human oversight.
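For readers who want to see how figures like these are typically derived and compared, the following Python sketch tallies a per-mode success rate and computes the gap between the two exploit-mode scores quoted above. The helper names success_rates and improvement are hypothetical, the toy outcome list is made up for illustration and is not benchmark data, and the actual EVMbench scoring scripts may work differently.

```python
from collections import defaultdict


def success_rates(results):
    """Map (mode, solved) pairs to {mode: percent of tasks solved}."""
    attempts = defaultdict(int)
    solved = defaultdict(int)
    for mode, ok in results:
        attempts[mode] += 1
        solved[mode] += int(ok)
    return {m: round(100 * solved[m] / attempts[m], 1) for m in attempts}


def improvement(old_pct, new_pct):
    """Return (gain in percentage points, ratio of new score to old)."""
    return round(new_pct - old_pct, 1), round(new_pct / old_pct, 2)


# Made-up toy outcomes, NOT EVMbench results.
toy = [
    ("detect", True), ("detect", False),
    ("patch", False),
    ("exploit", True), ("exploit", True), ("exploit", False),
]

if __name__ == "__main__":
    print(success_rates(toy))
    # Exploit-mode scores as reported in the article: GPT-5 vs GPT-5.3-Codex.
    print(improvement(31.9, 72.2))   # (40.3, 2.26)
```

On the reported numbers, the newer model gains 40.3 percentage points, a little over twice its predecessor's exploit-mode score.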
"EVMbench isn’t just about measuring AI’s hacking prowess — it’s about forcing the industry to confront the dual-use nature of these technologies," said a senior OpenAI researcher, speaking anonymously under the company’s media policy. "If AI can exploit vulnerabilities faster than humans can patch them, we need defensive AI that can outpace attackers. This benchmark is our call to action."
The benchmark’s public release — including datasets, evaluation scripts, and the sandboxed EVM environment — is a deliberate move to democratize research and accelerate the development of AI-powered security tools. By opening access to real-world vulnerability data, OpenAI invites academic institutions, blockchain startups, and security firms to build and test their own AI agents, fostering a competitive ecosystem for defensive innovation.
Paradigm, known for its deep expertise in crypto infrastructure and its investments in blockchain security startups, emphasized the benchmark’s role in shaping future audit practices. "Traditional manual audits are too slow and inconsistent," said a Paradigm spokesperson. "AI-assisted auditing, guided by benchmarks like EVMbench, could reduce audit cycles from weeks to hours while improving coverage and precision."
Industry reactions have been mixed. While security firms like Ottersec applaud the initiative as a necessary step toward AI-augmented defense, some crypto purists warn of unintended consequences. "We’re building a world where AI can both protect and destroy the same codebase," noted Ethereum developer and security researcher Lena Torres. "The real challenge isn’t technical — it’s ethical and governance-based. Who’s liable when an AI patches a contract and introduces a new bug?"
OpenAI has not disclosed whether EVMbench will be integrated into its commercial products, such as ChatGPT Enterprise or its developer APIs. However, the company confirmed that the benchmark’s architecture is extensible and could be adapted to other blockchain environments, including Solana and Cosmos SDK chains.
As AI agents grow more capable, EVMbench may become the de facto standard for evaluating next-generation cybersecurity tools, not just in crypto but across software ecosystems that rely on decentralized logic. The race is no longer just to build smarter AI. It’s to build safer AI.
Source: OpenAI EVMbench Technical Paper (2026), CoinDesk report, Reddit r/singularity announcement


