
OpenAI Partners with Paradigm to Launch EVMbench, Testing AI Agents’ Attack and Defense Capabilities on EVM Contracts and Revealing Their Strengths and Weaknesses
Focusing on Real-World Economic Environments, OpenAI and Paradigm Raise the Bar for On-Chain Security Evaluation
Leading AI company OpenAI has announced a partnership with well-known crypto venture capital firm Paradigm and security firm OtterSec to launch EVMbench, a benchmark designed to evaluate how well AI agents perform security tasks on Ethereum Virtual Machine (EVM) smart contracts.
As AI and blockchain technologies increasingly converge, smart contracts have become core infrastructure, securing more than $100 billion in open-source crypto assets. The release of EVMbench signals that the industry is starting to measure AI’s practical capabilities in economically meaningful environments.
The OpenAI team notes that, as AI agents advance rapidly in coding and planning, these models will play a transformative role in both attacking and defending blockchains. A standardized evaluation framework is therefore crucial for tracking AI progress.
Three Testing Modes Built on 120 Real Audit Vulnerabilities
EVMbench is built around 120 high-risk vulnerabilities drawn from 40 professional audit reports. Sources include well-known public audit competitions such as Code4rena, so the test scenarios closely reflect real-world complexity. The benchmark evaluates AI agents in three operational modes:

Image source: OpenAI. EVMbench evaluates AI agents in three different modes.
- The first is “Detection Mode,” in which the AI audits a contract codebase to identify known vulnerabilities and is scored according to the severity of the issues it finds;
- The second is “Patch Mode,” which challenges the AI to remove exploitable vulnerabilities without breaking the code’s existing functionality;
- The third, and most controversial, is “Exploit Mode,” in which the AI must carry out end-to-end attacks that drain funds in a sandboxed blockchain environment.
To keep testing rigorous and repeatable, the team built a Rust-based testing framework that uses deterministic transaction replay to verify whether an AI agent’s attacks or patches actually succeed.
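To make the idea of deterministic transaction replay more concrete, here is a minimal Rust sketch. It is not EVMbench’s code and does not model the EVM: the toy ledger, the `State` and `Tx` types, the `replay` and `exploit_succeeded` functions, the planted missing-authorization bug, and the grading rule are all hypothetical assumptions made for illustration. It only shows the principle: because replay starts from a fixed state and uses no outside inputs, the same transactions always produce the same result, so an exploit or a patch can be checked mechanically.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy chain state: account name -> balance.
/// BTreeMap keeps a fixed iteration order, so replayed states compare exactly.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct State {
    pub balances: BTreeMap<String, u128>,
}

/// Hypothetical toy transaction: `sender` asks to move `amount` from `from` to `to`.
#[derive(Clone, Debug)]
pub struct Tx {
    pub sender: String,
    pub from: String,
    pub to: String,
    pub amount: u128,
}

impl Tx {
    pub fn new(sender: &str, from: &str, to: &str, amount: u128) -> Self {
        Tx { sender: sender.into(), from: from.into(), to: to.into(), amount }
    }
}

impl State {
    pub fn new(genesis: &[(&str, u128)]) -> Self {
        State { balances: genesis.iter().map(|(k, v)| (k.to_string(), *v)).collect() }
    }

    pub fn balance(&self, who: &str) -> u128 {
        self.balances.get(who).copied().unwrap_or(0)
    }

    /// Applies one tx. The unpatched toy "contract" never checks that the
    /// sender owns `from` (the planted bug); `patched` adds that check.
    /// A failed check or insufficient balance "reverts": state is left unchanged.
    fn apply(&mut self, tx: &Tx, patched: bool) {
        if patched && tx.sender != tx.from {
            return;
        }
        let from_bal = self.balance(&tx.from);
        if from_bal < tx.amount {
            return;
        }
        self.balances.insert(tx.from.clone(), from_bal - tx.amount);
        let to_bal = self.balance(&tx.to);
        self.balances.insert(tx.to.clone(), to_bal + tx.amount);
    }
}

/// Replays `txs` in order from a fixed genesis state. There is no clock,
/// randomness, or network input, so identical inputs give identical results.
pub fn replay(genesis: &State, txs: &[Tx], patched: bool) -> State {
    let mut state = genesis.clone();
    for tx in txs {
        state.apply(tx, patched);
    }
    state
}

/// Hypothetical grading rule: an exploit counts only if, after replay,
/// the victim has lost funds and the attacker has gained funds.
pub fn exploit_succeeded(genesis: &State, txs: &[Tx], patched: bool, victim: &str, attacker: &str) -> bool {
    let end = replay(genesis, txs, patched);
    end.balance(victim) < genesis.balance(victim) && end.balance(attacker) > genesis.balance(attacker)
}
```

In this toy setup, replaying the attacker’s transaction against the unpatched ledger drains the victim, the same transaction reverts once the hypothetical authorization check is enabled, and a legitimate transfer by the account owner still goes through. That last check mirrors the Patch Mode requirement of fixing a bug without breaking existing functionality.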
Strong on Attack, Weak on Defense: GPT-5.3-Codex Makes Major Gains in Exploit Mode
Initial results reveal a clear performance gap across tasks. The latest model, GPT-5.3-Codex, performs exceptionally well in Exploit Mode, scoring 72.2%, more than double the 31.9% scored by GPT-5, which was released just six months earlier.

Image source: Overview of scores for various AI models across the three modes
This suggests that when the goal is explicit, such as “draining funds,” AI agents show strong iterative planning and execution. Defensive performance, however, is comparatively weaker: in Detection Mode, agents often stop searching after finding a single flaw, and in Patch Mode they struggle to fix complex logic without disrupting the contract’s normal operation. Security experts worry that AI could sharply shorten the time between discovering a vulnerability and developing a working attack, raising the bar for DeFi projects’ defenses.
Talent Acquisition and Defense Funding: OpenAI’s Strategy for AI Agent Ecosystem Security
Beyond tool development, OpenAI is actively investing in talent and ecosystem defense. It recently hired Peter Steinberger, founder of the open-source AI agent project OpenClaw, to lead development of next-generation personal agents, while OpenClaw itself moves to an OpenAI-supported foundation.
To address the cybersecurity risks AI may pose, OpenAI has committed a $10 million API budget through its cybersecurity grant program to support open-source defense tools and research on critical infrastructure. The move is particularly timely after the recent Moonwell protocol incident, in which an error in AI-generated code caused roughly $1.78 million in losses.
Looking ahead, as more AI-powered stablecoin payment agents and automated wallets join the ecosystem, using tools like EVMbench to distinguish models that merely describe vulnerabilities from those that can reliably defend against them will become a critical turning point for blockchain security.
Disclaimer: The information on this page may come from third parties and does not represent the views or opinions of Gate. The content displayed on this page is for reference only and does not constitute any financial, investment, or legal advice. Gate does not guarantee the accuracy or completeness of the information and shall not be liable for any losses arising from the use of this information. Virtual asset investments carry high risks and are subject to significant price volatility. You may lose all of your invested principal. Please fully understand the relevant risks and make prudent decisions based on your own financial situation and risk tolerance. For details, please refer to the Disclaimer.