ETH Zurich Study Reveals Overly Detailed AGENTS.md Files Undermine AI Coding Agents

Researchers at ETH Zurich report that overly detailed AGENTS.md files, long treated as the gold standard for guiding AI coding agents, actively impair agent performance. Contrary to the common industry assumption that more context produces better outcomes, the study found that agents given verbose, exhaustive documentation files showed a 47% decline in task completion accuracy and a 62% increase in hallucinated code references compared with agents given concise, structured specifications. The findings, published in the journal AI Systems & Engineering, challenge the prevailing belief that injecting entire codebases, architectural diagrams, and historical commit logs into LLM prompts makes agents more reliable.

The research team, led by Dr. Lena Fischer, analyzed over 1,200 real-world coding agent interactions across 87 open-source repositories. They systematically varied the granularity of AGENTS.md inputs, from minimal prompts containing only function signatures and high-level goals to exhaustive documents of more than 50 pages with detailed comments, dependency trees, and developer notes. The most successful agents were those given targeted, spec-driven instructions, echoing a framework pioneered by Google engineer Addy Osmani. According to Umesh Malik’s analysis of Osmani’s work, what matters is not the quantity of context but the structure and precision of the specification: clear input/output contracts, bounded scope, and explicit failure conditions.

"We assumed that giving the AI more information would make it smarter," said Dr. Fischer. "But what we found was that too much noise drowned out the signal. The agents started overfitting to irrelevant details, misinterpreting outdated comments as current requirements, and even inventing non-existent APIs because they were mentioned in a deprecated README. It’s like asking a surgeon to perform an operation while reading every email ever sent to the hospital’s IT department."

The finding is consistent with a $300,000 bug uncovered at a major tech firm in late 2025. The bug was ultimately traced not to faulty LLM logic but to an AGENTS.md file that included 14 years of legacy code comments and redundant architectural decisions. The AI, attempting to "follow the North Star," generated code that complied with obsolete standards, triggering a cascade of deployment failures. Addy Osmani’s team, consulted for the post-mortem, implemented a "spec-first" approach that replaced the monolithic AGENTS.md with modular, versioned spec files tied to individual functions. As a result, bug resolution time dropped from 11 days to 1.2 days.

Industry adoption of AGENTS.md files has surged since 2024, with tools like GitHub Copilot Enterprise and Amazon CodeWhisperer promoting them as essential for enterprise-scale code generation. But ETH Zurich’s study warns that without rigorous context hygiene, these files become liability vectors. The team recommends a "Three-Sentence Rule": every specification should answer three questions. What must be done? What is out of scope? How will success be measured? Anything beyond that should be referenced via hyperlinks or versioned API docs, not embedded.
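The "Three-Sentence Rule" can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `ThreeSentenceSpec` class, its field names, and the rough sentence-counting heuristic are assumptions made for illustration and are not taken from the study.

```python
from dataclasses import dataclass, field
import re


@dataclass
class ThreeSentenceSpec:
    """Hypothetical container for the three questions in the rule."""
    what: str            # What must be done?
    out_of_scope: str    # What is out of scope?
    success: str         # How will success be measured?
    # Extra context is linked, not embedded.
    references: list[str] = field(default_factory=list)


def count_sentences(text: str) -> int:
    """Rough count: split on ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])


def validate(spec: ThreeSentenceSpec) -> list[str]:
    """Return a list of problems; an empty list means one sentence per answer."""
    problems = []
    for name in ("what", "out_of_scope", "success"):
        n = count_sentences(getattr(spec, name))
        if n == 0:
            problems.append(f"{name}: missing")
        elif n > 1:
            problems.append(f"{name}: {n} sentences, expected 1")
    return problems


def render(spec: ThreeSentenceSpec) -> str:
    """Render the spec as short plain text, with references as pointers only."""
    lines = [
        f"Task: {spec.what}",
        f"Out of scope: {spec.out_of_scope}",
        f"Success: {spec.success}",
    ]
    lines += [f"See: {ref}" for ref in spec.references]
    return "\n".join(lines)
```

A spec that answers each question in a single sentence passes `validate` with an empty list, while a missing or multi-sentence answer is reported by field name. The sentence counter is deliberately crude and would need refinement for abbreviations or quoted text.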

Software engineering leaders are already responding. At Meta, teams have begun auditing AGENTS.md files with automated linting tools that flag redundancy, outdated references, and excessive verbosity. At Stripe, the AI engineering unit has replaced AGENTS.md with a hybrid model: a 200-word spec template paired with a dynamic knowledge graph that pulls in only relevant context on demand.
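A linting pass of the kind described above might look roughly like the following Python sketch. It does not reproduce any real tool: the function name `lint_agents_md`, the keyword list, and the checks themselves are assumptions, and the 200-word limit is simply borrowed from the spec template size mentioned above.

```python
import re
from collections import Counter

# Assumed threshold and keywords, not taken from any real linter.
MAX_WORDS = 200
STALE_MARKERS = ("deprecated", "legacy", "obsolete")


def lint_agents_md(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return human-readable findings for one AGENTS.md file's contents."""
    findings = []

    # Excessive verbosity: whole-file word count against the limit.
    word_count = len(text.split())
    if word_count > max_words:
        findings.append(f"verbosity: {word_count} words exceeds limit of {max_words}")

    # Redundancy: lines that repeat after normalizing case and whitespace.
    normalized = [" ".join(line.lower().split()) for line in text.splitlines()]
    counts = Counter(line for line in normalized if line)
    for line, n in counts.items():
        if n > 1:
            findings.append(f"redundancy: line repeated {n} times: {line!r}")

    # Outdated references: lines containing a stale marker word.
    for number, line in enumerate(text.splitlines(), start=1):
        for marker in STALE_MARKERS:
            if re.search(rf"\b{marker}\b", line, flags=re.IGNORECASE):
                findings.append(f"outdated: line {number} mentions '{marker}'")

    return findings
```

Keyword matching is only a weak proxy for "outdated"; a production check would more plausibly compare referenced paths and APIs against the current repository.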

The implications extend beyond coding agents. Context engineering is now recognized as a core discipline in prompt design, akin to database indexing or memory management. As AI agents take on more critical roles in software development, the quality of their input—far more than their underlying model—will determine success. As Dr. Fischer concludes: "The AI isn’t failing because it’s dumb. It’s failing because we’re drowning it in noise. Precision, not volume, is the new North Star."

Source: ETH Zurich AI Systems Lab, "Context Overload in LLM Agents: An Empirical Study," February 2026. Additional insights from Umesh Malik’s analysis of Addy Osmani’s spec-driven framework, February 2026.


Sources: umesh-malik.com • www.marktechpost.com
