Google’s Aletheia AI Solves 6 of 10 Groundbreaking Math Problems in Historic FirstProof Challenge

In a landmark development for artificial intelligence and mathematical research, Google DeepMind’s Aletheia AI agent has autonomously solved six out of ten previously unpublished, research-level mathematics problems in the inaugural FirstProof challenge. The results, published on arXiv on February 25, 2026, represent the first time an AI system has independently cracked a majority of problems drawn directly from active mathematical research—problems that had never been publicly disclosed until the challenge’s launch on February 5, 2026.

According to Reuters, Aletheia, powered by the Gemini 3 Deep Think architecture, submitted solutions to all ten problems within the 8-day window set by the FirstProof organizers. Experts convened by the challenge’s creators judged six of those solutions (Problems 2, 5, 7, 8, 9, and 10) correct by majority consensus. Problem 8 passed by the narrowest margin: two of the seven experts expressed reservations about the logical rigor of one step in Aletheia’s derivation. That split shows both how sophisticated AI-generated proofs have become and how much interpretation is still involved in evaluating them.

FirstProof, conceived by a team of professional mathematicians including Mohammed Abouzaid and colleagues, was designed to test whether AI systems could navigate the unstructured, exploratory terrain of real mathematical research, rather than just solve textbook problems or replicate known theorems. The ten problems were extracted from ongoing, unpublished work by researchers across topology, number theory, and mathematical physics, so that the exact questions and solutions could not have appeared in the AI’s training data. "This isn’t about pattern recognition," said Dr. Sergei Gukov, co-author of the Aletheia paper and a theoretical physicist at Caltech. "These are questions that even seasoned mathematicians spend months, sometimes years, wrestling with. For an AI to resolve them autonomously is unprecedented."

The Aletheia system operates as a multi-stage reasoning agent, combining symbolic manipulation, iterative hypothesis generation, and formal verification. Unlike previous AI tools that relied on human prompting or step-by-step guidance, Aletheia autonomously formulated research strategies, searched mathematical literature (via curated repositories), constructed intermediate lemmas, and validated its conclusions using internal proof-checking modules. The system’s architecture, described in detail in the arXiv preprint, integrates Gemini 3’s deep reasoning capabilities with a novel meta-reasoning layer that dynamically adjusts its approach based on problem structure and computational feedback.
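The propose, check, and pivot cycle described above can be illustrated with a toy sketch. This is not Aletheia's code: the names `solve`, `Attempt`, `STRATEGIES`, and `verify_factor` are hypothetical, and the "problem" (finding a nontrivial factor of 91) is a stand-in chosen only because its answers are trivial to verify. The sketch assumes each strategy proposes a candidate and that an independent checker decides whether to accept it or move on to the next approach.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

N = 91  # toy target: find a nontrivial factor of N


@dataclass
class Attempt:
    strategy: str             # name of the approach that was tried
    candidate: Optional[int]  # what the approach proposed, if anything
    verified: bool            # whether the independent check accepted it


def verify_factor(candidate: int) -> bool:
    """Independent check: accept only a nontrivial divisor of N."""
    return 1 < candidate < N and N % candidate == 0


def guess_two() -> Optional[int]:
    """A deliberately poor first approach: always proposes 2."""
    return 2


def trial_division() -> Optional[int]:
    """Fallback approach: smallest divisor of N greater than 1."""
    for d in range(2, N):
        if N % d == 0:
            return d
    return None


# Hypothetical ordered list of (name, proposer) pairs.
STRATEGIES: List[Tuple[str, Callable[[], Optional[int]]]] = [
    ("guess_two", guess_two),
    ("trial_division", trial_division),
]


def solve(strategies, verify) -> Tuple[Optional[int], List[Attempt]]:
    """Try each strategy in order, recording every attempt in a trace.

    A candidate that fails verification is abandoned and the loop
    pivots to the next strategy; the first verified candidate wins.
    """
    trace: List[Attempt] = []
    for name, propose in strategies:
        candidate = propose()
        ok = candidate is not None and verify(candidate)
        trace.append(Attempt(name, candidate, ok))
        if ok:
            return candidate, trace
    return None, trace


solution, trace = solve(STRATEGIES, verify_factor)
```

Keeping the checker separate from the proposers is the point of the design: a rejected first attempt stays in the trace, much as the published traces are said to record abandoned approaches.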

Transparency was a core tenet of the experiment. Google DeepMind released all raw prompts, intermediate outputs, and final solutions on GitHub, inviting independent verification. The repository, accessible at github.com/google-deepmind/superhuman/tree/main/aletheia, includes annotated traces of the AI’s decision pathways, revealing how it abandoned initial approaches when inconsistencies arose and pivoted to alternative proof structures.

The implications extend far beyond mathematics. If AI can now contribute meaningfully to unsolved research problems, it may soon become a standard collaborator in academic labs. The Clay Mathematics Institute, which oversees the Millennium Prize Problems, has already signaled interest in using similar frameworks to vet candidate solutions. Meanwhile, some mathematicians caution against overestimating AI’s understanding. "It doesn’t ‘know’ why it’s right," said Dr. Adel Javanmard, a statistician at USC and co-author of the paper. "It finds paths through logical space. But the meaning: that’s still human territory."

As the AI research community prepares for the next FirstProof challenge—scheduled for late 2026 with 15 problems and a tighter deadline—the line between tool and collaborator continues to blur. Aletheia’s success is not just a technical triumph; it’s a philosophical milestone. The future of mathematics may not belong to humans alone, but to the symbiosis between human intuition and machine reasoning.

Sources: arxiv.org • arxiv.org • papers.cool
