Web2BigTable: Bi-Level Multi-Agent System for Web-to-Table Search

Web2BigTable Hits 38.50 Avg@4: Bi-Level AI Beats All Web-to-Table Models in 2026

Web2BigTable is a bi-level multi-agent system for web-to-table search that tackles breadth and depth at the same time. On the WideSearch benchmark, its Avg@4 Success Rate is more than seven times that of any prior system.

Web2BigTable, a bi-level multi-agent LLM system, reaches an Avg@4 Success Rate of 38.50 on the WideSearch benchmark, more than seven times higher than any prior model. It is designed to handle breadth and depth together in structured data extraction, turning unstructured web content into accurate, schema-aligned tables through LLM orchestration.
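To make the headline metric concrete, here is a minimal sketch of an Avg@4 calculation. It assumes Avg@4 means the task success rate averaged over four independent runs, which is the usual reading of the name; the benchmark's own definition may differ in detail. The function name and the toy data are illustrative only.

```python
from statistics import mean


def avg_at_n(run_successes: list[list[bool]]) -> float:
    """Average the per-run success rate (in percent) over N independent runs.

    run_successes[i][j] is True when run i produced a fully correct
    table for task j (an assumed, simplified notion of "success").
    """
    if not run_successes:
        raise ValueError("need at least one run")
    per_run = [100.0 * sum(run) / len(run) for run in run_successes]
    return mean(per_run)


# Toy data: 4 runs over the same 4 tasks.
runs = [
    [True, False, False, True],   # 50% of tasks solved
    [True, False, False, False],  # 25%
    [False, False, True, True],   # 50%
    [True, False, False, True],   # 50%
]
print(avg_at_n(runs))
```

Under this reading, a score of 38.50 means that a little over a third of tasks yield a fully correct table on a typical run.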

How the Bi-Level Architecture Works

Web2BigTable uses a two-level agent framework: an upper-level orchestrator breaks a complex query into parallel subtasks, and lower-level worker agents execute them concurrently. Unlike single-agent systems, it runs a closed-loop run–verify–reflect cycle and uses persistent external memory to adapt its strategy as it goes. Each agent can therefore learn from earlier failures and successes, so retrieval improves over the course of a task.
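The two levels and the run–verify–reflect loop can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the project's implementation: `Memory`, `decompose`, `run`, `verify`, `worker`, and `orchestrate` are hypothetical names, and the "worker" fabricates a placeholder value instead of searching the web.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Stand-in for persistent external memory: notes keyed by subtask."""
    notes: dict[str, list[str]] = field(default_factory=dict)

    def recall(self, key: str) -> list[str]:
        return self.notes.get(key, [])

    def record(self, key: str, note: str) -> None:
        self.notes.setdefault(key, []).append(note)


def decompose(query: str, entities: list[str]) -> list[str]:
    """Upper level: split one wide query into one subtask per entity."""
    return [f"{query} :: {entity}" for entity in entities]


async def run(subtask: str, hints: list[str]) -> dict:
    """Hypothetical worker step; a real worker would search and read pages."""
    await asyncio.sleep(0)  # yield control, as a network call would
    row = {"name": subtask.split(" :: ")[1]}
    # Toy behaviour: a column is filled only after a hint asks for it.
    for hint in hints:
        if hint.startswith("include "):
            row[hint.removeprefix("include ")] = "<value>"
    return row


def verify(row: dict, schema: list[str]) -> list[str]:
    """Return the schema columns that the row is still missing."""
    return [col for col in schema if col not in row]


async def worker(subtask: str, schema: list[str], memory: Memory,
                 max_rounds: int = 3) -> dict:
    """Lower level: run, verify, then reflect by writing notes to memory."""
    row: dict = {}
    for _ in range(max_rounds):
        row = await run(subtask, memory.recall(subtask))
        missing = verify(row, schema)
        if not missing:
            break
        for col in missing:  # reflect: remember what the last attempt lacked
            memory.record(subtask, f"include {col}")
    return row


async def orchestrate(query: str, entities: list[str],
                      schema: list[str]) -> list[dict]:
    memory = Memory()
    subtasks = decompose(query, entities)
    # Workers execute concurrently; gather preserves subtask order.
    return list(await asyncio.gather(
        *(worker(s, schema, memory) for s in subtasks)))


table = asyncio.run(orchestrate("founding year", ["A", "B"], ["name", "year"]))
print(table)
```

In this sketch each worker fails verification on its first attempt, records the missing column in memory, and succeeds on the second attempt, which is the smallest possible instance of the loop described above.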

Performance on WideSearch Benchmark

On the WideSearch benchmark, Web2BigTable achieves a Row F1 of 63.53 and an Item F1 of 80.12, outperforming the second-best system by 25.03 and 14.42 points respectively. These gains come from dynamic task decomposition across 11 specialized skills, which a trained task router assigns to subtasks. A shared workspace lets agents exchange partial findings, which removes redundant work and resolves conflicts without human intervention.
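The gap between the two F1 scores is easier to read with a simplified version of the metrics. The sketch below assumes exact-match scoring, where Row F1 credits a row only if every column is correct and Item F1 scores each cell on its own; the benchmark's actual matching rules may be more lenient or more elaborate. All function names are illustrative.

```python
def f1(pred: set, gold: set) -> float:
    """Standard F1 over two sets of hashable items."""
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def rows_as_set(rows: list[dict], columns: list[str]) -> set:
    """One tuple per row: the row matches only if all columns match."""
    return {tuple(row.get(col) for col in columns) for row in rows}


def cells_as_set(rows: list[dict], key: str, columns: list[str]) -> set:
    """One (row key, column, value) triple per cell."""
    return {(row.get(key), col, row.get(col)) for row in rows for col in columns}


def row_f1(pred: list[dict], gold: list[dict], columns: list[str]) -> float:
    return f1(rows_as_set(pred, columns), rows_as_set(gold, columns))


def item_f1(pred: list[dict], gold: list[dict], key: str,
            columns: list[str]) -> float:
    return f1(cells_as_set(pred, key, columns), cells_as_set(gold, key, columns))


gold = [{"name": "A", "year": 2001}, {"name": "B", "year": 2005}]
pred = [{"name": "A", "year": 2001}, {"name": "B", "year": 2006}]  # one wrong cell

print(row_f1(pred, gold, ["name", "year"]))           # 0.5
print(item_f1(pred, gold, "name", ["name", "year"]))  # 0.75
```

A single wrong cell invalidates a whole row but only one item, which is why an Item F1 above the Row F1, as in the reported 80.12 versus 63.53, is the expected pattern.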

LLM Orchestration in Practice

At the core of Web2BigTable is LLM orchestration powered by the Model Context Protocol (MCP) and built on LangChain. Each agent operates with context-aware memory, which keeps extraction coherent and schema-driven across heterogeneous sources. The same architecture applies beyond web tables to enterprise knowledge graphs, financial data pipelines, and academic literature synthesis, all of which require consistent, structured outputs.
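As a rough illustration of what "schema-driven extraction across heterogeneous sources" involves, the sketch below maps differently named fields from two sources onto one target schema. It uses plain Python rather than MCP or LangChain, and `SCHEMA`, `ALIASES`, `coerce`, and `align` are hypothetical names chosen for this example.

```python
# Hypothetical target schema: column name -> expected Python type.
SCHEMA = {"name": str, "price_usd": float, "in_stock": bool}

# Hypothetical mapping from source-specific field names to schema columns.
ALIASES = {
    "title": "name",
    "product": "name",
    "price": "price_usd",
    "available": "in_stock",
}


def coerce(value, target: type):
    """Convert a raw scraped value to the schema type; None stays None."""
    if value is None:
        return None
    if target is bool:
        if isinstance(value, bool):
            return value
        return str(value).strip().lower() in {"true", "yes", "1"}
    if target is float:
        return float(str(value).replace("$", "").replace(",", ""))
    return target(value)


def align(record: dict) -> dict:
    """Map one raw record onto the schema, dropping unknown fields."""
    row = {}
    for raw_key, value in record.items():
        col = ALIASES.get(raw_key.lower(), raw_key.lower())
        if col in SCHEMA:
            row[col] = coerce(value, SCHEMA[col])
    for col in SCHEMA:  # every schema column appears, even when missing
        row.setdefault(col, None)
    return row


source_a = {"Title": "Widget", "Price": "$1,299.50", "available": "yes", "sku": "X1"}
source_b = {"product": "Gadget"}

print(align(source_a))
print(align(source_b))
```

Missing values are kept as explicit `None` cells rather than dropped, so a later verification step can see which columns still need to be filled.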

Why It Outperforms WideSeek and Others

Unlike WideSeek, which scales agent count for breadth, Web2BigTable prioritizes structural coherence. It builds on foundational research such as Cafarella's WebTables (VLDB 2008), which treats the web as a distributed database. On XBench-DeepSearch it achieves 73.0% accuracy, indicating strength in depth as well as scale. This combination positions it as a system that handles end-to-end web-to-table search without sacrificing precision.

Web2BigTable is open-sourced on GitHub under Memento-Teams and built for real-world deployment. Its modular design is meant to extend to domains that need reliable structured data extraction, from e-commerce product aggregation to regulatory compliance reporting. As the web grows more fragmented, the system aims to do more than retrieve data: it structures that data into queryable tables.

Sources: GitHub: Web2BigTable • WebTables: 154M HTML Tables (SciSpace) • Cafarella et al., VLDB 2008 • Web2BigTable Paper (arXiv) • Hugging Face Model Card