Interactions at Scale for LLMs: New SPEX Framework Reveals AI Decisions

SPEX and ProxySPEX: 2026 Breakthrough to Identify Interactions at Scale in LLMs

Identifying interactions at scale in LLMs has long been a bottleneck in AI interpretability research. Attribution methods such as SHAP and LIME assign importance to individual features, so they struggle to capture high-order dependencies among features, training data, or model components, especially as context lengths and model sizes grow. Researchers from UC Berkeley’s BAIR lab have now introduced SPEX and ProxySPEX, two algorithms that recover the interactions driving LLM behavior with reportedly orders-of-magnitude fewer model evaluations than exhaustive search. According to the BAIR blog, these tools shift the focus from isolated feature attribution to discovering interactions across the whole system, making complex models considerably more transparent.

How SPEX Works: From Combinatorial Explosion to Linear Efficiency

SPEX (Spectral Explainer) draws on signal processing and coding theory to sidestep the combinatorial explosion in interaction discovery. Rather than testing every possible combination of input features, training examples, or attention heads, SPEX runs a structured set of ablations, each of which carries information about many interactions at once. Decoding algorithms then isolate the sparse, low-degree interactions that actually influence the output, so the number of ablations required scales roughly linearly with the number of inputs instead of exponentially.
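To make the "sparse, low-degree" idea concrete, here is a minimal sketch in Python. It is not the SPEX algorithm: it enumerates every masking pattern by brute force, which is exactly the exponential cost SPEX is designed to avoid. The function `toy_value` is a hypothetical stand-in for a model's output under ablation, and all names are assumptions made for illustration.

```python
import itertools

N_FEATURES = 4  # toy size; exhaustive enumeration is only feasible when this is small

def toy_value(mask):
    """Hypothetical stand-in for a model output when feature i is kept iff mask[i] == 1."""
    # One main effect (feature 0) plus one pairwise interaction (features 1 and 2).
    return 1.0 * mask[0] + 2.0 * mask[1] * mask[2]

def fourier_coefficients(value_fn, n):
    """Brute-force Boolean Fourier transform over all 2**n masking patterns."""
    masks = list(itertools.product([0, 1], repeat=n))
    values = [value_fn(m) for m in masks]
    coeffs = {}
    for subset in masks:
        total = 0.0
        for mask, val in zip(masks, values):
            # The sign is the parity of the overlap between the subset and the mask.
            parity = sum(s * m for s, m in zip(subset, mask)) % 2
            total += -val if parity else val
        coeffs[subset] = total / len(masks)
    return coeffs

coeffs = fourier_coefficients(toy_value, N_FEATURES)

# Keep only the nonzero coefficients, keyed by the feature indices they involve.
significant = {
    tuple(i for i, bit in enumerate(subset) if bit): round(c, 6)
    for subset, c in coeffs.items()
    if abs(c) > 1e-9
}
print(significant)
```

In this toy case only 5 of the 16 coefficients are nonzero, none has degree above two, and feature 3 never appears. That sparsity is the structure the article says SPEX exploits to recover interactions without evaluating every mask.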

ProxySPEX: Scaling Efficiency with Hierarchical Insight

ProxySPEX builds on this by exploiting hierarchy: if a complex interaction (e.g., between four words) is influential, its subsets are likely to be influential too. This insight allows ProxySPEX to match SPEX-level accuracy with up to 10x fewer ablations. In one striking test on GPT-4o mini, standard SHAP wrongly flagged the word "trolley" as the primary cause of a moral reasoning failure. SPEX revealed the true culprit: a synergistic interaction between "trolley," "pulling," "lever," and a second instance of "trolley." Replacing all four terms with synonyms restored near-perfect accuracy, a dependency that single-word attribution did not surface.
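The hierarchy property can be illustrated with a small level-wise search, sketched below in Python. This is not the ProxySPEX algorithm, only a demonstration of why the assumption shrinks the search space: a set of size k is tested only if all of its subsets of size k-1 were already found influential. The oracle `is_influential` is hypothetical; in practice influence would have to be estimated from model ablations. The token positions are made up for the example.

```python
from itertools import combinations
from math import comb

def hierarchical_search(n_features, is_influential, max_degree):
    """Test a set of size k only if every (k-1)-subset of it was influential."""
    kept, tested = [], 0
    current = [frozenset([i]) for i in range(n_features)]
    for degree in range(1, max_degree + 1):
        survivors = set()
        for candidate in current:
            tested += 1
            if is_influential(candidate):
                survivors.add(candidate)
        kept.extend(sorted(survivors, key=sorted))
        # Build next-level candidates from features that still appear in a survivor.
        items = sorted(set().union(*survivors)) if survivors else []
        current = [
            frozenset(combo)
            for combo in combinations(items, degree + 1)
            if all(frozenset(combo) - {i} in survivors for i in combo)
        ]
        if not current:
            break
    return kept, tested

# Hypothetical ground truth: four token positions interact, and so do all their subsets.
HIDDEN = frozenset({2, 5, 7, 9})

def is_influential(subset):
    return subset <= HIDDEN

N_TOKENS, MAX_DEGREE = 12, 4
kept, tested = hierarchical_search(N_TOKENS, is_influential, MAX_DEGREE)
exhaustive = sum(comb(N_TOKENS, k) for k in range(1, MAX_DEGREE + 1))
print(f"candidates tested: {tested} (hierarchical) vs {exhaustive} (exhaustive)")
```

In this toy setting the search finds the four-way interaction after testing 23 candidate sets, compared with 793 for an exhaustive scan of all sets up to degree four. The gap widens quickly as the number of tokens grows.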

Real-World Applications Beyond Text: Data, Vision, and Mechanistic Interpretability

The frameworks extend beyond text. In data attribution, ProxySPEX identified synergistic training examples in CIFAR-10 that collectively defined decision boundaries—such as the combination of a sports car, truck, and delivery van shaping a model’s perception of "automobile." Redundant examples, like clusters of similar dog images reinforcing a "horse" label, were also flagged, enabling smarter dataset pruning. In mechanistic interpretability, ProxySPEX uncovered that early transformer layers operate linearly, while later layers rely heavily on intra-layer attention head interactions—a finding that enabled task-specific pruning which actually improved MMLU performance.
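One common way to distinguish synergy from redundancy between two training examples is the sign of their pairwise interaction, v({a, b}) - v({a}) - v({b}) + v({}), where v(S) is the model's performance when trained on subset S. The Python sketch below applies that formula to made-up numbers. The accuracy table is entirely hypothetical and is not taken from the CIFAR-10 experiments described above.

```python
def pairwise_interaction(value, a, b):
    """Second-order difference: v({a, b}) - v({a}) - v({b}) + v({})."""
    return (
        value(frozenset({a, b}))
        - value(frozenset({a}))
        - value(frozenset({b}))
        + value(frozenset())
    )

# Hypothetical class accuracies after training on each subset of examples.
ACCURACY = {
    frozenset(): 0.10,
    frozenset({"sports_car"}): 0.30,
    frozenset({"truck"}): 0.30,
    frozenset({"sports_car", "truck"}): 0.70,        # more than the sum of the parts
    frozenset({"dog_a"}): 0.40,
    frozenset({"dog_b"}): 0.40,
    frozenset({"dog_a", "dog_b"}): 0.45,             # second image adds little
}

def value(subset):
    return ACCURACY[subset]

synergy = pairwise_interaction(value, "sports_car", "truck")
redundancy = pairwise_interaction(value, "dog_a", "dog_b")
print(f"sports_car + truck: {synergy:+.2f}")
print(f"dog_a + dog_b:      {redundancy:+.2f}")
```

A positive value means the pair contributes more together than separately (synergy), while a negative value means the examples largely duplicate each other, which makes one of them a candidate for pruning.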

Why This Changes Everything: From Post-Hoc to Causal AI Understanding

These results mark a step beyond conventional Explainable AI (XAI). As highlighted in a March 2026 arXiv paper, the field is moving from post-hoc explanations toward a causal, structural understanding of model internals. SPEX and ProxySPEX fit that shift: they treat interactions, not just weights or gradients, as the unit of analysis.

How to Use SPEX and ProxySPEX Today

With the code now integrated into the SHAP-IQ repository, the research community can replicate and extend these methods across domains, from healthcare diagnostics to genomics. Identifying interactions at scale in LLMs is no longer only a theoretical challenge; it can be done in practice. As models grow more powerful, our ability to understand them must keep pace, and SPEX and ProxySPEX aim to provide that capability at the scale that trust and safety work requires.

Sources: arXiv:2602.24176 • BAIR Blog: SPEX & ProxySPEX • SHAP-IQ: Efficient Feature Attribution