GLM-5 Surpasses Kimi K2.5 as Top Open-Weights Model on NYT Connections Benchmark
GLM-5, the latest open-weights AI model from ZAI, has achieved a record score of 81.8 on the Extended NYT Connections benchmark, outperforming Kimi K2.5 Thinking. The breakthrough underscores advances in agentic reasoning and sparse architecture design.

On February 12, 2026, ZAI Labs unveiled GLM-5, a next-generation open-weights large language model that has rapidly ascended to the top of the Extended NYT Connections benchmark with a score of 81.8—surpassing Kimi K2.5 Thinking’s previous record of 78.3. The achievement, confirmed by independent testing published on GitHub by researcher Lech Mazur, marks a pivotal moment in the open-source AI landscape, demonstrating that scalable, efficient architectures can outperform proprietary models in complex reasoning tasks.
According to a technical report published by ZAI Labs, GLM-5 scales to 744 billion total parameters with 40 billion active parameters during inference, a significant leap from its predecessor GLM-4.5. The model was trained on 28.5 trillion tokens of multilingual, code-inclusive, and reasoning-rich data, enabling unprecedented contextual understanding. Crucially, GLM-5 integrates DeepSeek Sparse Attention (DSA), a novel mechanism that reduces computational overhead by 40% while preserving long-context retention up to 128K tokens—making it uniquely suited for multi-step agentic workflows.
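The practical effect of activating only 40 billion of 744 billion parameters can be sketched with back-of-envelope arithmetic, using the common rule of thumb that a forward pass costs roughly 2 FLOPs per active parameter per token. The figures below are illustrative estimates, not measurements from ZAI's report:

```python
# Rough per-token inference cost for a sparsely activated model:
# only the *active* parameters contribute to each forward pass.
# Rule of thumb: ~2 FLOPs per active parameter per token.

def flops_per_token(active_params: float) -> float:
    return 2.0 * active_params

TOTAL_PARAMS = 744e9   # GLM-5 total parameters (from the report)
ACTIVE_PARAMS = 40e9   # parameters active per token (from the report)

print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
print(f"Compute vs. fully dense: {flops_per_token(ACTIVE_PARAMS) / flops_per_token(TOTAL_PARAMS):.3f}")
```

On this estimate, each token touches only about 5% of the model's weights, which is why sparse activation (compounded by DSA's reported 40% attention savings) makes deployment far cheaper than the headline parameter count suggests.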
The Extended NYT Connections benchmark, developed by Lech Mazur and hosted on GitHub, evaluates AI models on their ability to identify semantic groupings in the popular New York Times word puzzle. Unlike traditional QA or translation benchmarks, Connections requires abstract reasoning, pattern recognition, and contextual inference across ambiguous categories, a task that closely mirrors human-like cognitive flexibility. GLM-5’s score of 81.8 represents a 3.5-point gain (roughly 4.5% relative) over the prior leader, Kimi K2.5 Thinking, and is the first time an open-weights model has broken the 80-point threshold on this benchmark.
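The task format is simple to state: sixteen words must be partitioned into four groups of four. A minimal sketch of how such a partition might be scored against the gold answer follows; this is a simplified illustration, not Lech Mazur's actual evaluation harness, and the puzzle words are invented:

```python
# Connections-style scoring sketch: a predicted group counts only if
# it exactly matches one gold group (membership, not order, matters).

def score_puzzle(predicted, gold):
    gold_sets = {frozenset(g) for g in gold}
    correct = sum(1 for g in predicted if frozenset(g) in gold_sets)
    return correct / len(gold)

gold = [
    ["chili", "salsa", "wasabi", "jalapeno"],   # spicy things
    ["super", "fire", "moon", "spider"],        # words before "-man"
    ["oven", "kettle", "toaster", "grill"],     # things that get hot
    ["viral", "trending", "buzzy", "hyped"],    # "hot" topics
]
predicted = [
    ["chili", "salsa", "wasabi", "jalapeno"],
    ["super", "fire", "moon", "grill"],         # one word misplaced...
    ["oven", "kettle", "toaster", "spider"],    # ...breaks two groups
    ["viral", "trending", "buzzy", "hyped"],
]
print(score_puzzle(predicted, gold))  # 0.5
```

Note how a single swapped word invalidates two groups at once; this all-or-nothing structure is what makes the benchmark punishing for models that rely on surface-level association.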
Industry analysts attribute GLM-5’s success to its agentic engineering framework, a paradigm shift from “vibe coding” (where models generate outputs based on statistical likelihood) to structured, goal-oriented reasoning. As described in ZAI’s blog, GLM-5 is designed to decompose complex tasks into sub-goals, self-correct using internal feedback loops, and iteratively refine solutions. This architecture allows it to navigate the nuanced, often misleading categories in Connections, such as distinguishing between “things that are ‘hot’” (spicy, temperature, celebrity) and “words that precede ‘-man’” (super, fire, moon), with remarkable precision.
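The decompose / attempt / self-correct loop described in ZAI's blog can be sketched in a few lines. The functions `propose`, `critique`, and `revise` below are hypothetical stand-ins for model calls, since the real system's internal interfaces are not public:

```python
# Minimal sketch of an iterative self-correction loop: propose an
# answer, check it against internal feedback, and revise until the
# critique passes or the iteration budget runs out.

def solve(task, propose, critique, revise, max_iters=3):
    answer = propose(task)
    for _ in range(max_iters):
        feedback = critique(task, answer)
        if feedback is None:   # internal check passes
            return answer
        answer = revise(task, answer, feedback)
    return answer

# Toy usage with dummy callables: the first attempt fails the
# critique, so the loop revises once and then succeeds.
target = "hot"
propose = lambda task: "cold"
critique = lambda task, ans: None if target in ans else f"answer lacks '{target}'"
revise = lambda task, ans, fb: target
print(solve("group the words", propose, critique, revise))  # hot
```

The key design point is that the critique step is internal to the model's workflow rather than a human-in-the-loop check, which is what distinguishes agentic refinement from single-shot generation.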
Despite its scale, GLM-5 is fully open-sourced under an Apache 2.0 license and is available on Hugging Face and GitHub. Its deployment efficiency, thanks to DSA, enables inference on consumer-grade GPUs, a rarity among models of comparable performance. The model’s release has sparked renewed interest in open-weight alternatives to proprietary systems like GPT-4o and Claude 3.5, particularly among researchers and developers in emerging markets.
While Business Wire’s press release heralds GLM-5 as signaling “a new era in AI: when models become engineers,” critics caution that benchmark scores alone do not guarantee real-world robustness. Nevertheless, the model’s performance on Connections—a test of abstract reasoning rather than memorization—suggests a meaningful step toward generalist AI capabilities. With active development continuing on GitHub and community fine-tuning already underway, GLM-5 may well become the new standard for open-source agentic AI.
For developers interested in experimenting with GLM-5, ZAI provides detailed documentation and a coding plan at z.ai/subscribe, and the model weights are accessible via Hugging Face and GitHub.