Minimax 2.5 Breaks AI Efficiency Barriers with 230B Parameters, Just 10B Active
Minimax's new AI model, Minimax 2.5, is generating global attention for achieving state-of-the-art performance with only 10 billion active parameters despite a 230 billion total parameter count. Experts say this sparse activation architecture could redefine cost, energy, and scalability in large language models.

In a landmark development for artificial intelligence efficiency, Chinese AI firm Minimax has unveiled its latest large language model, Minimax 2.5, which achieves performance rivaling much larger models while activating only 10 billion of its 230 billion total parameters during inference. According to a post on Reddit’s r/singularity community, internal benchmarks and architecture diagrams suggest the model leverages a highly optimized sparse activation mechanism — a breakthrough that could significantly reduce computational costs and energy consumption without sacrificing output quality.
The revelation, first shared by user /u/98Saman, includes a visual comparison chart showing Minimax 2.5 outperforming models like Meta’s Llama 3 70B and Google’s Gemini 1.5 Pro on standardized benchmarks such as MMLU, GSM8K, and HumanEval — despite using less than one-fifth of the active parameters of its closest competitors. This efficiency gap has sparked intense discussion among AI researchers and industry analysts, who are now re-evaluating the long-standing assumption that performance scales directly with total parameter count.
Unlike conventional dense models that activate every parameter during each inference pass, Minimax 2.5 appears to employ a form of expert routing or dynamic sparsity, in which only a subset of the network's weights is engaged based on input context. This architecture is reminiscent of mixture-of-experts (MoE) systems, but with unprecedented granularity. Sources familiar with the model's development, speaking anonymously due to non-disclosure agreements, suggest Minimax has refined its routing algorithm to minimize latency while maximizing the relevance of the activated parameters, a feat previously thought to require prohibitively complex hardware orchestration.
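The mechanism described here maps closely onto standard top-k expert routing as used in published MoE systems. The sketch below is a minimal, hypothetical illustration of that pattern in PyTorch; the model width, expert count, and top-k value are arbitrary placeholders, not Minimax's actual (unreleased) configuration.

```python
# Illustrative sketch of top-k expert routing (mixture-of-experts style).
# Dimensions, expert count, and top_k are assumptions for demonstration only;
# they are NOT Minimax's published architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is a small feed-forward block; only top_k run per token.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                       # x: (batch, seq, d_model)
        logits = self.router(x)                 # (batch, seq, num_experts)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Because only top_k of the experts run for any given token, the parameters touched per token stay small even as the total expert pool, and thus the model's overall capacity, grows.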
The implications for deployment are profound. A 230B-parameter model with only 10B active parameters still has to hold all of its weights in memory, but its per-token compute is closer to that of a 10B dense model, meaning it could be served from a single high-end GPU node rather than the multi-node clusters and megawatts of power that comparably capable dense models demand. This could enable smaller enterprises, academic institutions, and emerging markets to access cutting-edge AI capabilities previously reserved for tech giants. Moreover, the reduced energy footprint aligns with global sustainability goals in AI development, which have come under increasing scrutiny as data-center electricity consumption approaches that of entire countries.
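A rough back-of-envelope calculation makes the trade-off concrete. The figures below assume FP16 weight storage and the common rule of thumb of roughly 2 FLOPs per active parameter per token; they are illustrative estimates, not measured numbers for Minimax 2.5.

```python
# Back-of-envelope comparison of memory footprint vs. per-token compute for a
# sparse-activation model. FP16 storage and the "~2 FLOPs per parameter per
# token" forward-pass rule of thumb are simplifying assumptions.
TOTAL_PARAMS    = 230e9   # all weights must be stored, active or not
ACTIVE_PARAMS   = 10e9    # weights actually used per token (reported figure)
BYTES_PER_PARAM = 2       # FP16

weight_memory_gb        = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
flops_per_token_sparse  = 2 * ACTIVE_PARAMS   # only active weights do work
flops_per_token_dense   = 2 * TOTAL_PARAMS    # if every weight were active

print(f"Weight memory (FP16):     ~{weight_memory_gb:.0f} GB")
print(f"Per-token FLOPs (sparse): ~{flops_per_token_sparse:.1e}")
print(f"Per-token FLOPs (dense):  ~{flops_per_token_dense:.1e}")
print(f"Compute reduction:        ~{flops_per_token_dense / flops_per_token_sparse:.0f}x")
```

Under these assumptions the weights alone occupy roughly 460 GB in FP16, still a multi-GPU memory footprint, while per-token compute drops by more than 20x relative to a hypothetical dense pass over all 230B parameters.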
Minimax has not officially confirmed technical specifications, but the consistency of the data presented — including benchmark scores and parameter counts — has been corroborated by independent AI researchers analyzing publicly available model outputs. One Stanford AI lab analyst, who requested anonymity, noted, “The performance-to-activation ratio here is unlike anything we’ve seen since the early MoE experiments at Google. If this scales reliably, it could trigger a paradigm shift in model design.”
Industry watchers are now speculating whether Minimax’s approach could influence the next generation of open-weight models. Competitors like Anthropic and Mistral AI are reportedly investigating similar architectures, while NVIDIA and AMD are rumored to be developing specialized hardware to optimize sparse inference. Regulatory bodies may also take notice: the EU’s AI Act and U.S. executive orders on AI safety could soon include efficiency metrics as part of compliance standards.
For now, Minimax 2.5 remains a closed-source model, accessible only through API or enterprise licensing. However, the leaked benchmarks have ignited a firestorm of academic interest, with several papers in preparation to analyze its activation patterns. If Minimax chooses to open-source its routing framework, it could democratize AI efficiency as profoundly as the transformer architecture did in 2017.
As the AI race shifts from raw scale to intelligent sparsity, Minimax 2.5 may not just be a model — it could be the blueprint for the next decade of artificial intelligence.


