MiniMaxAI Unveils MiniMax-M2.5: 230B Parameters with 10B Active, Challenging Claude and GPT-4
MiniMaxAI has revealed its new MiniMax-M2.5 model with 230 billion total parameters and only 10 billion active parameters, leveraging sparse activation to rival top-tier models at lower computational cost. The announcement, first reported by OpenHands and corroborated by AI infrastructure experts, signals a strategic shift in efficient large language model design.

In a quiet but significant breakthrough in the competitive landscape of open-weight AI models, Chinese AI startup MiniMaxAI has unveiled the MiniMax-M2.5, a large language model boasting 230 billion total parameters—yet activating only 10 billion during inference. According to OpenHands, the research collective that first disclosed the model’s architecture in a public blog post, this sparse activation design enables performance parity with industry leaders like Anthropic’s Claude 3 and OpenAI’s GPT-4, while drastically reducing computational overhead and operational costs.
The revelation, initially shared on the r/LocalLLaMA subreddit and later confirmed by AI infrastructure provider Novita AI, marks a pivotal moment in the evolution of efficient AI. Unlike traditional dense models that activate every parameter during each forward pass, MiniMax-M2.5 employs a dynamic routing mechanism in the mixture-of-experts (MoE) family, selectively activating only the most relevant subnetworks for each input (a minimal sketch of this routing pattern appears below). This approach allows the model to retain the breadth of knowledge and reasoning capacity of a 230B-parameter system while consuming less than 5% of the per-token compute, and hence energy, of a dense equivalent: 10B active out of 230B total is roughly 4.3%. The full weight set must still be stored, however, so the savings come from compute rather than memory footprint.
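The routing mechanism described here is, at heart, top-k expert gating. The sketch below is a minimal, illustrative MoE layer in PyTorch, not MiniMaxAI's actual implementation; every dimension and name (d_model, num_experts, top_k) is an assumption chosen for readability. A lightweight gating network scores a pool of expert feed-forward networks, and only the top-scoring few run for each token:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative top-k mixture-of-experts layer (not MiniMaxAI's code)."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Lightweight gating network: one linear layer scoring each expert.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Pool of expert feed-forward subnetworks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                           # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize the selected scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512]); only 2 of 8 experts ran per token
```

The loop-based dispatch is deliberately naive for clarity; production MoE implementations batch tokens per expert and fuse the gather/scatter steps for throughput.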
Novita AI, which recently integrated MiniMax-M2.1 into its cloud API platform, has observed a 90% reduction in inference costs when deploying sparse models like M2.5 compared to conventional dense architectures. "The efficiency gains are transformative," said Junyu Chen, CTO of Novita AI. "Organizations can now deploy enterprise-grade reasoning models on single GPU instances that previously required multi-GPU clusters. This isn’t just about cost—it’s about accessibility. Smaller teams and researchers can now compete on equal footing with Big Tech."
The model’s architecture suggests MiniMaxAI has made significant strides in optimizing routing algorithms and expert specialization. While the exact implementation details remain proprietary, OpenHands’ technical documentation indicates that the 10B active parameters are dynamically selected from a pool of 230B via a lightweight gating network. This design mirrors Google’s Gemini 1.5 Pro and Mistral’s Mixtral, but with a focus on open-weight deployment and local inference compatibility—making it particularly attractive to privacy-conscious enterprises and developers seeking to avoid cloud dependency.
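The 230B-total/10B-active split also implies a particular expert layout. The back-of-envelope below shows one hypothetical configuration that lands near those numbers; MiniMaxAI has not published the actual expert count, shared-parameter size, or routing width, so every figure here is an assumption:

```python
# Back-of-envelope check of the 230B-total / 10B-active split.
# All configuration numbers below are hypothetical; MiniMaxAI has not
# disclosed the model's actual expert layout.

shared_params = 3e9                   # attention, embeddings, etc. (always active)
num_experts = 64                      # experts in the routed pool (assumed)
expert_params = 227e9 / num_experts   # params per expert across all layers
experts_per_token = 2                 # top-k routing width (assumed)

total = shared_params + num_experts * expert_params
active = shared_params + experts_per_token * expert_params

print(f"total:  {total / 1e9:.0f}B")             # 230B
print(f"active: {active / 1e9:.1f}B")            # ~10.1B
print(f"active fraction: {active / total:.1%}")  # ~4.4%
```

Any combination of shared parameters, expert count, and experts-per-token satisfying those two sums would be consistent with the published totals.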
As of this reporting, MiniMax-M2.5 has not yet been released on Hugging Face, despite widespread anticipation from the open-source community. The delay has sparked speculation that MiniMaxAI may be preparing a controlled, phased release to ensure model safety and prevent misuse. However, early benchmarks shared by independent researchers suggest that M2.5 outperforms Llama 3 70B and matches Claude 3 Sonnet in reasoning, coding, and multilingual tasks. Those benchmarks also quote a sub-20GB VRAM figure at 16-bit precision; note that this covers only the 10B active parameters per token, not the memory needed to hold the full model, as the quick arithmetic below makes explicit.
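A sanity check of the memory arithmetic at 16-bit precision (two bytes per parameter), using the published parameter counts:

```python
# 16-bit weights take 2 bytes per parameter.
bytes_per_param = 2
total_params = 230e9
active_params = 10e9

print(f"full weights:     {total_params * bytes_per_param / 1e9:.0f} GB")  # 460 GB
print(f"active per token: {active_params * bytes_per_param / 1e9:.0f} GB") # 20 GB
# The ~20GB figure matches the footprint of the active parameters only;
# holding the entire model in VRAM at 16 bits would still take ~460GB,
# so single-GPU deployment would require quantization or offloading.
```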
Industry analysts view MiniMax-M2.5 as a potential game-changer in the race toward sustainable AI. With global data center energy consumption rising and regulatory pressure mounting, models that deliver high performance with minimal resource use are no longer a luxury—they’re a necessity. MiniMaxAI’s approach could set a new standard for efficiency, forcing competitors to rethink their parameter-heavy strategies.
For developers, the implications are profound. The model’s open-weight promise—though not yet fulfilled—could democratize access to state-of-the-art reasoning capabilities. Until then, platforms like Novita AI are offering API access to earlier versions like M2.1, allowing users to test the architecture’s potential while awaiting the official release of M2.5.
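For developers who want to probe the architecture today, a sketch of calling M2.1 through Novita AI follows. It assumes Novita's OpenAI-compatible endpoint and a hypothetical model identifier; verify both against Novita AI's current documentation before use:

```python
# Minimal sketch of querying MiniMax-M2.1 through Novita AI's
# OpenAI-compatible API. The base URL and model id below are assumptions;
# check Novita AI's model catalog for the exact values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_NOVITA_API_KEY",
)

response = client.chat.completions.create(
    model="minimaxai/minimax-m2.1",  # hypothetical model id
    messages=[{"role": "user",
               "content": "Summarize mixture-of-experts routing in two sentences."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```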
As the AI community waits for the model’s public debut, one thing is clear: the future of LLMs may not lie in ever-larger models, but in smarter, more selective ones. MiniMax-M2.5 could be the prototype of that future.


