DeepGEMM 2026 Update: DeepSeek Unveils mHC + Blackwell Support for FP4 LLMs

DeepSeek has updated its open-source DeepGEMM library with support for Manifold-Constrained Hyper-Connections (mHC) and early support for NVIDIA Blackwell (SM100), targeting more efficient LLM inference at FP4 precision. Released on GitHub on November 30, 2025, the update is positioned as a step toward more affordable, high-performance AI inference.

How mHC Revolutionizes Matrix Multiplication

Manifold-Constrained Hyper-Connections (mHC) constrain how a model mixes its residual streams. As described in DeepSeek's mHC paper, the mixing matrices of hyper-connections are projected onto a manifold of doubly stochastic matrices, which restores the identity-mapping property of ordinary residual connections and stabilizes training. mHC is therefore an architectural constraint rather than a sparse or dense GEMM method; DeepGEMM's part is to run the associated matrix operations efficiently. DeepSeek's internal benchmarks reportedly show up to 22% less redundant computation, and the update is said to improve FP4 quantization stability so that ultra-low-precision models can hold accuracy without costly retraining.
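To make the manifold constraint concrete: the mHC paper describes projecting a mixing matrix toward the set of doubly stochastic matrices (every row and column sums to 1) with Sinkhorn-Knopp iterations. The sketch below illustrates that projection in plain Python. It is not DeepGEMM code; the function name `sinkhorn_knopp`, the iteration count, and the example logits are all assumptions chosen for illustration.

```python
import math

def sinkhorn_knopp(logits, iters=50):
    """Push a square matrix of real-valued logits toward a doubly stochastic matrix.

    Hypothetical helper for illustration only; not part of DeepGEMM.
    """
    # Exponentiate so that every entry is strictly positive.
    m = [[math.exp(x) for x in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        # Normalize each row so it sums to 1.
        normalized = []
        for row in m:
            s = sum(row)
            normalized.append([x / s for x in row])
        m = normalized
        # Normalize each column so it sums to 1.
        col_totals = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_totals[j] for j in range(n)] for i in range(n)]
    return m

# Example: a 3x3 mixing matrix for three residual streams (made-up values).
mix = sinkhorn_knopp([[0.0, 1.0, -1.0], [2.0, 0.5, 0.0], [-0.5, 0.0, 1.5]])
row_sums = [sum(row) for row in mix]
col_sums = [sum(mix[i][j] for i in range(3)) for j in range(3)]
```

Because each row and column of the result sums to 1, applying the matrix to a set of residual streams redistributes their signal without amplifying it, which is the intuition behind using this constraint to keep deep stacks stable.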

Blackwell (SM100) Integration: Early Access to NVIDIA’s Next-Gen Tensor Cores

DeepSeek's support for NVIDIA's Blackwell architecture (SM100) suggests close coordination with the hardware vendor. DeepGEMM's kernels have been tuned for Blackwell's Tensor Cores, which add native FP4 support. This places DeepGEMM among the first open-source libraries to target Blackwell for Mixture of Experts (MoE) models, with a claimed 2x higher throughput per watt compared to Hopper-based systems.
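For readers unfamiliar with why MoE models need dedicated kernels: each token is routed to an expert, and each expert has its own weight matrix, so the work becomes many small matrix multiplications grouped by expert. The following is a slow reference sketch of that grouping in plain Python. The names `matmul` and `grouped_gemm` and the data are hypothetical, and nothing here reflects DeepGEMM's actual API or its Blackwell kernels.

```python
def matmul(a, b):
    """Reference matmul on lists of lists: (m x k) @ (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def grouped_gemm(tokens, expert_ids, expert_weights):
    """Multiply each token by the weight matrix of its assigned expert.

    Tokens are gathered per expert so each expert runs one matmul,
    then results are scattered back to the original token order.
    """
    out = [None] * len(tokens)
    for e, w in enumerate(expert_weights):
        idx = [i for i, eid in enumerate(expert_ids) if eid == e]
        if not idx:
            continue  # no tokens were routed to this expert
        block = matmul([tokens[i] for i in idx], w)
        for i, row in zip(idx, block):
            out[i] = row
    return out

# Made-up example: three tokens, two experts.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
expert_ids = [1, 0, 1]
expert_weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # expert 1: doubles its input
]
result = grouped_gemm(tokens, expert_ids, expert_weights)
```

A GPU library fuses these per-expert multiplications into a single kernel launch instead of looping, which is where hardware-specific tuning pays off.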

FP4 Precision: The Key to Affordable LLM Inference

FP4 quantization is no longer experimental; it is operational. Because an FP4 weight occupies 4 bits instead of 16, DeepGEMM's native FP4 support cuts weight memory traffic by roughly 75% versus FP16 (50% versus FP8), before the small overhead of per-block scale factors. That allows large models like DeepSeek-V3.2 to be served on fewer GPUs with lower latency. Combined with mHC's efficiency gains, this is estimated to cut inference costs by 40%, making real-time AI more accessible beyond cloud data centers.
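The memory arithmetic and the rounding behavior of FP4 can be sketched in a few lines. The example below assumes the common E2M1 layout for FP4, whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4, and 6, together with one shared scale per block. Real formats store that scale in a compact type and pack two values per byte; this sketch skips both, and its function names and sample numbers are hypothetical rather than taken from DeepGEMM.

```python
import math

# Representable magnitudes of an E2M1 (FP4) value.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block_fp4(block):
    """Round a block of floats to FP4 with one shared scale.

    Returns (scale, dequantized_values). Simplified: the scale stays a
    Python float and values are not bit-packed.
    """
    max_abs = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest FP4 value (6.0).
    scale = max_abs / 6.0 if max_abs > 0 else 1.0
    out = []
    for x in block:
        nearest = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        out.append(math.copysign(nearest, x) * scale)
    return scale, out

def weight_bytes(num_params, bits_per_param):
    """Bytes needed to store the weights, ignoring scale-factor overhead."""
    return num_params * bits_per_param // 8

scale, dequantized = quantize_block_fp4([0.1, -0.7, 3.0, -6.0])

num_params = 1_000_000_000  # hypothetical model size
fp16_bytes = weight_bytes(num_params, 16)
fp8_bytes = weight_bytes(num_params, 8)
fp4_bytes = weight_bytes(num_params, 4)
```

In this example the small input 0.1 rounds to zero and -0.7 rounds to -0.5, which shows why block scaling and careful calibration matter at this precision. The byte counts confirm that FP4 storage is one quarter of FP16 and one half of FP8.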

Open-Source Strategy vs. Closed Ecosystems

While OpenAI and others keep their hardware-software stacks proprietary, DeepSeek is building a community-driven ecosystem. By open-sourcing DeepGEMM on GitHub and supporting emerging architectures like SM100, DeepSeek lets researchers and startups build on the same kernels it uses. This mirrors Meta's PyTorch + Llama strategy of pairing open tooling with open models.

Real-World Impact: From Zhihu Feedback to Production Scaling

Though users on Zhihu have raised concerns about API reliability, the technical depth of this update suggests DeepSeek is addressing root-cause bottlenecks. The combination of mHC, FP4, and Blackwell support targets the latency and memory issues reported by developers. As DeepSeek-V3.2 and DeepSeek-V3.2-Speciale roll out globally, this infrastructure upgrade is meant to support scalability and sustained performance.

As AI models surpass trillion-parameter thresholds, efficiency isn’t optional—it’s existential. DeepSeek’s DeepGEMM 2026 update isn’t just a code commit; it’s a blueprint for the future of open, hardware-aware AI infrastructure.

Sources: Zhihu: DeepSeek-V3 Enhancements • DeepGEMM GitHub Repo • arXiv: mHC Theory & LLM Efficiency • NVIDIA Blackwell Architecture