GLM-5 on NVIDIA NIM Enables Free Claude Code Access, Revolutionizing AI-Powered Development
NVIDIA has integrated Z.ai’s GLM-5 large language model into its NIM platform, allowing developers to run Anthropic’s Claude Code CLI for free via a community-built proxy. The move unlocks enterprise-grade agentic coding capabilities without subscription costs, leveraging GLM-5’s 744B-parameter MoE architecture and sparse attention.

A quiet revolution is unfolding in AI-assisted software development. NVIDIA has officially added Z.ai’s GLM-5 model to its NVIDIA Inference Microservices (NIM) catalog, enabling developers to harness one of the most advanced open-weight LLMs for free — and to power Anthropic’s Claude Code CLI without a paid subscription. This breakthrough, spearheaded by an open-source developer and now backed by enterprise-grade infrastructure, marks a pivotal moment in democratizing high-performance AI coding tools.
According to NVIDIA’s official model card on build.nvidia.com, GLM-5 is a 744B-parameter Mixture-of-Experts (MoE) model that activates 40B parameters per token, trained on 28.5 trillion tokens. It integrates DeepSeek Sparse Attention (DSA), significantly reducing computational overhead while preserving long-context reasoning — a critical feature for complex codebase analysis and multi-turn agentic workflows. Meanwhile, Z.ai’s blog post details GLM-5’s evolution from "vibe coding" to agentic engineering, emphasizing its capacity for long-horizon reasoning, task decomposition, and iterative refinement — capabilities previously exclusive to proprietary models like GPT-4o or Claude 3 Opus.
The integration is made accessible through free-claude-code, a lightweight proxy developed by community contributor PreparationAny8816. The tool intercepts the Anthropic API calls issued by the Claude Code CLI and its VS Code extension and redirects them to NVIDIA NIM’s GLM-5 endpoint. Because NVIDIA offers a free tier of 40 requests per minute, developers can perform autonomous code generation, refactoring, and debugging without paying for an Anthropic subscription. The proxy also supports other leading models, such as Kimi-k2.5 and Minimax-m2.1, making it a versatile gateway to multiple high-performing LLMs.
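At its core, such a proxy is a request translation layer: Anthropic-style Messages API payloads are rewritten into the OpenAI-compatible chat-completions format that NIM endpoints expose. The sketch below is illustrative only — the model identifier and field mapping are assumptions based on those API conventions, not free-claude-code’s actual implementation:

```python
# Hypothetical sketch: translate an Anthropic Messages API request body into
# an OpenAI-compatible chat-completions body, as a NIM-backed proxy might.
# The model name "zai/glm-5" and the field mapping are assumptions.

def anthropic_to_openai(body: dict, nim_model: str = "zai/glm-5") -> dict:
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-compatible APIs expect it as the first message.
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    for msg in body.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; flatten text blocks.
        if isinstance(content, list):
            content = "".join(
                block.get("text", "")
                for block in content
                if block.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": nim_model,
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
        "temperature": body.get("temperature", 1.0),
    }
```

Once translated, the proxy forwards the body to the NIM endpoint and performs the reverse mapping on the response, so Claude Code never notices the swap.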
What sets this integration apart is not just cost savings, but architectural strength. Unlike OpenCode, which struggles to maintain state across conversational turns, GLM-5 preserves interleaved thinking tokens — meaning the model can reference its own internal reasoning steps from prior interactions. This enables deeper, more coherent code analysis over multiple iterations, akin to a senior engineer reviewing their own notes mid-sprint. The proxy further improves efficiency with five built-in optimizations, including a configurable sliding-window rate limiter and session persistence, reducing redundant LLM calls by up to 40% according to early benchmarks shared in the GitHub repository.
Adding to its utility, the tool includes a Telegram bot interface, allowing users to queue coding tasks from mobile devices. This turns GLM-5 into a true agentic assistant — one that can work through tasks dispatched while the developer is commuting, in a meeting, or on a break. The integration of OpenRouter as a fallback provider further ensures resilience and redundancy.
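Fallback of the kind described — try the NIM endpoint first, fall over to OpenRouter on failure — reduces to a try-in-order loop. The sketch below uses stand-in callables, since the article does not show the proxy’s real provider interface:

```python
# Illustrative provider fallback: call each backend in priority order and
# return the first successful response. The (name, callable) interface is
# hypothetical, not free-claude-code's actual design.

def complete_with_fallback(prompt: str, providers: list) -> str:
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # a real proxy would narrow this to API errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

In practice the primary entry would wrap the NIM endpoint and the fallback would wrap OpenRouter, so a rate-limit error on the free tier degrades gracefully instead of failing the coding session.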
Industry analysts note this development aligns with a broader trend: the erosion of proprietary AI moats. As VentureBeat reports, enterprises are increasingly prioritizing open, interoperable inference stacks over vendor lock-in. NVIDIA’s NIM platform, designed for seamless model deployment across GPUs, is becoming the de facto standard for enterprise-grade LLM orchestration. By opening GLM-5 — a model with performance rivaling GPT-4 — to free public access via NIM, NVIDIA is accelerating adoption while positioning itself as the infrastructure backbone for the next generation of AI developers.
For developers, the implications are profound: a free, high-performance, agentic coding assistant with enterprise-grade reasoning is now within reach. For companies, it signals a shift toward modular, cost-efficient AI tooling. And for the open-source community, it’s a testament to the power of collaborative innovation — turning a side project into a paradigm shift.
GLM-5 is now live on NVIDIA NIM. The GitHub repository for free-claude-code has garnered over 12,000 stars in its first week. As one user put it: "I just automated my entire CI/CD review pipeline — for free. This isn’t a hack. This is the future."
Resources:
- free-claude-code GitHub
- NVIDIA NIM GLM-5 Model Card
- Z.ai: GLM-5 — From Vibe Coding to Agentic Engineering


