GreenBoost Open-Sourced: Extend NVIDIA GPU VRAM with RAM & NVMe

GreenBoost, an open-source Linux kernel module and CUDA userspace shim for NVIDIA GPUs, has been released to the public. It transparently extends GPU VRAM with system DDR4 RAM and NVMe storage, so AI inference applications (including large language models, ComfyUI, Wan2GP, and LTX-Desktop) can run as if the GPU had significantly more memory than is physically installed, without any modifications to the original software. According to a post on the NVIDIA Developer Forums, an independent contributor developed the project over several months and released it under an open-source license to broaden access to high-memory AI workloads.

How GreenBoost Uses NVMe as Virtual VRAM

GreenBoost operates at the kernel level, intercepting and redirecting VRAM allocation requests from CUDA applications. When a model exceeds physical VRAM, GreenBoost automatically offloads less frequently accessed tensors to system RAM or high-speed NVMe SSDs, then swaps them back into GPU memory as needed. This paging is invisible to the application, so existing inference code can reportedly run models 2x to 5x larger than would otherwise fit in VRAM.
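
The paging idea can be illustrated with a toy simulation. This is a minimal sketch, not GreenBoost's code: the TieredStore class, its slot counts, and the least-recently-used eviction rule are all assumptions made for illustration, and the real module works on GPU allocations inside the kernel rather than on Python objects.

```python
from collections import OrderedDict


class TieredStore:
    """Toy model of three tiers: VRAM -> RAM -> NVMe (hypothetical, for illustration)."""

    def __init__(self, vram_slots, ram_slots):
        self.vram = OrderedDict()  # least recently used entry comes first
        self.ram = OrderedDict()
        self.nvme = {}             # treated as unbounded in this sketch
        self.vram_slots = vram_slots
        self.ram_slots = ram_slots
        self.page_ins = 0          # times a tensor had to be fetched back into VRAM

    def _demote(self):
        # Push the coldest entries down one tier until each tier fits its budget.
        while len(self.vram) > self.vram_slots:
            name, data = self.vram.popitem(last=False)
            self.ram[name] = data
        while len(self.ram) > self.ram_slots:
            name, data = self.ram.popitem(last=False)
            self.nvme[name] = data

    def put(self, name, data):
        # Drop any stale copy in a lower tier, then place the tensor in VRAM.
        self.ram.pop(name, None)
        self.nvme.pop(name, None)
        self.vram[name] = data
        self.vram.move_to_end(name)
        self._demote()

    def get(self, name):
        if name in self.vram:
            self.vram.move_to_end(name)  # mark as most recently used
            return self.vram[name]
        # Miss: fetch from a lower tier (raises KeyError if never stored).
        data = self.ram.pop(name) if name in self.ram else self.nvme.pop(name)
        self.page_ins += 1
        self.vram[name] = data
        self._demote()
        return data

    def tier_of(self, name):
        for tier, store in (("vram", self.vram), ("ram", self.ram), ("nvme", self.nvme)):
            if name in store:
                return tier
        return None


store = TieredStore(vram_slots=2, ram_slots=1)
for layer in ["layer0", "layer1", "layer2", "layer3"]:
    store.put(layer, layer.upper())
store.get("layer0")  # the caller never sees which tier held the data
```

The caller only ever calls put and get; which tier a tensor sits in is an internal detail, which is the property the article describes as transparency.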

Performance Benchmarks for LLMs

Early benchmarks from independent testers reportedly show only a 10–18% performance penalty compared to native VRAM execution, even when running Llama 3 70B or Mistral 7B on a single RTX 3060 with 12 GB of VRAM. With NVMe caching enabled, inference latency reportedly stays under 250 ms per token (at least 4 tokens per second), which makes interactive AI deployment feasible on budget hardware.
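
Some back-of-the-envelope arithmetic helps put these figures in context. The sketch below assumes nominal parameter counts and common weight precisions (16-bit and 4-bit); the article does not say which precision the testers used, and the estimate covers weights only, ignoring the KV cache and activations.

```python
def weight_footprint_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in GB (10**9 bytes)."""
    return params_billion * bits_per_weight / 8


def spill_gb(footprint_gb, vram_gb):
    """Portion of the weights that cannot fit in VRAM and must live elsewhere."""
    return max(0.0, footprint_gb - vram_gb)


def tokens_per_second(latency_ms_per_token):
    """Convert a per-token latency into throughput."""
    return 1000.0 / latency_ms_per_token


VRAM_GB = 12  # RTX 3060, as quoted in the article

for name, params in [("Llama 3 70B", 70), ("Mistral 7B", 7)]:
    for bits in (16, 4):  # assumed precisions, not stated in the article
        size = weight_footprint_gb(params, bits)
        print(f"{name} @ {bits}-bit: {size:.1f} GB weights, "
              f"{spill_gb(size, VRAM_GB):.1f} GB beyond VRAM")

print(f"250 ms/token = {tokens_per_second(250):.1f} tokens/s")
```

Under these assumptions, a 4-bit 70B model needs roughly three times the card's VRAM for weights alone, while a 4-bit 7B model fits without any offloading.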

Compatibility with ComfyUI, LTX-Desktop & More

GreenBoost requires no code changes and works out of the box with popular AI tools such as ComfyUI, LTX-Desktop, and Hugging Face Transformers. Developers report seamless integration with PyTorch and TensorRT-LLM, which the project attributes to its low-level CUDA context-switching hooks. No driver modifications or hardware tweaks are needed.

Optimizing Swap Thresholds & Cache Policies

The release includes tuning utilities to adjust memory swap thresholds, cache eviction policies, and I/O prioritization. Users can prioritize speed (keep more data in DRAM) or capacity (offload aggressively to NVMe), depending on their storage speed and workload. Academic labs have reportedly deployed it on aging workstations with older SATA or NVMe SSDs, achieving 80% of peak performance.
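
The speed-versus-capacity trade-off can be sketched as follows. The SwapPolicy fields, the two presets, and their values are hypothetical names chosen for illustration; they are not GreenBoost's actual tuning parameters, which are described in the project's own documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapPolicy:
    """Hypothetical tuning knobs; the names are illustrative, not GreenBoost's."""
    vram_high_watermark: float  # fraction of VRAM to fill before offloading
    ram_budget_gb: float        # system RAM usable before spilling to NVMe


# Assumed presets: "speed" keeps more in DRAM, "capacity" leans on NVMe.
SPEED = SwapPolicy(vram_high_watermark=0.95, ram_budget_gb=48.0)
CAPACITY = SwapPolicy(vram_high_watermark=0.80, ram_budget_gb=8.0)


def plan_placement(model_gb, vram_gb, policy):
    """Split a model of model_gb across the three tiers under the given policy."""
    in_vram = min(model_gb, vram_gb * policy.vram_high_watermark)
    remaining = model_gb - in_vram
    in_ram = min(remaining, policy.ram_budget_gb)
    in_nvme = remaining - in_ram
    return {"vram": in_vram, "ram": in_ram, "nvme": in_nvme}


for label, policy in (("speed", SPEED), ("capacity", CAPACITY)):
    plan = plan_placement(model_gb=35.0, vram_gb=12.0, policy=policy)
    print(label, {tier: round(gb, 1) for tier, gb in plan.items()})
```

With the speed preset, nothing in this example reaches NVMe. The capacity preset frees VRAM headroom and limits DRAM use at the cost of more NVMe traffic.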

Why This Matters for AI Researchers

As demand for large AI models outpaces GPU memory advancements, GreenBoost fills a critical gap. Unlike proprietary solutions requiring framework-level changes, GreenBoost’s kernel-level approach enables instant compatibility across the ecosystem. While not officially endorsed by NVIDIA, its compatibility with proprietary CUDA drivers and lack of hardware modifications suggest it operates within legal boundaries. The open-source nature invites community scrutiny—and potential future integration into official drivers.

GreenBoost is now available on GitHub, with documentation and installation guides provided by the original developer. For researchers, indie developers, and small AI labs, it turns commodity hardware into a capable inference platform without costly upgrades.

Sources: forums.developer.nvidia.com • GitHub Repository • Linux DMA-BUF Docs