Sudden OOM Errors in SeedVR2 Spark Investigation into ComfyUI Memory Management
Users report sudden out-of-memory errors with SeedVR2 on identical hardware and configurations, raising concerns about hidden software changes in ComfyUI updates. Despite no apparent changes to models or settings, memory allocation anomalies suggest a deeper systemic issue.

Why It Matters
- This update has a direct impact on the Yapay Zeka Araçları ve Ürünler (AI Tools and Products) topic cluster.
- This topic remains relevant for short-term AI monitoring.
- Estimated reading time is 4 minutes for a quick decision-ready brief.
Across AI-generated video communities, a troubling pattern has emerged: users are encountering unexpected Out-of-Memory (OOM) errors with SeedVR2—a high-resolution video upscaling model—despite no changes to their hardware, software versions, or workflow configurations. The issue, first reported on Reddit by user ChristianR303, has since been corroborated by multiple users on forums including Hugging Face and CivitAI, prompting a deeper investigation into the underlying causes.
The affected system, equipped with an NVIDIA GeForce RTX 5060 Laptop GPU (8GB VRAM), ran SeedVR2 with the Q6 GGUF model successfully for weeks prior to a clean reinstall of ComfyUI 0.14.1. Post-reinstall, even the lighter 3B GGUF variant triggered OOM errors, despite CPU offload being enabled and tile sizes reduced. The log reveals a critical anomaly: VRAM allocation during model preparation remained low (under 0.5GB), but the moment VAE encoding began, memory usage jumped from 0.48GB to 4.05GB, and a requested 3.51GB allocation exceeded the available free memory even though the GPU has 7.96GB of total capacity. CUDA reported zero free memory, suggesting memory fragmentation or unaccounted allocations.
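One way to test whether a reported "0 bytes free" reflects genuine exhaustion or allocator behavior is to compare what the CUDA driver sees against what PyTorch's caching allocator has reserved. The sketch below uses only standard PyTorch calls; the vram_snapshot helper and the points at which it is called are illustrative, not part of SeedVR2 or ComfyUI.

```python
import torch

def vram_snapshot(tag: str) -> None:
    """Compare driver-level free memory with PyTorch's own accounting."""
    free, total = torch.cuda.mem_get_info()       # what the CUDA driver reports
    allocated = torch.cuda.memory_allocated()     # bytes held by live tensors
    reserved = torch.cuda.memory_reserved()       # bytes held by the caching allocator
    gib = 1024 ** 3
    print(f"[{tag}] driver free {free / gib:.2f}/{total / gib:.2f} GiB | "
          f"allocated {allocated / gib:.2f} GiB | reserved {reserved / gib:.2f} GiB")

# Call it around the failing step, e.g. immediately before and after VAE encoding.
vram_snapshot("before VAE encode")
# ... run the encode here ...
vram_snapshot("after VAE encode")
```

If reserved memory sits far above allocated memory at the moment of failure, cached-but-unused blocks (fragmentation) are a likelier culprit than live tensors.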
According to Cambridge Dictionary, "suddenly" denotes an event occurring without warning or apparent cause. This definition mirrors the user experience precisely: no updates to the model, no driver changes, no OS upgrades—yet the system behaves entirely differently. The inconsistency defies conventional troubleshooting logic, suggesting a latent software conflict introduced during the reinstallation process. While the user assumed a corrupted install, the fact that reverting to prior SeedVR2 builds yielded no improvement points to a deeper issue: potentially a change in PyTorch’s memory management, CUDA context initialization, or ComfyUI’s internal tensor allocation logic.
Notably, the log indicates PyTorch 2.10.0+cu130 is in use, with bfloat16 precision enforced across the entire pipeline. While this improves performance, it may also increase memory pressure under certain conditions. The VAE model, though only 478MB in size, is being materialized and moved between CPU and GPU multiple times during encoding. This repeated transfer, combined with the use of causal slicing for temporal processing, may be triggering memory fragmentation or caching inefficiencies not present in earlier builds.
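To see how a 478MB model can translate into multi-gigabyte spikes, it helps to reproduce the offload/reload pattern in isolation. The following is a minimal sketch with a stand-in network; it does not reproduce SeedVR2's VAE or ComfyUI's offload logic, only the general pattern of repeated weight reloads plus per-pass activations that the caching allocator keeps reserved.

```python
import torch

# Stand-in network: NOT SeedVR2's VAE, just heavy enough to show the pattern.
vae = torch.nn.Sequential(
    torch.nn.Conv2d(3, 256, 3, padding=1),
    torch.nn.Conv2d(256, 256, 3, padding=1),
).to(dtype=torch.bfloat16)

frames = torch.randn(4, 3, 512, 512, dtype=torch.bfloat16)   # short clip, kept on the CPU

for i in range(3):                       # repeated passes, as with temporal/causal slicing
    vae.cuda()                           # reload the weights onto the GPU
    with torch.no_grad():
        latents = vae(frames.cuda())     # activations are allocated on top of the weights
    latents = latents.cpu()              # keep the result, drop the GPU copy
    vae.cpu()                            # offload the weights again
    print(f"pass {i}: reserved after offload = "
          f"{torch.cuda.memory_reserved() / 1024**3:.2f} GiB")
```

Even after the weights are moved back to the CPU, the blocks used for activations typically remain reserved by PyTorch rather than returned to the driver, which is consistent with the gap the log shows between low "allocated" figures and a driver reporting no free memory.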
Investigations into ComfyUI’s release notes reveal no explicit changes to the VAE or DiT loading logic between versions 0.14.0 and 0.14.1. However, internal dependency updates, such as changes to the Torch CUDA allocator or the introduction of new memory pooling heuristics, could have altered how tensors are retained in VRAM after operations. The fact that CUDA reports "0 bytes free" on a card with 7.96GB of total capacity strongly suggests that memory is being held by orphaned tensors or unreleased buffers, a known issue in early PyTorch 2.10 builds.
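If the caching allocator is the culprit, PyTorch exposes documented tuning knobs through the PYTORCH_CUDA_ALLOC_CONF environment variable that are worth trying before any code changes; whether they resolve this particular failure is unverified.

```python
import os

# Documented PyTorch allocator settings aimed at fragmentation-style OOMs. They must
# be set before torch is imported; whether they fix this SeedVR2 case is untested.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
# On builds where expandable segments are unavailable, "max_split_size_mb:128" is
# the older knob aimed at the same symptom.

import torch

free, total = torch.cuda.mem_get_info()
print(f"free after init: {free / 1024**3:.2f} GiB of {total / 1024**3:.2f} GiB")
```

In a ComfyUI install the variable can equally be set in the shell or launch script before starting the server (e.g. before running python main.py), so no node code needs to change.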
Experts in AI inference optimization suggest that users temporarily disable bfloat16 precision and revert to float16, and manually clear the CUDA cache before each run using torch.cuda.empty_cache(). Additionally, disabling background processes that may consume VRAM (e.g., the Windows compositor or GPU-accelerated apps) could provide temporary relief. The broader community is now urging the ComfyUI team to audit memory allocation paths in the VAE pipeline, particularly around offload/reload cycles and tensor persistence.
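A minimal sketch of those two suggestions follows, assuming the workflow or custom node exposes a dtype setting; the names here are illustrative. Note that switching from bfloat16 to float16 does not shrink the model (both are two bytes per element); it only sidesteps any bfloat16-specific code path.

```python
import gc
import torch

def reset_vram() -> None:
    """Release stale references and return cached CUDA blocks before a run."""
    gc.collect()                  # drop tensors kept alive only by lingering references
    torch.cuda.empty_cache()      # hand reserved-but-unused blocks back to the driver
    torch.cuda.ipc_collect()      # clean up shared-memory handles from prior runs

# Hypothetical precision switch: same memory footprint as bfloat16, different code path.
dtype = torch.float16             # instead of torch.bfloat16

reset_vram()
# ... load the SeedVR2 model and run the upscale with `dtype` here ...
```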
This incident underscores a growing challenge in AI tooling: as models grow more complex and frameworks more opaque, users are left debugging systems they cannot fully inspect. The suddenness of the failure, as defined by both common usage and technical reality, highlights the fragility of reproducible AI workflows. Until a patch is released, users are advised to document every variable—down to the exact order of model loading—and report anomalies to the ComfyUI GitHub repository to help isolate the root cause.
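For bug reports, a small script that captures the exact software and GPU state removes guesswork; the report_environment helper below is a suggestion, not an existing ComfyUI utility.

```python
import platform
import torch

def report_environment() -> str:
    """Gather the version and GPU details worth attaching to a ComfyUI issue."""
    lines = [
        f"python: {platform.python_version()} on {platform.system()}",
        f"torch:  {torch.__version__}",
        f"cuda:   {torch.version.cuda}",
    ]
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        free, total = torch.cuda.mem_get_info()
        lines.append(f"gpu:    {props.name}, {total / 1024**3:.2f} GiB total, "
                     f"{free / 1024**3:.2f} GiB free at start-up")
    return "\n".join(lines)

print(report_environment())
```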
Verification Panel
Source Count: 1
First Published: 22 February 2026
Last Updated: 22 February 2026