Critical Bug Found in Qwen 3.5 Unsloth GGUF Quantizations, Users Urged to Halt Downloads
A significant bug has been identified in multiple quantized versions of the Qwen 3.5-35B model distributed by Unsloth, prompting community warnings to avoid downloading affected files until a fix is released. The issue, first flagged by researcher Ubergarm and later confirmed by Unsloth, compromises model integrity and output reliability.

Users in the open-source AI community are being urged to avoid downloading quantized versions of the Qwen 3.5-35B model distributed by Unsloth, following confirmed corruption in the published GGUF files. The warning, initially posted on the r/LocalLLaMA subreddit by user SunTrainAi, has since gained widespread traction among developers, researchers, and hobbyists deploying large language models locally.
The issue came to light when independent researcher Ubergarm, known for his rigorous testing of quantized LLMs, reported anomalous behavior in the UD_Q4_K_XL variant of the model. Symptoms included inconsistent reasoning, hallucinated outputs, and degraded performance even on simple tasks, none of which is expected from the original model. Upon further investigation, Unsloth, a respected contributor to the GGUF quantization ecosystem, acknowledged in a Hugging Face discussion that "all current quants are messed up," confirming that the corruption extends across multiple quantization levels, including the Q4_K_M, Q5_K_S, and Q4_K_XL variants.
The GGUF format, developed by the llama.cpp team, has become the de facto standard for efficient local deployment of large language models on consumer hardware. Quantization reduces model precision from 16-bit or 32-bit floating point to 4-bit or 5-bit integers, which lets models like Qwen 3.5-35B run on GPUs with limited VRAM. However, improper quantization can introduce silent errors: the file still loads and the model still generates text, but output quality degrades without any obvious failure, which makes such errors particularly insidious.
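To make the precision trade-off concrete, the following is a minimal sketch of symmetric round-to-nearest quantization on a single block of weights. It is an illustration only, not the scheme used by llama.cpp or Unsloth (the K-quant formats are more elaborate); the function names and the 32-value block size are assumptions chosen for the example.

```python
import random

def quantize_block(values, bits=4):
    """Symmetric round-to-nearest quantization of one block (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for 4-bit codes
    # One shared scale per block; fall back to 1.0 for an all-zero block.
    scale = max(abs(v) for v in values) / qmax or 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, codes

def dequantize_block(scale, codes):
    """Reconstruct approximate floats from integer codes and the block scale."""
    return [scale * c for c in codes]

def rms_error(original, restored):
    """Root-mean-square difference between two equal-length sequences."""
    return (sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)) ** 0.5

random.seed(0)
block = [random.gauss(0.0, 1.0) for _ in range(32)]  # stand-in for real weights

for bits in (8, 5, 4):
    scale, codes = quantize_block(block, bits)
    restored = dequantize_block(scale, codes)
    print(f"{bits}-bit: scale={scale:.4f}  rms_error={rms_error(block, restored):.4f}")
```

With a correct scale, each restored value differs from the original by at most half a quantization step. If a pipeline bug stores the wrong scale or pairs codes with the wrong block, the restored numbers are still well-formed floats, which is why this class of error does not announce itself.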
According to the Hugging Face discussion thread, the root cause appears to be a bug in Unsloth’s quantization pipeline, potentially tied to an update in the underlying llama.cpp library or a misconfiguration in the quantization script. The error does not stem from the original Qwen 3.5 model weights, which remain intact on Hugging Face’s official repository, but rather from the post-processing steps applied during quantization. This distinction is critical: users who downloaded the base model directly from Alibaba’s official release are unaffected.
The open-source community has responded with commendable cooperation. Rather than assigning blame, developers have rallied to document the scope of the issue, share diagnostic scripts, and offer workarounds. Several users have uploaded comparison logs showing output discrepancies between the official Qwen 3.5 model and the corrupted Unsloth quants, providing clear evidence of degradation. One user, @AIResearcher2024, noted that the model "consistently fails to follow instructions on arithmetic tasks," while another reported "repeated self-contradictions in multi-step reasoning chains."
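A comparison of the kind described above can be sketched as a small harness that sends the same arithmetic prompts to two models and scores the answers. This is a hypothetical example, not one of the community's scripts: `reference_stub` and `broken_stub` are stand-ins for real model calls, and in practice each would wrap whatever inference runtime is in use.

```python
import re

def arithmetic_cases():
    """Prompts with known answers; the operand pairs are arbitrary."""
    pairs = [(2, 3), (17, 25), (104, 9), (58, 42)]
    return [(f"What is {a} + {b}? Answer with a number only.", a + b) for a, b in pairs]

def last_integer(text):
    """Return the last integer in a reply, or None if there is none."""
    found = re.findall(r"-?\d+", text)
    return int(found[-1]) if found else None

def accuracy(generate, cases):
    """Fraction of cases where the model's final number matches the expected sum."""
    correct = sum(last_integer(generate(prompt)) == expected for prompt, expected in cases)
    return correct / len(cases)

# Hypothetical stand-ins for real model calls (assumptions for this sketch).
def reference_stub(prompt):
    a, b = (int(n) for n in re.findall(r"\d+", prompt))
    return str(a + b)

def broken_stub(prompt):
    a, b = (int(n) for n in re.findall(r"\d+", prompt))
    total = a + b
    return str(total + 1 if total % 2 == 0 else total)  # simulated degradation

cases = arithmetic_cases()
print("reference accuracy:", accuracy(reference_stub, cases))
print("suspect accuracy:  ", accuracy(broken_stub, cases))
```

Running both models with greedy decoding on an identical prompt set keeps the comparison reproducible, so a gap in accuracy points at the weights rather than at sampling noise.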
As of this writing, Unsloth has not released a patched version but has committed to doing so "as soon as possible." The team has asked users not to redistribute the flawed files on forums, Discord servers, or model hubs, to prevent further propagation. In the interim, users are encouraged to use the original FP16 or BF16 versions of Qwen 3.5-35B from the official model repository on Hugging Face, or to wait for the corrected GGUF releases.
This incident underscores the growing complexity of the open-source AI ecosystem. While quantization tools have democratized access to powerful models, they have also introduced new vectors for silent failure. The transparency and rapid response from both Unsloth and the community serve as a model for responsible AI development. As quantization becomes more mainstream, standardized testing protocols and automated validation pipelines may become necessary to prevent similar issues in the future.
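One form such automated validation could take is a release gate that compares the quantized model's next-token distributions against the full-precision reference and blocks publication when they diverge too far. The sketch below computes KL divergence from raw logits; the `passes_gate` helper and its `max_mean_kl` threshold are hypothetical, and the logits are made-up numbers standing in for values a real pipeline would collect from both models on the same text.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def kl_divergence(ref_logits, test_logits):
    """KL(reference || test) between the two next-token distributions."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def passes_gate(ref_batch, test_batch, max_mean_kl=0.05):
    """Hypothetical release gate: mean KL over all positions must stay small."""
    kls = [kl_divergence(r, t) for r, t in zip(ref_batch, test_batch)]
    return sum(kls) / len(kls) <= max_mean_kl

# Made-up logits for two token positions over a three-token vocabulary.
reference = [[2.0, 1.0, 0.1], [0.3, 2.5, 0.2]]
good_quant = [[1.95, 1.02, 0.12], [0.31, 2.45, 0.22]]
bad_quant = [[0.1, 1.0, 2.0], [2.5, 0.3, 0.2]]

print("good quant passes:", passes_gate(reference, good_quant))
print("bad quant passes: ", passes_gate(reference, bad_quant))
```

A distribution-level check like this can catch degradation that a simple "does the file load" test would miss, though any threshold would need calibrating against known-good quantizations of the same model.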
For updates, users are advised to monitor the official Hugging Face discussion thread and Unsloth’s GitHub repository. Until a verified fix is released, the community’s collective caution may prevent widespread disruption in local AI deployments worldwide.


