Qwen3.6-35B-A3B Beats Claude Opus 4.7 in SVG Generation (2026 Test)
Qwen3.6-35B-A3B generated a more accurate SVG illustration of a pelican riding a bicycle than Claude Opus 4.7, sparking debate over AI image generation capabilities. The test, originally a joke, now reveals surprising gaps in model fidelity.

Qwen3.6-35B-A3B Beats Claude Opus 4.7 in SVG Generation (2026 Test)
summarize3-Point Summary
- 1Qwen3.6-35B-A3B generated a more accurate SVG illustration of a pelican riding a bicycle than Claude Opus 4.7, sparking debate over AI image generation capabilities. The test, originally a joke, now reveals surprising gaps in model fidelity.
- 2Qwen3.6-35B-A3B Beats Claude Opus 4.7 in SVG Generation (2026 Test) In a landmark 2026 test conducted by developer and journalist Simon Willison, Qwen3.6-35B-A3B outperformed Anthropic’s Claude Opus 4.7 in generating precise SVG illustrations using locally run quantized models on a MacBook Pro M5 via LM Studio.
- 3The benchmark — initially a satirical prompt asking models to render a pelican riding a bicycle — evolved into a rigorous test of visual reasoning, prompt adherence, and creative nuance in SVG generation.
psychology_altWhy It Matters
- check_circleThis update has direct impact on the Yapay Zeka Modelleri topic cluster.
- check_circleThis topic remains relevant for short-term AI monitoring.
- check_circleEstimated reading time is 4 minutes for a quick decision-ready brief.
Qwen3.6-35B-A3B Beats Claude Opus 4.7 in SVG Generation (2026 Test)
In a landmark 2026 test conducted by developer and journalist Simon Willison, Qwen3.6-35B-A3B outperformed Anthropic’s Claude Opus 4.7 in generating precise SVG illustrations using locally run quantized models on a MacBook Pro M5 via LM Studio. The benchmark — initially a satirical prompt asking models to render a pelican riding a bicycle — evolved into a rigorous test of visual reasoning, prompt adherence, and creative nuance in SVG generation.
Why the Pelican Benchmark Is Misleading (But Revealing)
While the "pelican riding a bicycle" prompt began as humor, its persistence among AI researchers stems from its ability to expose flaws in geometric understanding, context retention, and stylistic interpretation. Qwen3.6-35B-A3B delivered a structurally accurate bicycle frame with correct wheel alignment, atmospheric clouds, and a comically detailed pelican with a pouch that sagged naturally. Claude Opus 4.7, despite its size and cloud advantages, consistently distorted the frame, misaligned wheels, and produced flat, lifeless compositions — even with thinking_level: max prompts.
How LM Studio Enables Local LLM Testing
Both models were tested using LM Studio with GGUF-quantized versions, allowing execution on consumer-grade hardware without cloud dependency. This setup eliminated API latency and proprietary biases, revealing Qwen3.6-35B-A3B’s true capability: a 20.9GB model outperforming a proprietary, cloud-only model in fine-grained visual tasks. The test underscores how quantization no longer equals reduced quality — especially in creative domains like SVG generation.
SVG Generation vs. PNG in AI Image Tasks
Unlike pixel-based PNG outputs, SVGs demand precise vector logic: coordinate accuracy, path definitions, and scalable elements. Qwen3.6-35B-A3B didn’t just generate a visual — it authored clean, comment-annotated SVG code, including <!-- Sunglasses on flamingo! --> when prompted with "flamingo on a unicycle." Claude Opus 4.7’s output, while technically valid, lacked humor, annotation, or stylistic flair — proving that creativity isn’t just about pixels, but about intent and context.
Local LLMs Are Reshaping Generative AI
Qwen3.6-35B-A3B’s victory signals a paradigm shift: open-weight, locally deployable models are closing the gap with proprietary giants in nuanced generative tasks. For developers, this means high-fidelity SVG generation no longer requires expensive cloud APIs. With tools like LM Studio and Hugging Face’s quantized model hub, creative AI is becoming democratized — and more transparent.
Did Qwen Overfit the Benchmark?
Skeptics suggested Qwen’s team may have trained on Willison’s public prompts. But Willison tested both models with a never-before-shared prompt: "Generate an SVG of a flamingo riding a unicycle with sunglasses and a bowtie." Qwen3.6-35B-A3B delivered a vibrant, personality-rich illustration — complete with annotated comments — while Claude Opus 4.7 produced a sterile, generic vector. This confirms broad creative reasoning, not targeted overfitting.
As AI image generation evolves, the line between joke and benchmark blurs. What began as satire now measures a model’s ability to understand whimsy, structure, and detail — qualities that define true intelligence. Qwen3.6-35B-A3B didn’t just win a test; it redefined what’s possible with local, open-weight models in 2026.


