Qwen3.6-35B-A3B beats Claude Opus 4.7 in pelican benchmark

Qwen3.6-35B-A3B Beats Claude Opus 4.7 in SVG Generation (2026 Test)

In a landmark 2026 test conducted by developer and journalist Simon Willison, Qwen3.6-35B-A3B outperformed Anthropic’s Claude Opus 4.7 in generating precise SVG illustrations using locally run quantized models on a MacBook Pro M5 via LM Studio. The benchmark — initially a satirical prompt asking models to render a pelican riding a bicycle — evolved into a rigorous test of visual reasoning, prompt adherence, and creative nuance in SVG generation.

Why the Pelican Benchmark Is Misleading (But Revealing)

While the "pelican riding a bicycle" prompt began as humor, its persistence among AI researchers stems from its ability to expose flaws in geometric understanding, context retention, and stylistic interpretation. Qwen3.6-35B-A3B delivered a structurally accurate bicycle frame with correct wheel alignment, atmospheric clouds, and a comically detailed pelican with a pouch that sagged naturally. Claude Opus 4.7, despite its size and cloud advantages, consistently distorted the frame, misaligned wheels, and produced flat, lifeless compositions — even with thinking_level: max prompts.

How LM Studio Enables Local LLM Testing

Both models were tested using LM Studio with GGUF-quantized versions, allowing execution on consumer-grade hardware without cloud dependency. This setup eliminated API latency and proprietary biases, revealing Qwen3.6-35B-A3B’s true capability: a 20.9GB model outperforming a proprietary, cloud-only model in fine-grained visual tasks. The test underscores how quantization no longer equals reduced quality — especially in creative domains like SVG generation.

SVG Generation vs. PNG in AI Image Tasks

Unlike pixel-based PNG outputs, SVGs demand precise vector logic: coordinate accuracy, path definitions, and scalable elements. Qwen3.6-35B-A3B didn’t just generate a visual — it authored clean, comment-annotated SVG code, including  when prompted with "flamingo on a unicycle." Claude Opus 4.7’s output, while technically valid, lacked humor, annotation, or stylistic flair — proving that creativity isn’t just about pixels, but about intent and context.

Local LLMs Are Reshaping Generative AI

Qwen3.6-35B-A3B’s victory signals a paradigm shift: open-weight, locally deployable models are closing the gap with proprietary giants in nuanced generative tasks. For developers, this means high-fidelity SVG generation no longer requires expensive cloud APIs. With tools like LM Studio and Hugging Face’s quantized model hub, creative AI is becoming democratized — and more transparent.

Did Qwen Overfit the Benchmark?

Skeptics suggested Qwen’s team may have trained on Willison’s public prompts. But Willison tested both models with a never-before-shared prompt: "Generate an SVG of a flamingo riding a unicycle with sunglasses and a bowtie." Qwen3.6-35B-A3B delivered a vibrant, personality-rich illustration — complete with annotated comments — while Claude Opus 4.7 produced a sterile, generic vector. This confirms broad creative reasoning, not targeted overfitting.

As AI image generation evolves, the line between joke and benchmark blurs. What began as satire now measures a model’s ability to understand whimsy, structure, and detail — qualities that define true intelligence. Qwen3.6-35B-A3B didn’t just win a test; it redefined what’s possible with local, open-weight models in 2026.

AI-Powered Content

Sources: LM Studio Documentation • Qwen3.6-35B-A3B on Hugging Face • Simon Willison’s Original Post