Qwen Unveils Qwen 3.5 397B and Plus Models, Setting New Benchmarks in Multimodal AI
Alibaba's Qwen team has launched the Qwen 3.5 series, including the 397B-parameter model and Qwen 3.5 Plus, claiming state-of-the-art performance in vision-language tasks and agent-based reasoning. The models integrate linear attention with sparse MoE architectures to enhance efficiency and scalability.

Alibaba’s Tongyi Lab has officially released the Qwen 3.5 series, introducing two new large language models: Qwen 3.5 397B and Qwen 3.5 Plus. According to a post on Reddit’s r/LocalLLaMA community, the models mark a significant step forward in both text-only and multimodal AI, with reported performance rivaling frontier models from OpenAI, Google, and Anthropic. Built on a hybrid architecture that combines linear attention mechanisms with sparse Mixture-of-Experts (MoE) layers, the series is designed to deliver high inference efficiency while maintaining accuracy across diverse tasks.
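To make the MoE half of that hybrid concrete, the sketch below shows top-k expert routing in the style commonly used by sparse MoE language models. It is a minimal, generic illustration rather than Qwen’s actual implementation; the expert count, top-k value, and layer sizes are placeholder assumptions.

```python
# Minimal sketch of sparse MoE top-k routing (illustrative only; not
# Qwen's implementation -- expert count and dimensions are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096, n_experts=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        logits = self.router(x)                  # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over chosen experts
        out = torch.zeros_like(x)
        # Each token is routed only to its top-k experts; the rest stay idle,
        # which is why active parameters (the "A17B" figure) sit far below
        # the 397B total.
        for k in range(self.top_k):
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

x = torch.randn(8, 1024)                         # 8 tokens
print(SparseMoELayer()(x).shape)                 # torch.Size([8, 1024])
```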
The Qwen 3.5 397B-A17B variant is designed as a native vision-language model, able to understand and respond to images, videos, and graphical user interfaces (GUIs). That capability carries over to agentic tasks, in which the model acts autonomously to complete multi-step objectives such as navigating software interfaces or generating code from visual prompts. Meanwhile, the Qwen 3.5 Plus model, accessible via Qwen Chat, is an optimized variant tailored for broad deployment, with enhanced text comprehension, logical reasoning, and code generation. Users can interact with it directly through a web interface or mobile app that supports image generation and real-time multimodal dialogue.
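For developers who prefer an API over the chat interface, Alibaba Cloud’s Model Studio (DashScope) has historically exposed Qwen models through an OpenAI-compatible endpoint. The sketch below shows what a multimodal request might look like under that scheme; the model identifier `qwen3.5-plus` is an assumption, as the official model string for the new release is not confirmed in the source material.

```python
# Hypothetical call to Qwen 3.5 Plus via DashScope's OpenAI-compatible API.
# The model name "qwen3.5-plus" is an assumption; check the official model
# list before use. Requires: pip install openai
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3.5-plus",  # assumed identifier for the new release
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/screenshot.png"}},
            {"type": "text",
             "text": "Describe the UI in this screenshot and list its buttons."},
        ],
    }],
)
print(response.choices[0].message.content)
```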
Technical details, outlined in a paper submitted to ICLR 2024 on OpenReview, indicate that Qwen-VL, the foundational vision-language model for this series, excels at localization, reading text within images, and contextual understanding of complex visual scenes. Researchers from Alibaba’s Tongyi Lab, including Jinze Bai and Shuai Bai, report that Qwen-VL outperforms prior models on benchmarks such as MMBench, MMMU, and TextVQA, achieving top-tier scores without task-specific fine-tuning. The linear attention component reduces computational overhead during long-context processing, while sparse MoE routes each token to a small set of specialized expert networks, yielding faster inference and lower memory consumption than dense architectures of similar scale.
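The efficiency claim for linear attention rests on reordering the computation: with a kernel feature map φ, the O(N²) attention score matrix can be avoided by computing φ(K)ᵀV first, so cost grows linearly with sequence length. The sketch below illustrates that reordering in the common kernelized form (as in Katharopoulos et al., 2020); it is a generic illustration, not Qwen’s specific attention variant.

```python
# Generic kernelized linear attention (illustrative; not Qwen's exact variant).
# Standard attention: softmax(Q K^T) V costs O(N^2 * d).
# Linear attention:   phi(Q) @ (phi(K)^T @ V) costs O(N * d^2).
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, seq_len, d). elu(x) + 1 is a common positive feature map.
    phi_q, phi_k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", phi_k, v)          # (b, d, d): linear in N
    z = torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(1))  # per-query normalizer
    return torch.einsum("bnd,bde->bne", phi_q, kv) / (z.unsqueeze(-1) + eps)

q = k = v = torch.randn(1, 2048, 64)
print(linear_attention(q, k, v).shape)  # torch.Size([1, 2048, 64])
```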
Industry analysts note that the release signals Alibaba’s aggressive push to compete in the global AI race. Unlike many Western models that remain proprietary or restricted, Qwen 3.5 models are being made increasingly accessible through open APIs and web platforms, aligning with China’s broader strategy to democratize AI innovation. The availability of Qwen 3.5 Plus on mobile apps (evidenced by screenshots of iOS and Android interfaces on the Qwen Chat site) suggests a deliberate focus on consumer adoption and real-world utility.
Furthermore, the models’ robustness in code generation and agent-based workflows marks a shift toward AI systems that can function as digital assistants executing complex multimodal tasks. Early testers report sessions in which the model analyzed screenshots of software bugs, generated debugging scripts, and even simulated user clicks to resolve interface issues, an ability previously seen only in specialized research prototypes.
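A workflow like the one testers describe can be framed as a simple observe-act loop: capture a screenshot, ask the model for the next action, execute it, and repeat. The sketch below outlines that loop, reusing the OpenAI-compatible client from the earlier example; the `capture_screen` and `execute` helpers and the JSON action format are hypothetical stand-ins, not part of any published Qwen agent API.

```python
# Hypothetical observe-act agent loop (illustrative only). capture_screen()
# and execute() are stand-ins, not a published Qwen API.
import base64
import json

def capture_screen() -> bytes:
    """Stand-in: return the current screen as PNG bytes (e.g. via mss/PIL)."""
    raise NotImplementedError

def execute(action: dict) -> None:
    """Stand-in: perform an action such as {'type': 'click', 'x': ..., 'y': ...}."""
    raise NotImplementedError

def agent_loop(client, goal: str, model: str, max_steps: int = 10) -> None:
    for _ in range(max_steps):
        png = base64.b64encode(capture_screen()).decode()
        resp = client.chat.completions.create(
            model=model,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{png}"}},
                    {"type": "text",
                     "text": f"Goal: {goal}. Reply with one JSON action, "
                             'e.g. {"type": "click", "x": 0, "y": 0}, '
                             'or {"type": "done"} when finished.'},
                ],
            }],
        )
        action = json.loads(resp.choices[0].message.content)
        if action.get("type") == "done":
            break
        execute(action)  # apply the model's chosen action, then re-observe
```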
While concerns around data privacy and model transparency persist, as noted in Wikipedia’s overview of the broader Qwen ecosystem, Alibaba has maintained a relatively open approach to documentation and evaluation, publishing detailed technical reports and benchmark comparisons and inviting independent verification of its performance claims. With Qwen 3.5, Alibaba is not just matching global competitors; it is pushing the boundary of what scalable, efficient multimodal AI can achieve.
As enterprises and developers begin integrating these models into applications ranging from automated customer service to educational tools and accessibility aids, the Qwen 3.5 series may well become the new standard for open, high-performance AI systems in the coming year.


