Qwen (Alibaba)

From Systems Analysis Wiki

Qwen (Chinese: 通义千问, Tongyi Qianwen) is a family of large language models (LLMs) developed by Alibaba Cloud, the cloud computing division of Alibaba Group[1], and is the company's flagship effort in artificial intelligence. The first version was introduced in beta in April 2023, with a public release following in September 2023[1].

The Qwen family has undergone rapid evolution, offering both open-source solutions and more powerful proprietary variants. Key characteristics of Qwen include a wide range of model sizes (from hundreds of millions to hundreds of billions of parameters), advanced multimodal capabilities (processing text, images, audio, and video), support for numerous languages, and innovative architectural solutions such as Mixture of Experts (MoE) and a "thinking" mode for solving complex problems[2].

In the global market, Qwen is positioned as a serious competitor to leading models from OpenAI, Meta, Anthropic, and Mistral AI. Alibaba Cloud's strategy combines high performance with accessibility, reflected in the regular release of open models, predominantly under the Apache 2.0 license[3].

History and development

The development of the Qwen family is characterized by a fast pace and strategic decisions aimed at both the open-source community and commercial users. Starting from an initial architecture similar to LLaMA, Alibaba Cloud has moved towards creating its own unique solutions, including complex MoE architectures and advanced multimodal systems.

Major Qwen Model Releases
Release Date | Model | Parameters (B) | Key Features | License
August 2023 | Qwen-7B | 7 | First open-source model; pretrained on ~2.4 trillion tokens; 8k-token context window[4]. | Tongyi Qianwen License (permission required for commercial use)[5]
September 2023 | Qwen-14B | 14 | Trained on ~3.0 trillion tokens; improved accuracy on complex tasks; 8k context window[6]. | Tongyi Qianwen License
November 2023 | Qwen-72B | 72 | Flagship model trained on ~3.0 trillion tokens; 32k context; performance on par with the best contemporaneous models. | Tongyi Qianwen License
November 2023 | Qwen-1.8B | 1.8 | Compact model for local deployment; pretrained on ~2.2 trillion tokens; 32k context. | Tongyi Qianwen License
June/September 2024 | Qwen 2 | 0.5–72 | Second generation; trained on ~7 trillion tokens; introduced MoE models (e.g., 57B-A14B); context extended to 128k with YaRN[7]. | Apache 2.0 (most models)
September 2024 | Qwen 2.5 | 0.5–72 | Intermediate update; dataset expanded to ~18 trillion tokens; improved code and math problem-solving[8]. | Apache 2.0 (except 3B and 72B)
November 2024 | QwQ-32B (Preview) | 32 | Experimental "Qwen with Questions" model for complex step-by-step reasoning; 32k context. | Apache 2.0 (weights only)
January 2025 | Qwen2.5-VL | 3–72 | Multimodal models (text + image); analysis of images of arbitrary resolution; context up to 128k[9]. | Apache 2.0 (except 72B)
March 2025 | Qwen2.5-Omni-7B | 7 | Universal multimodal model: input (text, image, video, audio), output (text, voice); "Thinker-Talker" architecture[10]. | Apache 2.0
April 2025 | Qwen 3 | 0.6–235 (MoE) | Third generation; trained on ~36 trillion tokens across 119 languages; MoE variants (30B-A3B, 235B-A22B); built-in "thinking" mode (<think>); 128k context[11]. | Apache 2.0 (all models)

Architecture and technical features

Qwen models are built on a decoder-only transformer architecture, similar to LLaMA and GPT. Each model is an autoregressive decoder with a multi-head attention mechanism and feed-forward blocks.

Key Architectural Components

  • Core Elements: Qwen employs standard solutions for modern LLMs: RMSNorm normalization for training stability and the SwiGLU activation function in fully connected layers to improve performance[4].
  • Positional Encoding: Uses Rotary Positional Embeddings (RoPE) to encode token position information, enabling efficient processing of long sequences[8].
  • Efficient Attention: The FlashAttention algorithm is used to accelerate computations and save memory in the attention mechanism[2].
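
The two components above can be illustrated in a few lines of plain Python. This is a toy sketch of the general RMSNorm and SwiGLU formulas, not Qwen's actual implementation; the vector sizes, weight layout (lists of columns), and epsilon value are arbitrary choices for the example.

```python
import math

def rmsnorm(x, gain, eps=1e-6):
    """RMSNorm: rescale x by the root mean square of its elements.

    Unlike LayerNorm, no mean is subtracted, which is cheaper and
    empirically just as stable for LLM training.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block applied to a single vector.

    gate = SiLU(x @ w_gate), up = x @ w_up, output = (gate * up) @ w_down.
    Weight matrices are lists of columns, purely for illustration.
    """
    gate = [silu(sum(xi * wij for xi, wij in zip(x, col))) for col in w_gate]
    up = [sum(xi * wij for xi, wij in zip(x, col)) for col in w_up]
    hidden = [g * u for g, u in zip(gate, up)]
    return [sum(hi * wij for hi, wij in zip(hidden, col)) for col in w_down]
```

The gating multiplication (gate * up) is what distinguishes SwiGLU from a plain two-layer MLP: half the projection learns what to pass through, the other half learns how much.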

Dense Models and Mixture of Experts (MoE)

The Qwen family includes models with two types of architectures:

  • Dense Models: All model parameters are active when processing each token. Examples include Qwen-72B and Qwen2.5-32B. These models are simpler to deploy but require more computational resources as their size increases[11].
  • Mixture-of-Experts (MoE) Models: In these models, instead of one large feed-forward layer, several smaller, specialized "experts" are used. For each token, a special gating network layer dynamically selects a small subset of experts for processing. This allows for the creation of models with a huge total number of parameters but significantly lower computational costs during inference.
    • Qwen2-57B-A14B contains 57 billion total parameters but activates only 14 billion for each request[7].
    • Qwen3-235B-A22B contains 235 billion total parameters, of which 22 billion are active[11].
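
The routing idea can be sketched in plain Python. This is a minimal top-k gating toy for a single token, not Qwen's production router, which uses learned gates trained with auxiliary load-balancing objectives; the expert functions and weights here are placeholders.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_forward(token, experts, gate_weights, k=2):
    """Route one token through the top-k of several expert FFNs.

    `experts` is a list of callables (the specialized feed-forward nets);
    `gate_weights[i]` scores expert i for this token. Only the k
    best-scoring experts run, so compute scales with k, not len(experts).
    """
    scores = softmax([sum(t * w for t, w in zip(token, ws)) for ws in gate_weights])
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    norm = sum(scores[i] for i in top)  # renormalize over the chosen experts
    out = [0.0] * len(token)
    for i in top:
        y = experts[i](token)
        out = [o + (scores[i] / norm) * yi for o, yi in zip(out, y)]
    return out, top
```

With 4 experts and k=2, half the experts never run for a given token, which is exactly how a 235B-parameter model can cost only 22B parameters' worth of compute per token.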

Innovations for Long Context

Support for long context is one of Qwen's strengths.

  • The first models supported up to 32k tokens.
  • In the Qwen 2 generation, the context window was increased to 128k tokens using the YaRN (Yet Another RoPE Extension) method, which allows for context expansion without significant quality loss[7].
  • The experimental model Qwen2.5-Turbo demonstrated operation with a context of up to 1 million tokens[2].
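
The mechanism behind these extensions can be illustrated with a bare-bones RoPE implementation. The `scale` parameter below is a crude stand-in for the idea of remapping out-of-range positions into the trained angle range; real YaRN interpolates per frequency band rather than uniformly, so treat this as a conceptual sketch only.

```python
import math

def rope(vec, position, base=10000.0, scale=1.0):
    """Apply rotary position embedding to one attention-head vector.

    Consecutive pairs (vec[2i], vec[2i+1]) are rotated by an angle
    position * base**(-2i/d), encoding position as rotation. Context
    extension methods like YaRN work by rescaling these angles so that
    positions beyond the training range still map to familiar rotations.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = (position * scale) * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out
```

Because rotation preserves vector norms, RoPE injects position without changing token magnitudes, and setting scale=0.5 makes position 2N look like position N, the core intuition behind interpolation-based context stretching.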

"Thinking Mode" in Qwen 3

The third generation of Qwen implements a "hybrid thinking" mechanism. The model can explicitly form a chain-of-thought before providing the final answer.

  • By default, Qwen 3 embeds a special <think>...</think> block in its output, where it shows its step-by-step logical reasoning.
  • The user can disable this mode by adding the command /no_think to the prompt.

This mechanism improves the model's ability to solve complex problems that require multi-step inference[3].
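
Applications consuming Qwen 3 output typically separate the reasoning block from the final answer, for example showing only the answer to the user while logging the chain-of-thought. The <think> tag is from the Qwen 3 release; the exact whitespace handling in this helper is an assumption.

```python
import re

def split_think(output: str):
    """Split a Qwen 3 response into (reasoning, answer).

    The reasoning lives inside the first <think>...</think> block;
    everything outside it is treated as the final answer. Responses
    without a think block return empty reasoning.
    """
    m = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if not m:
        return "", output.strip()
    reasoning = m.group(1).strip()
    answer = (output[:m.start()] + output[m.end():]).strip()
    return reasoning, answer
```

For example, split_think("<think>2 + 2 = 4</think>The answer is 4.") yields the reasoning "2 + 2 = 4" and the answer "The answer is 4.".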

Multilingual Tokenizer

Qwen uses an extended token vocabulary (about 151,000 tokens), based on the OpenAI GPT-4 BPE vocabulary (cl100k) with additional optimization for Chinese and other languages. This allows for efficient encoding of Chinese characters, Latin script, and programming code, improving the model's multilingual capabilities[4].
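
The BPE idea underlying such vocabularies can be shown with a toy trainer: repeatedly merge the most frequent adjacent symbol pair so common character sequences, in any script, collapse into single tokens. This is not Qwen's actual tokenizer (which starts from the byte-level cl100k vocabulary); the corpus and merge count below are arbitrary.

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Learn byte-pair merges from a tiny corpus of symbol sequences.

    Each merge fuses the globally most frequent adjacent pair into one
    new symbol, shortening frequent sequences; a larger merge budget
    yields a larger vocabulary and shorter encodings.
    """
    seqs = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        merged = a + b
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return merges, seqs
```

Extending the merge budget over a Chinese-heavy corpus is, conceptually, what lets Qwen's ~151k-token vocabulary encode Chinese text in far fewer tokens than a Latin-centric vocabulary would.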

Multimodal Capabilities

The Qwen family is actively developing in the direction of multimodality, offering models capable of working with various types of data:

  • Qwen-VL: Combines a vision transformer (for image processing) with a language model, allowing it to answer questions about images and generate descriptions. The Qwen2.5-VL version can analyze images of arbitrary resolution and extract structured data (e.g., from tables and forms)[9].
  • Qwen-Audio: A specialized model for processing audio information, capable of recognizing and generating speech, music, and other sounds[12].
  • Qwen2.5-Omni: A universal end-to-end multimodal model that simultaneously perceives text, images, audio, and video, and generates responses as text or natural speech in a streaming mode. It is based on the "Thinker-Talker" architecture, where the "Thinker" (an LLM) generates text content, and the "Talker" (a two-track autoregressive model) synthesizes audio[10].
  • Specialized models: Models focused on specific tasks have also been released, such as Qwen-Coder (programming) and Qwen-Math (solving mathematical problems).

Training Data and Scale

Qwen models are trained on extremely large data corpora, which include texts from the internet, books, scientific articles, program code, and mathematical data.

  • Qwen 1.0 (7B): ~2.4 trillion tokens.
  • Qwen 1.0 (72B): ~3.0 trillion tokens.
  • Qwen 2.0: ~7 trillion tokens.
  • Qwen 2.5: ~18 trillion tokens.
  • Qwen 3.0: ~36 trillion tokens, covering 119 languages and dialects.

Advanced filtering methods and the generation of high-quality synthetic data are used to improve data quality, especially for domains like mathematics and programming[8].

Licensing and Availability

The licensing policy for Qwen models has evolved over time.

  • Early models (Qwen 1): Distributed under the custom Tongyi Qianwen License, which permitted academic use but required an application and separate permission for commercial use[5].
  • Later models (Qwen 2, 2.5, 3): Starting with the second generation, the developers moved to a more open policy. Most new models were released under the permissive Apache License 2.0, allowing them to be used freely in both academic and commercial projects[7]. With the release of the Qwen 3 family, all models of this generation became fully open source under Apache 2.0 without additional restrictions[3].
  • Proprietary and restricted models: Despite the general trend towards openness, the largest or strategically important models (e.g., Qwen2.5-Max, Qwen2.5-VL-72B) remain proprietary and are available through paid Alibaba Cloud APIs or are distributed under more restrictive research licenses.

Comparison with Competitors and Performance

Qwen models are actively positioned in a highly competitive market and are regularly compared with developments from leading global companies.

  • vs. Llama (Meta): In technical reports, Qwen often demonstrates superiority over Llama models of similar size. For example, Qwen2-72B shows better results on the MMLU, HumanEval, and GSM8K benchmarks compared to Llama-3-70B.
  • vs. GPT (OpenAI): Flagship Qwen models aim to close the gap with GPT models. Alibaba Cloud claims that Qwen2.5-Max surpasses GPT-4o on some academic benchmarks, and Qwen2-72B-Instruct demonstrates competitiveness with GPT-4-Turbo.
  • vs. Mistral AI: Both companies emphasize open-source models. Tests show that Qwen2-72B outperforms Mixtral-8x22B on key benchmarks[7].

Benchmark Results

Performance comparison of flagship Qwen models against competitors (data as of mid-2024)[7]
Model | MMLU (5-shot) | HumanEval (0-shot) | GSM8K (8-shot) | MT-Bench
Qwen2-72B (base) | 84.2 | 64.6 | 89.5 | N/A
Qwen2-72B-Instruct | 82.3 | 86.0 | 93.2 | 9.12
Llama-3-70B (base) | 79.5 | 48.2 | 83.0 | N/A
Llama-3-70B-Instruct | 82.0 | 81.7 | 93.0 | 8.95
Mixtral-8x22B (base) | 77.8 | 46.3 | 83.7 | N/A
Mixtral-8x22B-Instruct | 74.0 | 73.8 | 89.1 | 8.66

Note: N/A indicates not applicable or not reported in the cited sources.

Ecosystem and Application

The Qwen family is integrated into various products and platforms, forming a developing ecosystem around it.

  • Alibaba Cloud Platforms: Access to the models, especially the most powerful proprietary versions, is provided through the Model Studio APIs. The PAI-EAS (Platform for AI - Elastic Algorithm Service) platform allows for the deployment, fine-tuning, and customization of Qwen models.
  • Open Source Community: Open versions of the models, their weights, and code are actively hosted on platforms like Hugging Face, ModelScope, and GitHub[6], which promotes their widespread adoption and use by researchers and developers worldwide.
  • Applications: The models are used for a wide range of tasks, from content generation and data analysis to creating AI agents. For example, Qwen3 models support the Model Context Protocol (MCP), which allows them to interact more effectively with other applications and tools.

Literature

  • Su, J.; et al. (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv:2104.09864.
  • Dao, T.; et al. (2022). FlashAttention: Fast and Memory‑Efficient Exact Attention with IO‑Awareness. arXiv:2205.14135.
  • Bai, Jinze; et al. (2023). Qwen Technical Report. arXiv:2309.16609.
  • Peng, B.; et al. (2023). YaRN: Efficient Context Window Extension of Large Language Models. arXiv:2309.00071.
  • Qwen Team (2024). Qwen2 Technical Report. arXiv:2407.10671.
  • Qwen Team (2024). Qwen2‑Audio Technical Report. arXiv:2407.10759.
  • Qwen Team (2025). Qwen2.5 Technical Report. arXiv:2412.15115.
  • Bai, Jinze; et al. (2025). Qwen2.5‑VL: A Versatile Vision‑Language Model for Real‑World Agent Tasks. arXiv:2502.13923.
  • Wang, Wen; et al. (2025). Qwen2.5‑Omni: A Streaming End‑to‑End Multimodal Model. arXiv:2503.20215.
  • Yang, An; et al. (2025). Qwen3 Technical Report. arXiv:2505.09388.

Notes

  1. "Qwen". Wikipedia.
  2. "Qwen Models: Alibaba's Next-Generation AI Family for Text, Vision, and Beyond". Inferless.
  3. "Qwen 3 offers a case study in how to effectively release a model". Simon Willison's Weblog.
  4. Bai, Jinze; et al. (2023). Qwen Technical Report. arXiv:2309.16609.
  5. "Qwen/Qwen-7B". Hugging Face.
  6. "GitHub - QwenLM/Qwen: The official repo of Qwen". GitHub.
  7. Qwen Team (2024). Qwen2 Technical Report. arXiv:2407.10671.
  8. Qwen Team (2025). Qwen2.5 Technical Report. arXiv:2412.15115.
  9. Bai, Jinze; et al. (2025). Qwen2.5-VL: A Versatile Vision-Language Model for Real-World Agent Tasks. arXiv:2502.13923.
  10. Wang, Wen; et al. (2025). Qwen2.5-Omni: A Streaming End-to-End Multimodal Model. arXiv:2503.20215.
  11. Yang, An; et al. (2025). Qwen3 Technical Report. arXiv:2505.09388.
  12. Gao, Shidong; et al. (2024). Qwen2-Audio Technical Report. arXiv:2407.10759.