Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters

Qwen Blog

Qwen releases an MoE model with only 2.7B activated parameters that matches the performance of 7B models, showing roughly 3x efficiency in activated parameters through its mixture-of-experts architecture.

Categories: Model Releases, OSS & Tools

Excerpt

Introduction

Since the surge of interest sparked by Mixtral, research on mixture-of-experts (MoE) models has gained significant momentum. Both researchers and practitioners are keen to understand how to train such models effectively and to assess their efficiency and effectiveness. Today, we introduce Qwen1.5-MoE-A2.7B, a small MoE model with only 2.7 billion activated parameters that nevertheless matches the performance of state-of-the-art 7B models like Mistral 7B and Qwen1.