Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters
Qwen releases an MoE model with only 2.7B activated parameters that matches the performance of 7B models, demonstrating roughly 3x parameter efficiency through a mixture-of-experts architecture.
Excerpt
Introduction
Since the surge in interest sparked by Mixtral, research on mixture-of-experts (MoE) models has gained significant momentum. Both researchers and practitioners are keen to understand how to train such models effectively and to assess their efficiency and effectiveness. Today, we introduce Qwen1.5-MoE-A2.7B, a small MoE model with only 2.7 billion activated parameters that nonetheless matches the performance of state-of-the-art 7B models like Mistral 7B and Qwen1.
Read at source: https://qwenlm.github.io/blog/qwen-moe/