Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model

Qwen Blog · Jan 28, 2025

Alibaba released Qwen2.5-Max, a large-scale Mixture-of-Experts (MoE) model whose training draws on large-model scaling insights similar to those disclosed with DeepSeek V3.

Categories: Model Releases

Excerpt

It is widely recognized that continuously scaling both data size and model size can lead to significant improvements in model intelligence. However, the research and industry communities have limited experience in effectively scaling extremely large models, whether dense or Mixture-of-Experts (MoE). Many critical details of this scaling process were disclosed only with the recent release of DeepSeek V3. Concurrently, we are developing Qwen2.

Read at source: https://qwenlm.github.io/blog/qwen2.5-max/