Qwen2.5 Omni: See, Hear, Talk, Write, Do It All!

Qwen Blog · Mar 26, 2025

Qwen released Qwen2.5-Omni, a flagship end-to-end multimodal model processing text, images, audio, and video with streaming text and speech output, available on Hugging Face.

Categories: Model Releases, OSS & Tools

Excerpt

We release Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series. Designed for comprehensive multimodal perception, it seamlessly processes diverse inputs, including text, images, audio, and video, while delivering real-time streaming responses through both text generation and natural speech synthesis. To try the latest model, visit Qwen Chat and choose Qwen2.5-Omni-7B. The model is openly available on Hugging Face, ModelScope, DashScope, and GitHub, and its technical documentation is available in our paper.

Read at source: https://qwenlm.github.io/blog/qwen2.5-omni/