Qwen2.5 Omni: See, Hear, Talk, Write, Do It All!
Qwen released Qwen2.5-Omni, a flagship end-to-end multimodal model that processes text, images, audio, and video and produces streaming text and speech output. It is available on Hugging Face.
Excerpt
We release Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series. Designed for comprehensive multimodal perception, it processes diverse inputs including text, images, audio, and video, and delivers real-time streaming responses as both generated text and natural synthesized speech. To try the latest model, visit Qwen Chat and choose Qwen2.5-Omni-7B. The model is now openly available on Hugging Face, ModelScope, DashScope, and GitHub, with technical documentation available in our paper.
Read at source: https://qwenlm.github.io/blog/qwen2.5-omni/