Qwen2-VL: To See the World More Clearly

Qwen Blog · Aug 28, 2024

Qwen2-VL is released with state-of-the-art visual understanding across benchmarks and comprehension of videos longer than 20 minutes, extending the Qwen2 family to the multimodal frontier.

Categories: Model Releases, Research

Excerpt

After a year’s relentless efforts, today we are thrilled to release Qwen2-VL! Qwen2-VL is the latest version of the vision language models based on Qwen2 in the Qwen model family. Compared with Qwen-VL, Qwen2-VL has the following capabilities:

- SoTA understanding of images of various resolutions and aspect ratios: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes long for high-quality video-based question answering, dialog, content creation, etc.

Read at source: https://qwenlm.github.io/blog/qwen2-vl/