Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
Qwen releases Chinese CLIP, an open-source CLIP variant optimized for Chinese vision-language tasks, including cross-modal retrieval.
Excerpt
CLIP is a phenomenal playmaker in vision and multimodal representation learning. It serves not only as a foundation model but also as a bridge between vision and language, and it has triggered a wave of research in different fields, especially text-to-image generation. However, we find that applications, especially cross-modal retrieval, need a language-specific CLIP, and that no open-source Chinese CLIP with good performance exists. We therefore launched this project to promote Chinese multimodal representation learning.
Read at source: https://qwenlm.github.io/blog/chinese-clip/