Hierarchical text-conditional image generation with CLIP latents
OpenAI published the unCLIP paper describing the hierarchical, two-stage architecture behind DALL-E 2: a prior (autoregressive or diffusion) generates a CLIP image embedding from a text caption, and a diffusion decoder generates the image conditioned on that embedding.
Read at source: https://openai.com/index/hierarchical-text-conditional-image-generation-with-clip-latents