This survey reviews text-to-image diffusion models in the context that
diffusion models have emerged to be popular for a wide range of generative
tasks. As a self-contained work, this survey starts with a brief introduction
of how a basic diffusion model works for image synthesis, followed by how
condition or guidance improves learning. Based on that, we present a review of
state-of-the-art methods on text-conditioned image synthesis, i.e.,
text-to-image. We further summarize applications beyond text-to-image
generation: text-guided creative generation and text-guided image editing.
Beyond the progress made so far, we discuss existing challenges and promising
future directions.Comment: First survey on the recent progress of text-to-image generation based
on the diffusion model (under progress