Latent Diffusion models (LDMs) have achieved remarkable results in
synthesizing high-resolution images. However, the iterative sampling process is
computationally intensive and leads to slow generation. Inspired by Consistency
Models (song et al.), we propose Latent Consistency Models (LCMs), enabling
swift inference with minimal steps on any pre-trained LDMs, including Stable
Diffusion (rombach et al). Viewing the guided reverse diffusion process as
solving an augmented probability flow ODE (PF-ODE), LCMs are designed to
directly predict the solution of such ODE in latent space, mitigating the need
for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently
distilled from pre-trained classifier-free guided diffusion models, a
high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training.
Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method
that is tailored for fine-tuning LCMs on customized image datasets. Evaluation
on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve
state-of-the-art text-to-image generation performance with few-step inference.
Project Page: https://latent-consistency-models.github.io