We present DiffBIR, which leverages pretrained text-to-image diffusion models
for the blind image restoration problem. Our framework adopts a two-stage pipeline.
In the first stage, we pretrain a restoration module across diversified
degradations to improve generalization capability in real-world scenarios. The
second stage leverages the generative ability of latent diffusion models to
achieve realistic image restoration. Specifically, we introduce an injective
modulation sub-network -- LAControlNet -- for finetuning, while the pre-trained
Stable Diffusion remains frozen to maintain its generative ability. Finally, we
introduce a controllable module that allows users to balance quality and
fidelity by applying latent image guidance during the denoising process at
inference. Extensive experiments demonstrate the superiority of DiffBIR over
state-of-the-art approaches for both blind image super-resolution and blind
face restoration on synthetic and real-world datasets. The code is
available at https://github.com/XPixelGroup/DiffBIR
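The latent image guidance mentioned above can be understood as a gradient-based correction applied at each denoising step, pulling the model's clean-latent estimate toward a reference latent (e.g., from the stage-1 restoration output). The following is a minimal, illustrative NumPy sketch, not DiffBIR's actual implementation; the names `denoiser`, `z_ref`, and `guidance_scale` are assumptions for illustration:

```python
import numpy as np

def guided_denoise_step(z_t, z_ref, denoiser, t, guidance_scale=1.0):
    """One denoising step with latent image guidance (illustrative sketch).

    z_t:   current noisy latent
    z_ref: reference latent (here assumed to come from the stage-1 output)
    denoiser: callable returning the model's estimate of the clean latent
    guidance_scale: 0 favors generative quality (no guidance);
                    larger values favor fidelity to the reference
    """
    z0_hat = denoiser(z_t, t)  # model's clean-latent estimate at step t
    # Gradient of the guidance loss 0.5 * ||z0_hat - z_ref||^2 w.r.t. z0_hat
    grad = z0_hat - z_ref
    # Nudge the estimate toward the reference latent
    return z0_hat - guidance_scale * grad

# Toy usage with an identity "denoiser" on a random latent
rng = np.random.default_rng(0)
z_t = rng.standard_normal((4, 4))
z_ref = np.zeros((4, 4))
guided = guided_denoise_step(z_t, z_ref, lambda z, t: z, t=10, guidance_scale=0.5)
# With scale 0.5 and a zero reference, the latent is pulled halfway toward zero
assert np.allclose(guided, 0.5 * z_t)
```

In this sketch the scale acts as the quality-fidelity knob: at 0 the step is purely generative, while larger values trade diversity for agreement with the reference.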