Blind face restoration is a highly ill-posed problem that often requires
auxiliary guidance to 1) improve the mapping from degraded inputs to desired
outputs, or 2) complement high-quality details lost in the inputs. In this
paper, we demonstrate that a learned discrete codebook prior in a small proxy
space greatly reduces the uncertainty and ambiguity of the restoration mapping by
casting blind face restoration as a code prediction task, while providing rich
visual atoms for generating high-quality faces. Under this paradigm, we propose
a Transformer-based prediction network, named CodeFormer, to model the global
composition and context of the low-quality faces for code prediction, enabling
the discovery of natural faces that closely approximate the target faces even
when the inputs are severely degraded. To improve adaptability to
different degradations, we also propose a controllable feature transformation
module that allows a flexible trade-off between fidelity and quality. Thanks to
the expressive codebook prior and global modeling, CodeFormer outperforms
state-of-the-art methods in both quality and fidelity, showing superior robustness to
degradation. Extensive experimental results on synthetic and real-world
datasets verify the effectiveness of our method.

Comment: Accepted by NeurIPS 2022. Code: https://github.com/sczhou/CodeForme
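
As a rough illustration of what "casting restoration as a code prediction task" can mean, the sketch below contrasts a nearest-neighbor codebook lookup with picking code indices from per-position scores, and then blends the retrieved codes with encoder features. It is a toy under stated assumptions, not the CodeFormer implementation: the function names (`nearest_code`, `predict_codes`, `lookup`, `blend`), the list-based features, and the scalar weight `w` are all hypothetical, the scores stand in for the output of a Transformer, and the linear blend is a simple stand-in rather than the paper's controllable feature transformation module.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(feature: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to `feature`
    (squared Euclidean distance). This is the usual vector-quantization
    lookup, which can become unreliable when the feature is corrupted."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(feature, codebook[k]))


def predict_codes(scores: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring code index at each position.
    `scores` is a hypothetical stand-in for a prediction network's output:
    one row per spatial position, one column per codebook entry."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in scores]


def lookup(indices: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Replace each predicted index with its codebook vector."""
    return [list(codebook[i]) for i in indices]


def blend(encoder_feat: Vector, code_feat: Vector, w: float) -> List[float]:
    """Illustrative trade-off (an assumption, not the paper's module):
    w = 0 keeps only the codebook feature, w = 1 keeps only the encoder feature."""
    return [c + w * (e - c) for e, c in zip(encoder_feat, code_feat)]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]  # toy codebook with 3 entries
    print(nearest_code([0.9, 1.2], codebook))        # lookup by distance
    indices = predict_codes([[0.1, 2.0, -1.0], [3.0, 0.2, 0.5]])
    print(lookup(indices, codebook))                 # lookup by predicted index
    print(blend([2.0, 4.0], [1.0, 1.0], 0.5))        # halfway blend
```

The design point this is meant to convey is only that predicting a discrete index restricts outputs to entries of a fixed codebook, which is one way to read the abstract's claim about reduced ambiguity; how the actual network computes the scores and fuses features is described in the paper, not here.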