CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution
Recently, face super-resolution methods driven by deep convolutional neural
networks (CNNs) have achieved great progress in restoring degraded
facial details by jointly training with facial priors. However, these methods
have some obvious limitations. On the one hand, multi-task joint learning
requires additional annotation of the dataset, and the introduced prior network
significantly increases the computational cost of the model. On the other
hand, the limited receptive field of CNNs reduces the fidelity and
naturalness of the reconstructed facial images, yielding suboptimal
results. In this work, we propose an efficient CNN-Transformer
Cooperation Network (CTCNet) for face super-resolution tasks, which uses a
multi-scale connected encoder-decoder architecture as the backbone.
Specifically, we first devise a novel Local-Global Feature Cooperation Module
(LGCM), which is composed of a Facial Structure Attention Unit (FSAU) and a
Transformer block, to promote the consistency of local facial detail and global
facial structure restoration simultaneously. Then, we design an efficient Local
Feature Refinement Module (LFRM) to enhance the local facial structure
information. Finally, to further improve the restoration of fine facial
details, we present a Multi-scale Feature Fusion Unit (MFFU) to adaptively fuse
the features from different stages of the encoder. Comprehensive
evaluations on various datasets show that the proposed CTCNet
significantly outperforms other state-of-the-art methods.
Comment: 12 pages, 10 figures, 8 tables
Towards Robust Blind Face Restoration with Codebook Lookup Transformer
Blind face restoration is a highly ill-posed problem that often requires
auxiliary guidance to 1) improve the mapping from degraded inputs to desired
outputs, or 2) complement high-quality details lost in the inputs. In this
paper, we demonstrate that a learned discrete codebook prior in a small proxy
space largely reduces the uncertainty and ambiguity of restoration mapping by
casting blind face restoration as a code prediction task, while providing rich
visual atoms for generating high-quality faces. Under this paradigm, we propose
a Transformer-based prediction network, named CodeFormer, to model the global
composition and context of the low-quality faces for code prediction, enabling
the discovery of natural faces that closely approximate the target faces even
when the inputs are severely degraded. To improve adaptiveness to
different degradations, we also propose a controllable feature transformation
module that allows a flexible trade-off between fidelity and quality. Thanks to
the expressive codebook prior and global modeling, CodeFormer outperforms
state-of-the-art methods in both quality and fidelity, showing superior
robustness to degradation. Extensive experimental results on synthetic and
real-world datasets verify the effectiveness of our method.
Comment: Accepted by NeurIPS 2022. Code: https://github.com/sczhou/CodeForme
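The idea of restoring faces through a discrete codebook can be illustrated with a minimal sketch. It shows a generic nearest-neighbour codebook lookup of the kind commonly used in vector-quantized autoencoders; this is an assumption for illustration, not the CodeFormer implementation, where according to the abstract a Transformer predicts the code indices instead of computing them by distance. The function names and the toy codebook are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code_indices(features: Sequence[Vector],
                         codebook: Sequence[Vector]) -> List[int]:
    """Map each feature vector to the index of its closest codebook entry.

    Assumption: plain nearest-neighbour quantization. A learned predictor
    would output these indices directly from the degraded input.
    """
    return [
        min(range(len(codebook)),
            key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def decode_from_codes(indices: Sequence[int],
                      codebook: Sequence[Vector]) -> List[List[float]]:
    """Replace each index by its codebook entry (the 'visual atom')."""
    return [list(codebook[i]) for i in indices]
```

Calling `nearest_code_indices` on noisy features and then `decode_from_codes` on the result replaces every feature with a clean codebook entry, which is why a small discrete code space reduces the ambiguity of the restoration mapping.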
Dual Associated Encoder for Face Restoration
Restoring facial details from low-quality (LQ) images has remained a
challenging problem due to its ill-posedness induced by various degradations in
the wild. The existing codebook prior mitigates the ill-posedness by leveraging
an autoencoder and learned codebook of high-quality (HQ) features, achieving
remarkable quality. However, existing approaches in this paradigm frequently
depend on a single encoder pre-trained on HQ data for restoring HQ images,
disregarding the domain gap between LQ and HQ images. As a result, the encoding
of LQ inputs may be insufficient, resulting in suboptimal performance. To
tackle this problem, we propose a novel dual-branch framework named DAEFR. Our
method introduces an auxiliary LQ branch that extracts crucial information from
the LQ inputs. Additionally, we incorporate association training to promote
effective synergy between the two branches, enhancing code prediction and
output quality. We evaluate the effectiveness of DAEFR on both synthetic and
real-world datasets, demonstrating its superior performance in restoring facial
details.
Comment: Technical Report
DifFace: Blind Face Restoration with Diffused Error Contraction
While deep learning-based methods for blind face restoration have achieved
unprecedented success, they still suffer from two major limitations. First,
most of them deteriorate when facing complex degradations out of their training
data. Second, these methods require multiple constraints, e.g., fidelity,
perceptual, and adversarial losses, which require laborious hyper-parameter
tuning to stabilize and balance their influences. In this work, we propose a
novel method named DifFace that is capable of coping with unseen and complex
degradations more gracefully without complicated loss designs. The key to our
method is to establish a posterior distribution from the observed low-quality
(LQ) image to its high-quality (HQ) counterpart. In particular, we design a
transition distribution from the LQ image to the intermediate state of a
pre-trained diffusion model and then gradually transmit from this intermediate
state to the HQ target by recursively applying a pre-trained diffusion model.
The transition distribution only relies on a restoration backbone that is
trained with L2 loss on some synthetic data, which favorably avoids the
cumbersome training process in existing methods. Moreover, the transition
distribution can contract the error of the restoration backbone and thus makes
our method more robust to unknown degradations. Comprehensive experiments show
that DifFace is superior to current state-of-the-art methods, especially in
cases with severe degradations. Our code and model are available at
https://github.com/zsyOAOA/DifFace.
Comment: 21 pages
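The transition from a restored estimate to an intermediate diffusion state can be sketched as follows. This is a minimal illustration that assumes the standard DDPM forward marginal, x_N = sqrt(alpha_bar_N) * f(y) + sqrt(1 - alpha_bar_N) * eps, with a linear beta schedule; the schedule values, the function names, and the flat-list image representation are assumptions, and the restoration backbone f and the pre-trained reverse process are not modeled here.

```python
import math
import random
from typing import List, Sequence


def alpha_bar(n: int, num_steps: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 2e-2) -> float:
    """Cumulative product of (1 - beta_t) over the first n steps.

    Assumption: a linear beta schedule, as commonly used with DDPM.
    """
    prod = 1.0
    for t in range(n):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def sample_intermediate_state(restored: Sequence[float], n: int,
                              rng: random.Random) -> List[float]:
    """Diffuse a backbone estimate f(y) to step n of the forward process.

    `restored` stands in for the output of the restoration backbone,
    flattened to a list of pixel values (hypothetical representation).
    """
    a = alpha_bar(n)
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in restored]
```

Under these assumptions, a larger n shrinks the contribution of the backbone estimate (and of its errors) by the factor sqrt(alpha_bar_N) before the pre-trained diffusion model denoises back toward the HQ target, which matches the error-contraction intuition in the abstract.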
Face Restoration via Plug-and-Play 3D Facial Priors
State-of-the-art face restoration methods employ deep convolutional neural networks (CNNs) to learn a mapping between degraded and sharp facial patterns by exploring local appearance knowledge. However, most of these methods do not fully exploit facial structures and identity information, and only deal with task-specific face restoration (e.g., face super-resolution or deblurring). In this paper, we propose cross-task and cross-model plug-and-play 3D facial priors to explicitly embed the network with sharp facial structures for general face restoration tasks. Our 3D priors are the first to explore 3D morphable knowledge based on the fusion of parametric descriptions of face attributes (e.g., identity, facial expression, texture, illumination, and face pose). Furthermore, the priors can easily be incorporated into any network and are very efficient in improving performance and accelerating convergence. First, a 3D face rendering branch is set up to obtain 3D priors of salient facial structures and identity knowledge. Second, to better exploit this hierarchical information (i.e., intensity similarity, 3D facial structure, and identity content), a spatial attention module is designed for image restoration problems. Extensive face restoration experiments, including face super-resolution and deblurring, demonstrate that the proposed 3D priors achieve superior face restoration results over state-of-the-art algorithms.