A Survey of Deep Face Restoration: Denoise, Super-Resolution, Deblur, Artifact Removal
Face Restoration (FR) aims to restore High-Quality (HQ) faces from Low-Quality (LQ) input images, a domain-specific image restoration problem in low-level computer vision. Early face restoration methods mainly used statistical priors and degradation models, which struggle to meet the requirements of real-world applications. In recent years, face restoration has witnessed great progress after entering the deep learning era. However, few works systematically study deep learning-based face restoration methods. Thus, this paper comprehensively surveys recent advances in deep learning techniques for face restoration. Specifically, we first summarize different problem formulations and analyze the characteristics of face images. Second, we discuss the challenges of face restoration. Concerning these challenges, we present a comprehensive review of existing FR methods, including prior-based methods and deep learning-based methods. Then, we explore the techniques developed for FR, covering network architectures, loss functions, and benchmark datasets. We also conduct a systematic benchmark evaluation of representative methods. Finally, we discuss future directions, including network designs, metrics, benchmark datasets, applications, etc. We also provide an open-source repository of all the discussed methods, available at https://github.com/TaoWangzj/Awesome-Face-Restoration. Comment: 21 pages, 19 figures
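The degradation models mentioned above are commonly formulated as LQ = (HQ * k) ↓s + n, i.e., a blur kernel followed by downsampling and additive noise. The sketch below illustrates this generic formulation in NumPy; the kernel shape, scale factor, and noise level are illustrative assumptions, not the survey's exact pipeline.

```python
import numpy as np

def degrade(hq, kernel_size=5, sigma=1.2, scale=4, noise_std=0.02, seed=0):
    """Generic face-restoration degradation sketch: LQ = (HQ * k) downsampled + n."""
    rng = np.random.default_rng(seed)
    # Isotropic Gaussian blur kernel k (one common choice)
    ax = np.arange(kernel_size) - kernel_size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    k /= k.sum()
    # 2-D convolution via shifted sums (circular boundary, fine for a sketch)
    blurred = np.zeros_like(hq, dtype=float)
    for dy in ax:
        for dx in ax:
            blurred += k[dy + kernel_size // 2, dx + kernel_size // 2] * \
                       np.roll(np.roll(hq, dy, axis=0), dx, axis=1)
    # Bicubic downsampling is typical; strided sampling stands in here
    lq = blurred[::scale, ::scale]
    # Additive white Gaussian noise n
    lq = lq + rng.normal(0.0, noise_std, lq.shape)
    return np.clip(lq, 0.0, 1.0)

hq = np.ones((32, 32)) * 0.5
lq = degrade(hq)
print(lq.shape)  # (8, 8)
```

Real pipelines often also model JPEG compression and more complex kernels; the point here is only the blur-downsample-noise structure.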
Continuous Facial Motion Deblurring
We introduce a novel framework for continuous facial motion deblurring that restores the continuous sharp moments latent in a single motion-blurred face image via a moment control factor. Although a motion-blurred image is the accumulated signal of continuous sharp moments during the exposure time, most existing single-image deblurring approaches aim to restore a fixed number of frames using multiple networks and training stages. To address this problem, we propose a GAN-based continuous facial motion deblurring network (CFMD-GAN), a novel framework that restores the continuous moments latent in a single motion-blurred face image with a single network and a single training stage. To stabilize network training, we train the generator to restore continuous moments in the order determined by our facial motion-based reordering (FMR) process, which utilizes domain-specific knowledge of the face. Moreover, we propose an auxiliary regressor that helps our generator produce more accurate images by estimating continuous sharp moments. Furthermore, we introduce a control-adaptive (ContAda) block that performs spatially deformable convolution and channel-wise attention as a function of the control factor. Extensive experiments on the 300VW dataset demonstrate that the proposed framework generates varying numbers of continuous output frames by varying the moment control factor. Compared with recent single-to-single image deblurring networks trained on the same 300VW training set, the proposed method shows superior performance in restoring the central sharp frame in terms of perceptual metrics, including LPIPS, FID, and ArcFace identity distance. The proposed method also outperforms the existing single-to-video deblurring method in both qualitative and quantitative comparisons.
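The idea of conditioning the network on a scalar moment control factor can be illustrated with a channel-wise scale-and-shift modulation (FiLM-style), loosely in the spirit of the ContAda block's channel attention. This is a hedged sketch: the linear maps `w_gamma`/`w_beta` are assumed forms, and the real block additionally uses spatially deformable convolution, omitted here.

```python
import numpy as np

def control_adaptive_modulation(features, t, w_gamma, w_beta):
    """Condition a (C, H, W) feature map on a scalar control factor t in [0, 1]
    via per-channel scale/shift predicted from t (assumed linear form)."""
    gamma = 1.0 + w_gamma * t          # per-channel scale, shape (C,)
    beta = w_beta * t                  # per-channel shift, shape (C,)
    return features * gamma[:, None, None] + beta[:, None, None]

C, H, W = 4, 8, 8
feats = np.ones((C, H, W))
out = control_adaptive_modulation(feats, t=0.5,
                                  w_gamma=np.full(C, 0.2),
                                  w_beta=np.zeros(C))
print(out[0, 0, 0])  # 1.1
```

Sweeping `t` from 0 to 1 would then let a single network produce the continuum of output moments, which is the mechanism the abstract describes.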
Image Manipulation and Image Synthesis
Image manipulation is of historic importance. Ever since the advent of photography, pictures have been manipulated for various reasons. Historic rulers often used image manipulation techniques for self-portrayal or propaganda. In many cases, the goal is to manipulate human behaviour by spreading credible misinformation. Photographs, by their nature, portray the real world and as such are more credible to humans. However, image manipulation need not serve only evil purposes. In this thesis, we propose and analyse methods for image manipulation that serve a positive purpose. Specifically, we treat image manipulation as a tool for solving other tasks. To this end, we model image manipulation as an image-to-image translation (I2I) task, i.e., a system that receives an image as input and outputs a manipulated version of that input. We propose multiple I2I-based methods. First, we demonstrate that I2I-based image manipulation methods can be used to reduce motion blur in videos. Second, we show that I2I-based image manipulation methods can be used for domain adaptation and domain extension. Specifically, we present a method that significantly improves the learning of semantic segmentation from synthetic source data. The same technique can be applied to learning nighttime semantic segmentation from daylight images. Next, we show that I2I can be used to enable weakly supervised object segmentation.
We show that each individual task requires and allows for different levels of supervision during the training of deep models in order to achieve the best performance. We discuss the importance of maintaining control over the output of such methods and show that, with reduced levels of supervision, methods for maintaining stability during training and for establishing control over the output of a system become increasingly important. We propose multiple methods that solve the issues that arise in such systems. Finally, we demonstrate that our proposed mechanisms for control can be adapted to synthesise images from scratch.
Face Restoration via Plug-and-Play 3D Facial Priors
State-of-the-art face restoration methods employ deep convolutional neural networks (CNNs) to learn a mapping between degraded and sharp facial patterns by exploring local appearance knowledge. However, most of these methods do not exploit facial structures and identity information well, and only deal with task-specific face restoration (e.g., face super-resolution or deblurring). In this paper, we propose cross-task and cross-model plug-and-play 3D facial priors to explicitly embed sharp facial structures into the network for general face restoration tasks. Our 3D priors are the first to explore 3D morphable knowledge based on the fusion of parametric descriptions of face attributes (e.g., identity, facial expression, texture, illumination, and face pose). Furthermore, the priors can easily be incorporated into any network and are very efficient in improving performance and accelerating convergence. Firstly, a 3D face rendering branch is set up to obtain 3D priors of salient facial structures and identity knowledge. Secondly, to better exploit this hierarchical information (i.e., intensity similarity, 3D facial structure, and identity content), a spatial attention module is designed for image restoration problems. Extensive face restoration experiments, including face super-resolution and deblurring, demonstrate that the proposed 3D priors achieve superior face restoration results over state-of-the-art algorithms.
Deep Learning-Based Approaches for Image Restoration
Image restoration is the operation of taking a corrupted or degraded low-quality image and estimating a high-quality clean image that is free of degradations. The most common degradations that affect image quality are blur, atmospheric turbulence, adverse weather conditions (like rain, haze, and snow), and noise. Images captured under the influence of these corruptions or degradations can significantly affect the performance of subsequent computer vision algorithms such as segmentation, recognition, object detection, and tracking. With such algorithms becoming vital components of applications such as autonomous navigation and video surveillance, it is increasingly important to develop sophisticated algorithms to remove these degradations and recover high-quality clean images. These reasons have motivated a plethora of research on single image restoration methods.
Recently, following the success of deep learning-based convolutional neural networks, many approaches have been proposed to remove degradations from corrupted images. We study the following single image restoration problems: (i) atmospheric turbulence removal, (ii) deblurring, (iii) removing distortions introduced by adverse weather conditions such as rain, haze, and snow, and (iv) removing noise. However, existing single image restoration techniques suffer from the following major limitations: (i) They construct global priors without taking into account that these degradations can have different effects on different local regions of the image. (ii) They use synthetic datasets for training, which often results in sub-optimal performance on real-world images, typically because of the distributional shift between synthetic and real-world degraded images. (iii) Existing semi-supervised approaches do not account for the effect of unlabeled or real-world degraded images on semi-supervised performance.
To address the first limitation, we propose supervised image restoration techniques that use uncertainty to improve restoration performance. To overcome the second limitation, we propose a Gaussian process-based pseudo-labeling approach that leverages real-world rain information to train the deraining network in a semi-supervised fashion. Furthermore, to address the third limitation, we theoretically study the effect of unlabeled images on semi-supervised performance and propose an adaptive rejection technique to boost it.
Finally, we recognize that existing supervised and semi-supervised methods need some kind of paired labeled data to train the network, and training on any kind of synthetic paired clean-degraded images may not completely solve the domain gap between synthetic and real-world degraded image distributions.
Thus we propose a self-supervised transformer-based approach for image denoising. Here, given a noisy image, we generate multiple down-sampled images and learn the joint relation between these down-sampled images using a Gaussian process to denoise the image.
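One common way to generate multiple down-sampled views of a single noisy image is strided sub-sampling, which yields s*s sub-images whose clean content is nearly identical while their noise realizations differ. This is a hedged sketch of that step only; the exact down-sampling scheme, the transformer, and the Gaussian process modeling in the thesis may differ.

```python
import numpy as np

def stride_downsamples(noisy, s=2):
    """Split one noisy (H, W) image into s*s strided sub-images.
    Each sub-image sees roughly the same scene but independent noise."""
    H, W = noisy.shape
    H, W = H - H % s, W - W % s   # crop to a multiple of the stride
    return [noisy[i:H:s, j:W:s] for i in range(s) for j in range(s)]

rng = np.random.default_rng(0)
img = rng.normal(0.5, 0.1, (9, 9))   # synthetic noisy image
subs = stride_downsamples(img)
print(len(subs), subs[0].shape)  # 4 (4, 4)
```

A self-supervised denoiser can then learn relations among these views without any clean target, which is the premise of the approach described above.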
Learning to Deblur and Rotate Motion-Blurred Faces
We propose a solution to the novel task of rendering sharp videos from new viewpoints given a single motion-blurred image of a face. Our method handles the complexity of face blur by implicitly learning the geometry and motion of faces through joint training on three large datasets: FFHQ and 300VW, which are publicly available, and a new Bern Multi-View Face Dataset (BMFD) that we built. The first two datasets provide a large variety of faces and allow our model to generalize better. BMFD instead allows us to introduce multi-view constraints, which are crucial to synthesizing sharp videos from a new camera view. It consists of high frame rate synchronized videos from multiple views of several subjects displaying a wide range of facial expressions. We use the high frame rate videos to simulate realistic motion blur through averaging. Thanks to this dataset, we train a neural network to reconstruct a 3D video representation from a single image and the corresponding face gaze. We then provide a camera viewpoint relative to the estimated gaze and the blurry image as input to an encoder-decoder network to generate a video of sharp frames with a novel camera viewpoint. We demonstrate our approach on test subjects of our multi-view dataset and VIDTIMIT.
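Simulating motion blur by averaging consecutive high-frame-rate sharp frames, as done for BMFD, can be sketched in a few lines; the frame count and any preprocessing used in the actual dataset are assumptions here.

```python
import numpy as np

def synthesize_blur(frames):
    """Average consecutive sharp frames to approximate the accumulated
    signal captured during a long exposure (motion blur)."""
    return np.mean(np.stack(frames, axis=0), axis=0)

# Toy example: a dot moving one pixel per frame smears into a streak
frames = []
for x in range(4):
    f = np.zeros((5, 5))
    f[2, x] = 1.0
    frames.append(f)
blur = synthesize_blur(frames)
print(blur[2, :4])  # [0.25 0.25 0.25 0.25]
```

In practice the averaging is done over many more frames (and often in linear intensity space) so that the synthetic blur matches real exposure accumulation.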
- …