Intelligent visual media processing: when graphics meets vision
The computer graphics and computer vision communities have been working closely together in recent
years, and a variety of algorithms and applications have been developed to analyze and manipulate the visual media
around us. There are three major driving forces behind this phenomenon: i) the availability of big data from the
Internet has created a demand for handling an ever-increasing volume of resources; ii) powerful processing
tools, such as deep neural networks, provide effective ways of learning how to deal with heterogeneous visual data;
iii) new data capture devices, such as the Kinect, bridge algorithms for 2D image understanding and
3D model analysis. These driving forces have emerged only recently, and we believe that the computer graphics
and computer vision communities are still at the beginning of their honeymoon phase. In this work we survey
recent research on how computer vision techniques benefit computer graphics techniques and vice versa, covering
research on analysis, manipulation, synthesis, and interaction. We also discuss existing problems and suggest
possible directions for further research.
DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
We present DiffBIR, which leverages pretrained text-to-image diffusion models
for the blind image restoration problem. Our framework adopts a two-stage pipeline.
In the first stage, we pretrain a restoration module across diversified
degradations to improve generalization capability in real-world scenarios. The
second stage leverages the generative ability of latent diffusion models to
achieve realistic image restoration. Specifically, we introduce an injective
modulation sub-network, LAControlNet, for finetuning, while the pretrained
Stable Diffusion is kept fixed to maintain its generative ability. Finally, we introduce a
controllable module that allows users to balance quality and fidelity by
introducing latent image guidance into the denoising process during
inference. Extensive experiments demonstrate its superiority over
state-of-the-art approaches for both blind image super-resolution and blind
face restoration tasks on synthetic and real-world datasets. The code is
available at https://github.com/XPixelGroup/DiffBIR
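The abstract does not spell out how latent image guidance is computed, so the following is only a minimal sketch of the general idea under stated assumptions: at each deterministic DDIM step, the model's estimate of the clean latent is pulled toward a reference latent (standing in for the encoded stage-1 restoration) by one gradient step on a squared-error loss. The names `guide_x0`, `ddim_step`, `z_ref`, and `scale` are hypothetical, and the random "noise predictions" replace the real denoising network; this is not DiffBIR's implementation.

```python
import numpy as np


def guide_x0(x0_pred: np.ndarray, z_ref: np.ndarray, scale: float) -> np.ndarray:
    """One gradient step on L(x0) = 0.5 * ||x0 - z_ref||^2, whose gradient is (x0 - z_ref).

    scale = 0 keeps the diffusion model's own estimate (favours generated detail);
    scale = 1 replaces it with the reference latent (favours fidelity).
    """
    return x0_pred - scale * (x0_pred - z_ref)


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev, z_ref, scale):
    """Deterministic DDIM update (eta = 0) with guidance applied to the clean-latent estimate."""
    # Estimate the clean latent from the noisy latent and the predicted noise
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    x0_pred = guide_x0(x0_pred, z_ref, scale)
    # Recompute the noise term so that it is consistent with the guided estimate
    eps_guided = (x_t - np.sqrt(alpha_bar_t) * x0_pred) / np.sqrt(1.0 - alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_guided


# Toy run: random "noise predictions" stand in for the (hypothetical here) denoising network.
rng = np.random.default_rng(0)
z_ref = rng.standard_normal((4, 8, 8))      # stand-in for the encoded stage-1 restoration
alpha_bars = np.linspace(0.1, 1.0, 10)      # toy schedule, noisy -> clean (alpha_bar = 1)
x = rng.standard_normal(z_ref.shape)
for t in range(len(alpha_bars) - 1):
    eps_pred = 0.1 * rng.standard_normal(z_ref.shape)
    x = ddim_step(x, eps_pred, alpha_bars[t], alpha_bars[t + 1], z_ref, scale=1.0)
```

With `scale=1.0` the sampler ends exactly at the reference latent (maximum fidelity); intermediate values leave room for the diffusion prior to add detail, which is the quality/fidelity trade-off the abstract describes.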
A Survey of Deep Face Restoration: Denoise, Super-Resolution, Deblur, Artifact Removal
Face Restoration (FR) aims to restore High-Quality (HQ) faces from
Low-Quality (LQ) input images, which is a domain-specific image restoration
problem in the low-level computer vision area. Early face restoration
methods mainly use statistical priors and degradation models, which struggle
to meet the requirements of real-world applications in practice. In recent
years, face restoration has witnessed great progress after entering the
deep learning era. However, few works have systematically studied deep learning-based
face restoration methods. Thus, this paper comprehensively
surveys recent advances in deep learning techniques for face restoration.
Specifically, we first summarize different problem formulations and analyze the
characteristics of face images. Second, we discuss the challenges of face
restoration. In light of these challenges, we present a comprehensive review of
existing FR methods, including prior-based methods and deep learning-based
methods. Then, we explore the techniques developed for the FR task, covering
network architectures, loss functions, and benchmark datasets. We also conduct
a systematic benchmark evaluation on representative methods. Finally, we
discuss future directions, including network designs, metrics, benchmark
datasets, applications, etc. We also provide an open-source repository for all
the discussed methods, which is available at
https://github.com/TaoWangzj/Awesome-Face-Restoration. Comment: 21 pages, 19 figures
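A problem formulation widely used in the blind face restoration literature (assumed here; the survey's own notation may differ) writes the synthetic degradation as I_LQ = {[(I_HQ ⊛ k_σ)↓_r + n_δ]_JPEG_q}↑_r: blur with a Gaussian kernel k_σ, downsample by r, add noise n_δ, JPEG-compress with quality q, and upsample back. The sketch below implements that pipeline for a grayscale array with NumPy only; the JPEG step is deliberately omitted because it needs an image codec, and all function names and default parameters are illustrative.

```python
import numpy as np


def gaussian_kernel1d(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D grayscale image with reflect padding."""
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(img, radius, mode="reflect")
    # The kernel is symmetric, so np.convolve (true convolution) matches correlation here
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def degrade(hq, sigma=2.0, scale=4, noise_std=0.05, rng=None):
    """Blur -> downsample -> add noise -> upsample; the JPEG step is omitted."""
    rng = np.random.default_rng() if rng is None else rng
    lq = blur(hq, sigma)                                # I_HQ convolved with k_sigma
    lq = lq[::scale, ::scale]                           # downsample by r via striding
    lq = lq + rng.normal(0.0, noise_std, lq.shape)      # additive Gaussian noise n_delta
    # JPEG compression would go here; it needs an image codec (e.g. Pillow), so it is skipped
    lq = np.clip(lq, 0.0, 1.0)
    return np.repeat(np.repeat(lq, scale, axis=0), scale, axis=1)  # nearest-neighbour upsampling by r


rng = np.random.default_rng(0)
hq = rng.random((64, 64))          # stand-in for a high-quality face crop in [0, 1]
lq = degrade(hq, sigma=2.0, scale=4, noise_std=0.05, rng=rng)
```

Randomizing `sigma`, `scale`, and `noise_std` per training sample is the usual way such a pipeline is made to cover "diversified degradations"; fixed values are used here only to keep the example deterministic.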
Reinforced Disentanglement for Face Swapping without Skip Connection
State-of-the-art (SOTA) face swap models still suffer from either leaking the target identity
(i.e., shape) or failing to fully preserve the target non-identity attributes (i.e.,
background, hair) in the final results. We show
that this insufficient disentanglement is caused by two flawed designs that
were commonly adopted in prior models: (1) relying on a single compressed
encoder to represent both the semantic-level non-identity facial
attributes (i.e., pose) and the pixel-level non-facial region details, two requirements
that are difficult to satisfy at the same time; (2) relying heavily on long
skip connections between the encoder and the final generator, which leak a certain
amount of target face identity into the result. To fix these, we introduce a new
face swap framework called 'WSC-swap' that gets rid of skip connections and
uses two target encoders to respectively capture the pixel-level non-facial
region attributes and the semantic non-identity attributes in the face region.
To further reinforce the disentanglement learning for the target encoder, we
employ both an identity removal loss via adversarial training (i.e., GAN) and a
non-identity preservation loss via prior 3DMM models like [11]. Extensive
experiments on both FaceForensics++ and CelebA-HQ show that our results
significantly outperform previous works on a rich set of metrics, including one
novel metric for measuring identity consistency that was completely neglected
before. Comment: Accepted by ICCV 202
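The abstract does not define its new identity-consistency metric, so the sketch below is not that metric. It shows a common, generic way to check the two failure modes named above: cosine similarity between identity embeddings of the swapped face and the source face (identity transfer, should be high) and between the swapped face and the target face (identity leakage, should be low). The embeddings are assumed to come from some face-recognition network; here they are tiny hand-made vectors, and `identity_scores` is a hypothetical helper.

```python
import numpy as np


def _l2_normalize(x: np.ndarray) -> np.ndarray:
    x = np.asarray(x, dtype=np.float64)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def identity_scores(swap, source, target):
    """Row-wise cosine similarities for batches of identity embeddings of shape (N, D).

    Returns (source_sim, target_sim): a good swap has high source_sim (identity
    transferred) and low target_sim (little target identity leaked).
    """
    s, src, tgt = _l2_normalize(swap), _l2_normalize(source), _l2_normalize(target)
    source_sim = np.sum(s * src, axis=1)
    target_sim = np.sum(s * tgt, axis=1)
    return source_sim, target_sim


# Toy embeddings; in practice they would come from a face-recognition network (assumed here).
source = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
target = np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
swap = np.array([[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]])   # 2nd result leaks some target identity
source_sim, target_sim = identity_scores(swap, source, target)
```

The first toy swap transfers the source identity cleanly (similarities 1.0 and 0.0), while the second keeps only 0.8 similarity to its source and 0.6 to its target, the kind of leakage the abstract attributes to long skip connections.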
RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs
Blind face restoration aims at recovering high-quality face images from those
with unknown degradations. Current algorithms mainly introduce priors to
complement high-quality details and achieve impressive progress. However, most
of these algorithms ignore abundant contextual information in the face and its
interplay with the priors, leading to sub-optimal performance. Moreover, they
pay insufficient attention to the gap between synthetic and real-world scenarios,
limiting their robustness and generalization in real-world applications. In this
work, we propose RestoreFormer++, which on the one hand introduces
fully-spatial attention mechanisms to model the contextual information and its
interplay with the priors, and on the other hand explores an extending
degrading model to help generate more realistic degraded face images and
alleviate the synthetic-to-real-world gap. Compared with current algorithms,
RestoreFormer++ has several crucial benefits. First, instead of using a
multi-head self-attention mechanism like the traditional visual transformer, we
introduce multi-head cross-attention over multi-scale features to fully explore
spatial interactions between corrupted information and high-quality priors.
This helps RestoreFormer++ restore face images with higher
realness and fidelity. Second, in contrast to a recognition-oriented
dictionary, we learn a reconstruction-oriented dictionary as priors, which
contains more diverse high-quality facial details and better matches the
restoration target. Third, we introduce an extending degrading model that
contains more realistic degraded scenarios for synthesizing training data, and
thus helps to enhance the robustness and generalization of our RestoreFormer++
model. Extensive experiments show that RestoreFormer++ outperforms
state-of-the-art algorithms on both synthetic and real-world datasets. Comment: Submitted to TPAMI. An extension of RestoreForme
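A single-scale NumPy sketch of the cross-attention idea described above: queries come from degraded features, while keys and values come from high-quality prior tokens (consistent with the title's "undegraded key-value pairs"). How the prior tokens are obtained is an assumption here: each degraded feature is matched to its nearest entry in a hypothetical learned `codebook`, as in vector-quantized dictionaries. Weights are random, and multi-scale features, normalization layers, and training are all omitted, so this is an illustration of the attention pattern, not the authors' implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Replace each feature vector (N, C) with its nearest codebook entry (K, C) in L2 distance."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)   # (N, K)
    return codebook[d2.argmin(axis=1)]


def multi_head_cross_attention(degraded, prior, w_q, w_k, w_v, w_o, num_heads):
    """Queries from degraded tokens (N, C); keys and values from prior tokens (M, C)."""
    n, c = degraded.shape
    m = prior.shape[0]
    assert c % num_heads == 0, "channels must split evenly across heads"
    d = c // num_heads
    q = (degraded @ w_q).reshape(n, num_heads, d).transpose(1, 0, 2)   # (heads, N, d)
    k = (prior @ w_k).reshape(m, num_heads, d).transpose(1, 0, 2)      # (heads, M, d)
    v = (prior @ w_v).reshape(m, num_heads, d).transpose(1, 0, 2)      # (heads, M, d)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)    # (heads, N, M)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, c)                  # concatenate heads -> (N, C)
    return out @ w_o


rng = np.random.default_rng(0)
C, HEADS = 16, 4
feats = rng.standard_normal((10, C))         # flattened degraded feature map (one scale only)
codebook = rng.standard_normal((32, C))      # hypothetical learned high-quality dictionary
prior = quantize(feats, codebook)
weights = [rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(4)]
restored = multi_head_cross_attention(feats, prior, *weights, num_heads=HEADS)
```

The design difference from self-attention is only where K and V come from: because they are computed from prior tokens rather than from the corrupted features themselves, every output token is a mixture of high-quality prior content, weighted by how well it matches the degraded query.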