CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training
We present variational generative adversarial networks, a general learning
framework that combines a variational auto-encoder with a generative
adversarial network, for synthesizing images in fine-grained categories, such
as faces of a specific person or objects in a category. Our approach models an
image as a composition of label and latent attributes in a probabilistic model.
By varying the fine-grained category label fed into the resulting generative
model, we can generate images in a specific category with randomly drawn values
on a latent attribute vector. Our approach has two novel aspects. First, we
adopt a cross-entropy loss for the discriminator and classifier networks, but a
mean discrepancy objective for the generative network. This asymmetric
loss function makes GAN training more stable. Second, we adopt an encoder
network to learn the relationship between the latent space and the real image
space, and use pairwise feature matching to keep the structure of generated
images. We experiment with natural images of faces, flowers, and birds, and
demonstrate that the proposed models are capable of generating realistic and
diverse samples with fine-grained category labels. We further show that our
models can be applied to other tasks, such as image inpainting,
super-resolution, and data augmentation for training better face recognition
models.
Comment: to appear in ICCV 2017
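A minimal PyTorch sketch of the asymmetric objectives the abstract describes, assuming the discriminator exposes logits and an intermediate feature layer; all tensor names here are illustrative, not the paper's code:

```python
import torch
import torch.nn.functional as F

def asymmetric_losses(d_real_logits, d_fake_logits, feat_real, feat_fake):
    """Discriminator: standard cross-entropy; generator: mean feature
    matching against real-batch statistics (the asymmetric pairing)."""
    # Discriminator side: binary cross-entropy on real vs. fake logits.
    d_loss = F.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits)
    ) + F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits)
    )
    # Generator side: match the mean discriminator features of real and
    # generated batches instead of directly fooling the classifier.
    g_loss = F.mse_loss(feat_fake.mean(dim=0), feat_real.mean(dim=0))
    return d_loss, g_loss
```

Matching batch-mean features gives the generator a smoother, non-saturating target, which is the stability argument made above.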
Mask-Guided Portrait Editing with Conditional GANs
Portrait editing is a popular subject in photo manipulation. Generative
Adversarial Networks (GANs) have advanced the generation of realistic faces and
enabled richer face editing. In this paper, we identify three issues in existing
techniques: diversity, quality, and controllability for portrait synthesis and
editing. To address these issues, we propose a novel end-to-end learning
framework that leverages conditional GANs guided by provided face masks for
generating faces. The framework learns a separate feature embedding for every
face component (e.g., mouth, hair, eyes), contributing to better
correspondences for image translation and local face editing. With the mask,
our network supports many applications, such as mask-driven face synthesis,
face Swap+ (which includes hair in the swap), and local manipulation. It can
also modestly boost face parsing performance when used as a form of data
augmentation.
Comment: To appear in CVPR 2019
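A hedged sketch of the per-component embedding idea, assuming a shared backbone and one binary mask per component; this is an illustration of the mechanism, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ComponentEncoder(nn.Module):
    """One embedding per masked face component (mouth, hair, eyes, ...)."""
    def __init__(self, backbone: nn.Module, num_components: int = 5):
        super().__init__()
        self.backbone = backbone          # shared feature extractor
        self.num_components = num_components

    def forward(self, image, masks):
        # image: (B, 3, H, W); masks: (B, K, H, W), one mask per component.
        embeddings = []
        for k in range(self.num_components):
            region = image * masks[:, k:k + 1]        # isolate the component
            feat = self.backbone(region)              # (B, C, h, w)
            embeddings.append(feat.mean(dim=(2, 3)))  # global-average pool
        return torch.stack(embeddings, dim=1)         # (B, K, C)
```

Keeping the embeddings separate is what allows a single component (say, the mouth) to be swapped or edited without disturbing the rest of the face.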
Learnable Sampling 3D Convolution for Video Enhancement and Action Recognition
A key challenge in video enhancement and action recognition is to fuse useful
information from neighboring frames. Recent works suggest establishing accurate
correspondences between neighboring frames before fusing temporal information.
However, the generated results heavily depend on the quality of correspondence
estimation. In this paper, we propose a more robust solution: \emph{sampling
and fusing multi-level features} across neighboring frames to generate the
results. Based on this idea, we introduce a new module to improve the
capability of 3D convolution, namely, learnable sampling 3D convolution
(\emph{LS3D-Conv}). We add learnable 2D offsets to 3D convolution, which
determine the sampling locations on the spatial feature maps across frames. The
offsets can be learned for specific tasks. \emph{LS3D-Conv} can flexibly
replace the 3D convolution layers in existing 3D networks, yielding new
architectures that learn the sampling at multiple feature levels. Experiments on video
interpolation, video super-resolution, video denoising, and action recognition
demonstrate the effectiveness of our approach.
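A deliberately simplified sketch of the idea under stated assumptions: one 2D offset field per frame (the paper learns offsets per kernel sampling location), bilinear resampling of each frame's feature map, then a regular 3D convolution for temporal fusion:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LS3DConvSketch(nn.Module):
    """Offset-guided resampling followed by plain 3D convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.offset_pred = nn.Conv2d(channels, 2, kernel_size=3, padding=1)
        # Zero-init so training starts from identity (undisplaced) sampling.
        nn.init.zeros_(self.offset_pred.weight)
        nn.init.zeros_(self.offset_pred.bias)
        self.conv3d = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # x: (B, C, T, H, W) video feature volume.
        b, c, t, h, w = x.shape
        frames = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        offsets = self.offset_pred(frames)                 # (B*T, 2, H, W)
        # Identity sampling grid in [-1, 1], shaped (B*T, H, W, 2).
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=x.device),
            torch.linspace(-1, 1, w, device=x.device),
            indexing="ij",
        )
        grid = torch.stack((xs, ys), dim=-1).expand(b * t, h, w, 2)
        grid = grid + offsets.permute(0, 2, 3, 1)          # learned shifts
        sampled = F.grid_sample(frames, grid, align_corners=True)
        sampled = sampled.reshape(b, t, c, h, w).permute(0, 2, 1, 3, 4)
        return self.conv3d(sampled)                        # temporal fusion
```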
GIQA: Generated Image Quality Assessment
Generative adversarial networks (GANs) have achieved impressive results
today, but not all generated images are perfect. A number of quantitative
criteria have recently emerged for generative models, but none of them is
designed to assess a single generated image. In this paper, we propose a new research
topic, Generated Image Quality Assessment (GIQA), which quantitatively
evaluates the quality of each generated image. We introduce three GIQA
algorithms from two perspectives: learning-based and data-based. We evaluate a
number of images generated by various recent GAN models on different datasets
and demonstrate that our scores are consistent with human assessments.
Furthermore, GIQA supports many applications, such as separately evaluating the
realism and diversity of generative models, and enabling online hard negative
mining (OHEM) in GAN training to improve the results.
Comment: ECCV 2020
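A minimal sketch of a data-based GIQA-style score, assuming features have already been extracted by some pretrained network; the feature dimensionality and component count below are placeholders:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder features; in practice these come from a feature extractor
# applied to real and generated images.
real_feats = np.random.randn(10000, 64)
gen_feats = np.random.randn(5, 64)

# Fit an explicit density on real-image features ...
gmm = GaussianMixture(n_components=10, covariance_type="full", random_state=0)
gmm.fit(real_feats)

# ... and score each generated image by its log-likelihood under it.
scores = gmm.score_samples(gen_feats)  # higher = closer to the real manifold
```

Because the score is per-image, it can rank samples from a single model, which is exactly what hard-example mining during GAN training needs.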
PriorGAN: Real Data Prior for Generative Adversarial Nets
Generative adversarial networks (GANs) have achieved rapid progress in
learning rich data distributions. However, we identify two main issues in
existing techniques: first, the low-quality problem, where the learned
distribution contains massive numbers of low-quality samples; second, the
missing-modes problem, where the learned distribution misses certain regions of
the real data distribution. To address these two issues, we propose a novel
prior that captures the whole real data distribution; GANs equipped with it are
called PriorGANs.
To be specific, we adopt a simple yet elegant Gaussian Mixture Model (GMM) to
build an explicit probability distribution on the feature level for the whole
real data. By maximizing the probability of the generated data, we can push
low-quality samples toward high quality. Meanwhile, equipped with the prior, we can
estimate the missing modes in the learned distribution and design a sampling
strategy on the real data to solve the problem. The proposed real data prior
can generalize to various GAN training settings, such as LSGAN, WGAN-GP,
SNGAN, and even StyleGAN. Our experiments demonstrate that PriorGANs
outperform the state of the art on the CIFAR-10, FFHQ, LSUN-cat, and LSUN-bird
datasets by large margins.
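A hedged sketch of how such a GMM prior could enter the generator objective, assuming the GMM parameters were pre-fit on real features and are held fixed; the log-determinant and normalization constants are dropped since they do not affect the gradient:

```python
import torch

def prior_loss(gen_feats, means, inv_covs, log_weights):
    """Negative GMM log-likelihood of generated features (up to constants).

    gen_feats: (B, D) generated-image features
    means: (K, D), inv_covs: (K, D, D), log_weights: (K,) -- fixed GMM fit
    on real features.
    """
    diff = gen_feats.unsqueeze(1) - means.unsqueeze(0)            # (B, K, D)
    # Mahalanobis distance to each mixture component.
    maha = torch.einsum("bkd,kde,bke->bk", diff, inv_covs, diff)  # (B, K)
    log_probs = log_weights.unsqueeze(0) - 0.5 * maha
    return -torch.logsumexp(log_probs, dim=1).mean()
```

Added to the usual adversarial loss, this term pulls generated features toward high-density regions of the real feature distribution, which is the "push low-quality samples toward high quality" mechanism described above.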
FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping
In this work, we propose a novel two-stage framework, called FaceShifter, for
high fidelity and occlusion aware face swapping. Unlike many existing face
swapping works that leverage only limited information from the target image
when synthesizing the swapped face, our framework, in its first stage,
generates the swapped face with high fidelity by thoroughly and adaptively
exploiting and integrating the target attributes. We propose a novel attributes
encoder for extracting multi-level target face attributes, and a new generator
with carefully designed Adaptive Attentional Denormalization (AAD) layers to
adaptively integrate the identity and the attributes for face synthesis. To
address the challenging facial occlusions, we append a second stage consisting
of a novel Heuristic Error Acknowledging Refinement Network (HEAR-Net). It is
trained to recover anomalous regions in a self-supervised way without any manual
annotations. Extensive experiments on wild faces demonstrate that our face
swapping results are not only considerably more perceptually appealing, but
also preserve identity better than other state-of-the-art methods.
Comment: Accepted to CVPR 2020 (Oral); generated dataset and project webpage:
lingzhili.com/FaceShifterPage
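An illustrative AAD layer consistent with the description above; the specific projections and the use of instance normalization are assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class AADLayerSketch(nn.Module):
    """Adaptive Attentional Denormalization: identity- and attribute-
    conditioned modulations of normalized activations, blended by a
    learned attention mask."""
    def __init__(self, channels, id_dim, attr_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.id_gamma = nn.Linear(id_dim, channels)
        self.id_beta = nn.Linear(id_dim, channels)
        self.attr_gamma = nn.Conv2d(attr_channels, channels, 1)
        self.attr_beta = nn.Conv2d(attr_channels, channels, 1)
        self.mask = nn.Conv2d(channels, 1, 1)

    def forward(self, h, z_id, z_attr):
        # h: (B, C, H, W); z_id: (B, id_dim); z_attr: (B, A, H, W)
        h_norm = self.norm(h)
        gamma_i = self.id_gamma(z_id)[..., None, None]
        beta_i = self.id_beta(z_id)[..., None, None]
        i_mod = gamma_i * h_norm + beta_i                       # identity branch
        a_mod = self.attr_gamma(z_attr) * h_norm + self.attr_beta(z_attr)
        m = torch.sigmoid(self.mask(h_norm))                    # attention mask
        return (1 - m) * a_mod + m * i_mod                      # blend branches
```

The mask lets the network decide, per spatial location, whether identity or target attributes should dominate, which is how identity and attributes are integrated adaptively.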
Semi-Supervised Image-to-Image Translation using Latent Space Mapping
Recent image-to-image translation works have shifted from supervised
to unsupervised settings due to the high cost of capturing or labeling
large amounts of paired data. However, current unsupervised methods based on the
cycle-consistency constraint may not find the desired mapping, especially for
difficult translation tasks. On the other hand, a small amount of paired data
is usually accessible. We therefore introduce a general framework for
semi-supervised image translation. Unlike previous works, our main idea is to
learn the translation over the latent feature space instead of the image space.
Thanks to the low-dimensional feature space, it is easier to find the desired
mapping function, which improves both the quality of the translation results
and the stability of the translation model. Empirically, we show that feature
translation produces better results, even with only a small amount of paired
data. Experimental comparisons with state-of-the-art approaches demonstrate the
effectiveness of the proposed framework on a variety of challenging
image-to-image translation tasks.
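A minimal sketch of the latent-space translation idea, assuming frozen pretrained encoders for both domains and a small trainable mapping network; all module and function names are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentTranslator(nn.Module):
    """Small mapping network between source and target latent codes."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(inplace=True),
            nn.Linear(dim, dim),
        )

    def forward(self, z_src):
        return self.net(z_src)

def paired_step(enc_src, enc_tgt, translator, optimizer, x_src, x_tgt):
    """One supervised step on the small paired set, in feature space."""
    with torch.no_grad():                      # encoders stay frozen
        z_src, z_tgt = enc_src(x_src), enc_tgt(x_tgt)
    loss = F.mse_loss(translator(z_src), z_tgt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the translator only has to map a few hundred latent dimensions rather than full-resolution images, a handful of paired samples can meaningfully constrain it.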
Uformer: A General U-Shaped Transformer for Image Restoration
In this paper, we present Uformer, an effective and efficient
Transformer-based architecture for image restoration, in which we build a
hierarchical encoder-decoder network using the Transformer block. In Uformer,
there are two core designs. First, we introduce a novel locally-enhanced window
(LeWin) Transformer block, which performs non-overlapping window-based
self-attention instead of global self-attention. This significantly reduces the
computational complexity on high-resolution feature maps while capturing local
context. Second, we propose a learnable multi-scale restoration modulator in
the form of a multi-scale spatial bias to adjust features in multiple layers of
the Uformer decoder. Our modulator demonstrates a superior capability for
restoring details across various image restoration tasks while introducing
only marginal extra parameters and computational cost. Powered by these two designs,
Uformer enjoys a high capability for capturing both local and global
dependencies for image restoration. To evaluate our approach, extensive
experiments are conducted on several image restoration tasks, including image
denoising, motion deblurring, defocus deblurring and deraining. Without bells
and whistles, our Uformer achieves superior or comparable performance compared
with the state-of-the-art algorithms. The code and models are available at
https://github.com/ZhendongWang6/Uformer.
Comment: 17 pages, 13 figures
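A hedged sketch of the multi-scale restoration modulator as a learnable window-sized spatial bias per decoder scale; the window size, the number of scales, and broadcasting the bias by bilinear resizing (the paper adds it per attention window) are simplifying assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleModulator(nn.Module):
    """One learnable spatial bias per decoder scale, added to features."""
    def __init__(self, channels_per_scale, window: int = 8):
        super().__init__()
        self.biases = nn.ParameterList(
            [nn.Parameter(torch.zeros(1, c, window, window))
             for c in channels_per_scale]
        )

    def forward(self, decoder_feats):
        # decoder_feats: list of (B, C_i, H_i, W_i) tensors, one per scale.
        out = []
        for feat, bias in zip(decoder_feats, self.biases):
            b = F.interpolate(bias, size=feat.shape[-2:],
                              mode="bilinear", align_corners=False)
            out.append(feat + b)   # additive feature modulation
        return out
```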
Face X-ray for More General Face Forgery Detection
In this paper, we propose a novel image representation called face X-ray for
detecting forgery in face images. The face X-ray of an input face image is a
greyscale image that reveals whether the input image can be decomposed into the
blending of two images from different sources. It does so by showing the
blending boundary for a forged image and the absence of blending for a real
image. We observe that most existing face manipulation methods share a common
step: blending the altered face into an existing background image. For this
reason, face X-ray provides an effective way for detecting forgery generated by
most existing face manipulation algorithms. Face X-ray is general in the sense
that it only assumes the existence of a blending step and does not rely on any
knowledge of the artifacts associated with a specific face manipulation
technique. Indeed, the algorithm for computing face X-ray can be trained
without fake images generated by any of the state-of-the-art face manipulation
methods. Extensive experiments show that face X-ray remains effective when
applied to forgery generated by unseen face manipulation techniques, while most
existing face forgery detection or deepfake detection algorithms experience a
significant performance drop.
Comment: Accepted to CVPR 2020 (Oral)
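The boundary-revealing map itself admits a one-line definition. A minimal sketch, assuming a soft blending mask M with values in [0, 1]; the form B = 4M(1 - M) is one natural instantiation of the map described above, being zero away from the blend and maximal on the boundary:

```python
import numpy as np

def face_xray(mask: np.ndarray) -> np.ndarray:
    """Face X-ray from a soft blending mask M in [0, 1].

    B = 4 * M * (1 - M) is zero wherever a pixel comes purely from one
    source image and peaks (at 1.0) along the blending boundary, so a
    real, unblended image yields an all-zero X-ray.
    """
    return 4.0 * mask * (1.0 - mask)
```

Training pairs can therefore be synthesized by blending two real images with such a mask, which is why no fake images from any manipulation method are required.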
Improving Person Re-identification with Iterative Impression Aggregation
Our impression of a person often updates after we see more aspects of him or
her, and this process keeps iterating as we meet more often. We formulate such
an intuition into the problem of person re-identification (re-ID), where the
representation of a query (probe) image is iteratively updated with new
information from the candidates in the gallery. Specifically, we propose a
simple attentional aggregation formulation to instantiate this idea and
showcase that such a pipeline achieves competitive performance on standard
benchmarks including CUHK03, Market-1501 and DukeMTMC. Not only does such a
simple method improve the performance of the baseline models, it also achieves
performance comparable to the latest advanced re-ranking methods. Another
advantage of this proposal is its flexibility to incorporate different
representations and similarity metrics. By utilizing stronger representations
and metrics, we further demonstrate state-of-the-art person re-ID performance,
which also validates the general applicability of the proposed method.
Comment: Accepted by Transactions on Image Processing
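A minimal sketch of the iterative attentional aggregation described above; the exact update rule, temperature, and mixing weight are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def iterative_aggregation(query, gallery, steps: int = 3,
                          alpha: float = 0.9, temperature: float = 0.05):
    """Iteratively refine a query feature with attention over the gallery.

    query: (D,) probe feature; gallery: (N, D) candidate features.
    """
    q = F.normalize(query, dim=-1)
    g = F.normalize(gallery, dim=-1)
    for _ in range(steps):
        attn = F.softmax(g @ q / temperature, dim=0)  # (N,) attention weights
        impression = attn @ g                          # aggregated "impression"
        # Mix the old representation with the new impression, renormalize.
        q = F.normalize(alpha * q + (1 - alpha) * impression, dim=-1)
    return q
```

Since the update only touches the query representation, any stronger feature extractor or similarity metric can be dropped in unchanged, which is the flexibility claimed above.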