Global-Local Face Upsampling Network
Face hallucination, which is the task of generating a high-resolution face
image from a low-resolution input image, is a well-studied problem that is
useful in widespread application areas. Face hallucination is particularly
challenging when the input face resolution is very low (e.g., 10 x 12 pixels)
and/or the image is captured in an uncontrolled setting with large pose and
illumination variations. In this paper, we revisit the algorithm introduced in
[1] and present a deep interpretation of this framework that achieves
state-of-the-art results under such challenging scenarios. In our deep network
architecture the global and local constraints that define a face can be
efficiently modeled and learned end-to-end using training data. Conceptually
our network design can be partitioned into two sub-networks: the first one
implements the holistic face reconstruction according to global constraints,
and the second one enhances face-specific details and enforces local patch
statistics. We optimize the deep network using a new super-resolution loss
function that combines reconstruction error with a learned face-quality
measure in an adversarial setting, producing improved visual results. We conduct
extensive experiments in both controlled and uncontrolled setups and show that
our algorithm improves the state of the art both numerically and visually.
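The combined objective described above (reconstruction error plus a learned adversarial face-quality term) can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation; the function name and the weight `lam` are assumptions:

```python
import numpy as np

def combined_sr_loss(sr, hr, d_score, lam=0.01):
    """Reconstruction error plus an adversarial face-quality term.

    sr, hr  : super-resolved and ground-truth high-resolution images
    d_score : discriminator's estimate that `sr` is a real face (0..1)
    lam     : weight of the adversarial term (hypothetical value)
    """
    rec = np.mean((sr - hr) ** 2)       # pixel-wise reconstruction (MSE)
    adv = -np.log(d_score + 1e-12)      # generator-side adversarial loss
    return rec + lam * adv
```

During training the discriminator is updated in alternation, so `d_score` reflects a progressively sharper notion of face quality.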
Towards Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs
Face photo-sketch synthesis aims at generating a facial sketch/photo
conditioned on a given photo/sketch. It has wide applications, including
digital entertainment and law enforcement. Precisely depicting face
photos/sketches remains challenging due to the requirements of structural
realism and textural consistency. While existing methods achieve compelling
results, they mostly yield blurring and severe deformation of various facial
components, making the synthesized images look unrealistic. To tackle this
challenge, in this work we propose to use facial composition information to
aid the synthesis of face sketches/photos. Specifically, we propose a
novel composition-aided generative adversarial network (CA-GAN) for face
photo-sketch synthesis. In CA-GAN, we utilize paired inputs including a face
photo/sketch and the corresponding pixel-wise face labels for generating a
sketch/photo. In addition, to focus training on hard-to-generate components and
delicate facial structures, we propose a compositional reconstruction loss.
Finally, we use stacked CA-GANs (SCA-GAN) to further rectify defects and add
compelling details. Experimental results show that our method is capable of
generating both visually comfortable and identity-preserving face
sketches/photos over a wide range of challenging data. Our method achieves the
state-of-the-art quality, reducing the best previous Fréchet Inception
Distance (FID) by a large margin. Besides, we demonstrate that the proposed
method has considerable generalization ability. We have made our code and
results publicly available: https://fei-hdu.github.io/ca-gan/.
Face Video Generation from a Single Image and Landmarks
In this paper we are concerned with the challenging problem of producing a
full image sequence of a deformable face given only an image and generic facial
motions encoded by a set of sparse landmarks. To this end, we build upon recent
breakthroughs in image-to-image translation such as pix2pix, CycleGAN and
StarGAN, which train Deep Convolutional Neural Networks (DCNNs) to map aligned
pairs of images between different domains (i.e., having different labels), and
propose a new architecture that is driven not by labels but by spatial maps of
facial landmarks. In particular, we propose MotionGAN,
which transforms an input face image into a new one according to a heatmap of
target landmarks. We show that it is possible to create very realistic face
videos using a single image and a set of target landmarks. Furthermore, our
method can be used to edit a facial image with arbitrary motions specified by
landmarks (e.g., expression, speech, etc.). This provides much more flexibility
for face editing, expression transfer, and facial video creation than models
based on discrete expressions, audio, or action units.
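Conditioning on a heatmap of target landmarks, as MotionGAN does, can be illustrated with a small NumPy sketch. The function name, grid size, and `sigma` below are assumptions for illustration, not details from the paper:

```python
import numpy as np

def landmark_heatmaps(landmarks, size=64, sigma=2.0):
    """Render one Gaussian heatmap per (x, y) landmark.

    Returns an array of shape (num_landmarks, size, size) with a peak
    of 1.0 at each landmark location; such maps can be stacked with the
    input image as extra channels for a conditional generator.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
            for x, y in landmarks]
    return np.stack(maps)
```

Because the conditioning signal is spatial rather than a discrete label, the same trained network can be driven by any landmark configuration.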
Facial Aging and Rejuvenation by Conditional Multi-Adversarial Autoencoder with Ordinal Regression
Facial aging and facial rejuvenation analyze a given face photograph to
predict a future look or estimate a past look of the person. To achieve this,
it is critical to preserve human identity and the corresponding aging
progression and regression with high accuracy. However, existing methods cannot
simultaneously handle these two objectives well. We propose a novel generative
adversarial network based approach, named the Conditional Multi-Adversarial
AutoEncoder with Ordinal Regression (CMAAE-OR). It utilizes an age estimation
technique to control the aging accuracy and takes a high-level feature
representation to preserve personalized identity. Specifically, the face is
first mapped to a latent vector through a convolutional encoder. The latent
vector is then projected onto the face manifold conditional on the age through
a deconvolutional generator. The latent vector preserves personalized face
features and the age controls facial aging and rejuvenation. A discriminator
and an ordinal regression are imposed on the encoder and the generator in
tandem, making the generated face images more photorealistic while
simultaneously exhibiting desirable aging effects. Besides, a high-level
feature representation is utilized to preserve the personalized identity of the
generated face. Experiments on two benchmark datasets demonstrate the appealing
performance of the proposed method over the state of the art.
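The encode-then-condition step described above (map the face to a latent vector, then decode conditioned on age) can be sketched as follows. The age-group encoding and all names here are assumptions for illustration only:

```python
import numpy as np

def condition_on_age(z, age_group, num_groups=10):
    """Append a one-hot age code to the identity latent vector.

    A deconvolutional generator would then decode this conditioned
    vector back into a face image exhibiting the requested age, while
    `z` carries the identity-preserving features.
    """
    one_hot = np.zeros(num_groups)
    one_hot[age_group] = 1.0
    return np.concatenate([z, one_hot])
```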
Face Identity Disentanglement via Latent Space Mapping
Learning disentangled representations of data is a fundamental problem in
artificial intelligence. Specifically, disentangled latent representations
allow generative models to control and compose the disentangled factors in the
synthesis process. Current methods, however, require extensive supervision and
training, or instead noticeably compromise quality. In this paper, we present
a method that learns how to represent data in a disentangled way, with minimal
supervision, using only available pre-trained networks. Our key
insight is to decouple the processes of disentanglement and synthesis, by
employing a leading pre-trained unconditional image generator, such as
StyleGAN. By learning to map into its latent space, we leverage both its
state-of-the-art generative quality and its rich, expressive latent space,
without the burden of training it. We demonstrate our approach on the complex
and high-dimensional domain of human heads. We evaluate our method
qualitatively and quantitatively, and exhibit its success with
de-identification operations and with temporal identity coherency in image
sequences. Through this extensive experimentation, we show that our method
successfully disentangles identity from other facial attributes, surpassing
existing methods, even though they require more training and supervision.
Longitudinal Face Aging in the Wild - Recent Deep Learning Approaches
Face aging has attracted considerable attention and interest from the computer
vision community in recent years. Numerous approaches, ranging from pure image
processing techniques to deep learning architectures, have been proposed in the
literature. In this paper, we review recent developments in modern deep
learning based approaches, i.e., deep generative models, for the face aging
task. Their structures, formulations, learning algorithms, and synthesized
results are presented with systematic discussions. Moreover, the aging
databases used by most methods to learn the aging process are also reviewed.
PortraitGAN for Flexible Portrait Manipulation
Previous methods have dealt with discrete manipulation of facial attributes
such as smiling, sadness, anger, and surprise, drawn from a set of canonical
expressions; they are not scalable and operate in a single modality. In this
paper, we propose a novel framework that supports continuous edits and
multi-modality portrait manipulation using adversarial learning. Specifically,
we adapt cycle-consistency to the conditional setting by leveraging additional
facial landmark information. This has two effects: first, cycle mapping induces
bidirectional manipulation and identity preservation; second, paired samples
from different modalities can thus be utilized. To ensure high-quality synthesis, we
adopt texture-loss that enforces texture consistency and multi-level
adversarial supervision that facilitates gradient flow. Quantitative and
qualitative experiments show the effectiveness of our framework in performing
flexible and multi-modality portrait manipulation with photo-realistic effects.
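The cycle-consistency constraint mentioned above, which induces bidirectional manipulation and identity preservation, reduces to a simple reconstruction penalty. A minimal NumPy sketch (names assumed, not the paper's code):

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    """L1 cycle loss: translating x with generator G and mapping back
    with the inverse-direction generator F should reconstruct x."""
    return np.mean(np.abs(F(G(x)) - x))
```

With perfectly inverse mappings the loss is zero; during training it is minimized jointly with the adversarial and texture terms.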
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
Anti-Makeup: Learning A Bi-Level Adversarial Network for Makeup-Invariant Face Verification
Makeup is widely used to improve facial attractiveness and is well accepted
by the public. However, different makeup styles will result in significant
facial appearance changes. It remains a challenging problem to match makeup and
non-makeup face images. This paper proposes a learning from generation approach
for makeup-invariant face verification by introducing a bi-level adversarial
network (BLAN). To alleviate the negative effects from makeup, we first
generate non-makeup images from makeup ones, and then use the synthesized
non-makeup images for further verification. Two adversarial networks in BLAN
are integrated in an end-to-end deep network, with the one on pixel level for
reconstructing appealing facial images and the other on feature level for
preserving identity information. These two networks jointly reduce the sensing
gap between makeup and non-makeup images. Moreover, we make the generator well
constrained by incorporating multiple perceptual losses. Experimental results
on three benchmark makeup face datasets demonstrate that our method achieves
state-of-the-art verification accuracy across makeup status and can produce
photo-realistic non-makeup face images.
Comment: The paper is accepted by AAAI-1
UVA: A Universal Variational Framework for Continuous Age Analysis
Conventional methods for facial age analysis tend to use accurate age labels
in a supervised way. However, existing age datasets cover only a limited range
of ages, leading to a long-tailed distribution. To alleviate this problem,
this paper proposes a Universal Variational Aging (UVA) framework to formulate
facial age priors in a disentangling manner. Benefiting from the variational
evidence lower bound, the facial images are encoded and disentangled into an
age-irrelevant distribution and an age-related distribution in the latent
space. A conditional introspective adversarial learning mechanism is introduced
to boost the image quality. In this way, when manipulating the age-related
distribution, UVA can achieve age translation with arbitrary ages. Further, by
sampling noise from the age-irrelevant distribution, we can generate
photorealistic facial images with a specific age. Moreover, given an input face
image, the mean value of the age-related distribution can be treated as an age
estimate. These results indicate that UVA can efficiently and accurately
estimate the age-related distribution in a disentangled manner, even when the
training dataset has a long-tailed age distribution. UVA is the first attempt to
achieve facial age analysis tasks, including age translation, age generation
and age estimation, in a universal framework. The qualitative and quantitative
experiments demonstrate the superiority of UVA on five popular datasets,
including CACD2000, Morph, UTKFace, MegaAge-Asian and FG-NET.
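The variational encoding underlying UVA rests on the standard reparameterization trick to sample from the learned latent distributions while keeping gradients flowing. A generic NumPy sketch (not the authors' code):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), so the sampling
    step stays differentiable with respect to mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```

In UVA's setting, separate (mu, log_var) pairs would parameterize the age-related and age-irrelevant distributions, and the mean of the age-related one serves as the age estimate.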