Face Aging with Contextual Generative Adversarial Nets
Face aging, which renders aged faces for an input face, has attracted
extensive attention in multimedia research. Recently, several conditional
Generative Adversarial Nets (GANs) based methods have achieved great success.
They can generate images fitting the real face distributions conditioned on
each individual age group. However, these methods fail to capture the
transition patterns, e.g., the gradual shape and texture changes between
adjacent age groups. In this paper, we propose novel Contextual Generative
Adversarial Nets (C-GANs) that explicitly model these transition patterns.
C-GANs consist of a conditional transformation network and two discriminative
networks. The conditional transformation network imitates the aging procedure
with several specially designed residual blocks. The age discriminative network
guides the synthesized face to fit the real conditional distribution. The
transition pattern discriminative network is novel, aiming to distinguish real
transition patterns from fake ones. It serves as an extra regularization term
for the conditional transformation network, ensuring that the generated image
pairs fit the corresponding real transition pattern distribution. Experimental
results demonstrate that the proposed framework produces appealing results in
comparison with the state of the art and ground truth. We also observe a
performance gain for cross-age face verification.
Comment: accepted at ACM Multimedia 201
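To make the two-discriminator setup concrete, here is a minimal PyTorch sketch, not the authors' code, of a transition-pattern discriminator that scores pairs of faces from adjacent age groups; all shapes, names, and module choices are illustrative assumptions.

```python
# Hypothetical sketch of a transition-pattern discriminator (not the paper's code).
import torch
import torch.nn as nn

class TransitionDiscriminator(nn.Module):
    """Scores whether an (age t, age t+1) face pair shows a real aging transition."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1),   # pair stacked on channels
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, 1),                          # real/fake transition logit
        )

    def forward(self, face_t, face_t1):
        return self.net(torch.cat([face_t, face_t1], dim=1))

# Usage: a fake pair is the input face and its synthesized older version from
# the conditional transformation network G (hypothetical name here):
#   d_loss = bce(D(x_t, x_t1), 1) + bce(D(x_t, G(x_t, next_age)), 0)
D = TransitionDiscriminator()
logit = D(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128))
```

Training such a pair discriminator regularizes the generator toward gradual, plausible transitions rather than merely per-age-group realism.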
Towards Open-Set Identity Preserving Face Synthesis
We propose a framework based on Generative Adversarial Networks to
disentangle the identity and attributes of faces, such that we can conveniently
recombine different identities and attributes for identity preserving face
synthesis in open domains. Previous identity preserving face synthesis
processes are largely confined to synthesizing faces with known identities that
are already in the training dataset. To synthesize a face with identity outside
the training dataset, our framework requires one input image of that subject to
produce an identity vector, and any other input face image to extract an
attribute vector capturing, e.g., pose, emotion, illumination, and even the
background. We then recombine the identity vector and the attribute vector to
synthesize a new face of the subject with the extracted attribute. Our proposed
framework does not need to annotate the attributes of faces in any way. It is
trained with an asymmetric loss function to better preserve the identity and
stabilize the training process. It can also effectively leverage large amounts
of unlabeled training face images to further improve the fidelity of the
synthesized faces for subjects that are not present in the labeled training
face dataset. Our experiments demonstrate the efficacy of the proposed
framework. We also present its usage in a much broader set of applications
including face frontalization, face attribute morphing, and face adversarial
example detection.
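As a rough illustration of the recombination step, the following sketch combines an identity vector from one image with an attribute vector from another; the encoders, decoder, and sizes are placeholder assumptions, not the paper's architecture.

```python
# Hypothetical sketch of identity/attribute recombination (placeholder networks).
import torch
import torch.nn as nn

id_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))    # identity branch
attr_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))  # attribute branch
decoder = nn.Sequential(nn.Linear(512, 3 * 64 * 64), nn.Tanh())          # synthesizer

subject_img = torch.randn(1, 3, 64, 64)    # any subject, even outside the training set
attribute_img = torch.randn(1, 3, 64, 64)  # supplies pose/emotion/illumination/background

z_id = id_encoder(subject_img)             # identity vector
z_attr = attr_encoder(attribute_img)       # attribute vector
new_face = decoder(torch.cat([z_id, z_attr], dim=1)).view(1, 3, 64, 64)
```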
Face Identity Disentanglement via Latent Space Mapping
Learning disentangled representations of data is a fundamental problem in
artificial intelligence. Specifically, disentangled latent representations
allow generative models to control and compose the disentangled factors in the
synthesis process. Current methods, however, require extensive supervision and
training, or instead, noticeably compromise quality. In this paper, we present
a method that learns how to represent data in a disentangled way, with minimal
supervision, relying solely on available pre-trained networks. Our key
insight is to decouple the processes of disentanglement and synthesis, by
employing a leading pre-trained unconditional image generator, such as
StyleGAN. By learning to map into its latent space, we leverage both its
state-of-the-art quality generative power, and its rich and expressive latent
space, without the burden of training it. We demonstrate our approach on the
complex and high-dimensional domain of human heads. We evaluate our method
qualitatively and quantitatively, and exhibit its success with
de-identification operations and with temporal identity coherency in image
sequences. Through this extensive experimentation, we show that our method
successfully disentangles identity from other facial attributes, surpassing
existing methods, even though they require more training and supervision.
Comment: 17 pages, 10 figures
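The decoupling of disentanglement from synthesis can be sketched as follows; the tiny frozen generator stands in for StyleGAN, and the mapper shapes are assumptions.

```python
# Hypothetical sketch of mapping into a frozen generator's latent space
# (the tiny generator is a stand-in for StyleGAN; sizes are assumptions).
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(512, 3 * 64 * 64), nn.Tanh())  # pre-trained, frozen
for p in generator.parameters():
    p.requires_grad = False

id_mapper = nn.Linear(256, 256)    # trained: identity features -> part of the latent
attr_mapper = nn.Linear(256, 256)  # trained: attribute features -> the rest

id_feat, attr_feat = torch.randn(1, 256), torch.randn(1, 256)
w = torch.cat([id_mapper(id_feat), attr_mapper(attr_feat)], dim=1)
image = generator(w).view(1, 3, 64, 64)  # synthesis quality comes from the frozen generator
```

Only the small mappers are trained, which is why the approach avoids the cost of training the generator itself.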
DNA-GAN: Learning Disentangled Representations from Multi-Attribute Images
Disentangling factors of variation has become a very challenging problem in
representation learning. Existing algorithms suffer from many limitations, such
as unpredictable disentangling factors, poor quality of generated images from
encodings, lack of identity information, etc. In this paper, we propose a
supervised learning model called DNA-GAN which tries to disentangle different
factors or attributes of images. The latent representations of images are
DNA-like, in which each individual piece of the encoding represents an
independent factor of variation. By annihilating the recessive piece and
swapping a certain piece of one latent representation with that of another,
we obtain two different representations that can be decoded into two images
in which the presence of the corresponding attribute is exchanged. In order to
obtain realistic images as well as disentangled representations, we further
introduce a discriminator for adversarial training. Experiments on the
Multi-PIE and CelebA datasets demonstrate that our proposed method is
effective at disentangling factors and even overcomes certain limitations of
existing methods.
Comment: ICLR 2018 workshop, github: https://github.com/Prinsphield/DNA-GA
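A toy sketch of the piece-swapping operation, with assumed sizes and a hypothetical "smiling" piece, might look like this:

```python
# Hypothetical sketch of DNA-GAN's piece swap (sizes and names are assumptions).
import torch

n_attrs, piece_dim = 4, 32
z_a = torch.randn(n_attrs, piece_dim)  # latent "DNA" of image A, one row per attribute
z_b = torch.randn(n_attrs, piece_dim)  # latent "DNA" of image B

def swap_piece(z1, z2, i):
    """Exchange the i-th attribute piece between two latent codes."""
    z1, z2 = z1.clone(), z2.clone()
    tmp = z1[i].clone()
    z1[i] = z2[i]
    z2[i] = tmp
    return z1, z2

z_a_new, z_b_new = swap_piece(z_a, z_b, i=2)  # e.g., exchange a 'smiling' piece
# Decoding z_a_new and z_b_new would yield the two images with the presence of
# that attribute exchanged; a discriminator keeps the decoded images realistic.
```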
Deep Illumination: Approximating Dynamic Global Illumination with Generative Adversarial Network
We present Deep Illumination, a novel machine learning technique for
approximating global illumination (GI) in real-time applications using a
Conditional Generative Adversarial Network. Our primary focus is on generating
indirect illumination and soft shadows with offline rendering quality at
interactive rates. Inspired by recent advances in image-to-image translation
using deep generative convolutional networks, we introduce a variant of this
network that learns a mapping from G-buffers (depth map,
normal map, and diffuse map) and direct illumination to any global illumination
solution. Our primary contribution is showing that a generative model can be
used to learn a density estimation from screen space buffers to an advanced
illumination model for a 3D environment. Once trained, our network can
approximate global illumination for scene configurations it has never
encountered before within the environment it was trained on. We evaluate Deep
Illumination through a comparison with both a state-of-the-art real-time GI
technique (VXGI) and an offline rendering GI technique (path tracing). We show
that our method produces effective GI approximations and is also
computationally cheaper than existing GI techniques. Our technique has the
potential to replace existing precomputed and screen-space techniques for
producing global illumination effects in dynamic scenes with physically-based
rendering quality.
Comment: 10 pages
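The learned mapping's input/output contract can be sketched as below; the channel layout and the tiny convolutional network are assumptions, not the paper's architecture.

```python
# Hypothetical sketch of the screen-space-to-GI mapping (channel layout assumed).
import torch
import torch.nn as nn

# Channels: depth (1) + normals (3) + diffuse (3) + direct illumination (3) = 10.
screen_buffers = torch.randn(1, 10, 256, 256)

generator = nn.Sequential(                     # stand-in for the image-to-image network
    nn.Conv2d(10, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
)
gi_prediction = generator(screen_buffers)      # approximated global illumination image
```

Because the inputs are per-frame screen-space buffers, a single forward pass per frame is all that is needed at run time.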
Representation Learning by Rotating Your Faces
The large pose discrepancy between two face images is one of the fundamental
challenges in automatic face recognition. Conventional approaches to
pose-invariant face recognition either perform face frontalization on, or learn
a pose-invariant representation from, a non-frontal face image. We argue that
it is more desirable to perform both tasks jointly to allow them to leverage
each other. To this end, this paper proposes a Disentangled Representation
learning-Generative Adversarial Network (DR-GAN) with three distinct novelties.
First, the encoder-decoder structure of the generator enables DR-GAN to learn a
representation that is both generative and discriminative, which can be used
for face image synthesis and pose-invariant face recognition. Second, this
representation is explicitly disentangled from other face variations such as
pose, through the pose code provided to the decoder and pose estimation in the
discriminator. Third, DR-GAN can take one or multiple images as the input, and
generate one unified identity representation along with an arbitrary number of
synthetic face images. Extensive quantitative and qualitative evaluations on a
number of controlled and in-the-wild databases demonstrate the superiority of
DR-GAN over the state of the art in both learning representations and rotating
large-pose face images.
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
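A hedged sketch of this encoder-decoder interface, with assumed feature, pose-code, and noise dimensions, follows:

```python
# Hypothetical sketch of the DR-GAN generator interface (dimensions assumed).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 96 * 96, 320))  # identity feature f(x)
decoder = nn.Sequential(nn.Linear(320 + 13 + 50, 3 * 96 * 96), nn.Tanh())

x = torch.randn(1, 3, 96, 96)                  # non-frontal input face
identity = encoder(x)                          # representation, disentangled from pose
pose_code = torch.zeros(1, 13)
pose_code[0, 6] = 1.0                          # one-hot target pose (e.g., frontal)
noise = torch.randn(1, 50)
rotated = decoder(torch.cat([identity, pose_code, noise], dim=1)).view(1, 3, 96, 96)
# With multiple inputs, per-image identity features can be fused (e.g., averaged)
# into one unified representation before decoding.
```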
Longitudinal Face Aging in the Wild - Recent Deep Learning Approaches
Face aging has attracted considerable attention and interest from the computer
vision community in recent years. Numerous approaches ranging from purely image
processing techniques to deep learning structures have been proposed in
literature. In this paper, we aim to give a review of recent developments of
modern deep learning based approaches, i.e. Deep Generative Models, for Face
Aging task. Their structures, formulations, and learning algorithms, as well
as synthesized results, are provided with systematic discussion. Moreover,
the aging databases used by most methods to learn the aging process are also
reviewed.
Learning Disentangling and Fusing Networks for Face Completion Under Structured Occlusions
Face completion aims to generate semantically new pixels for missing facial
components. It is a challenging generative task due to large variations of face
appearance. This paper studies generative face completion under structured
occlusions. We treat the face completion and corruption as disentangling and
fusing processes of clean faces and occlusions, and propose a jointly
disentangling and fusing Generative Adversarial Network (DF-GAN). First, three
domains are constructed, corresponding to the distributions of occluded faces,
clean faces and structured occlusions. The disentangling and fusing processes
are formulated as the transformations between the three domains. Then the
disentangling and fusing networks are built to learn the transformations from
unpaired data; an encoder-decoder structure is adopted, allowing DF-GAN
to simulate structured occlusions by modifying the latent representations.
Finally, the disentangling and fusing processes are unified into a dual
learning framework along with an adversarial strategy. The proposed method is
evaluated on the Meshface verification problem. Experimental results on four
Meshface databases demonstrate the effectiveness of our proposed method for
face completion under structured occlusions.
Comment: Submitted to CVPR 201
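The disentangling and fusing round trip might be sketched as follows, with placeholder networks standing in for the paper's transformations between the three domains.

```python
# Hypothetical sketch of the disentangling/fusing round trip (placeholder networks).
import torch
import torch.nn as nn

disentangle = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 6 * 64 * 64))
fuse = nn.Sequential(nn.Flatten(), nn.Linear(6 * 64 * 64, 3 * 64 * 64), nn.Tanh())

occluded = torch.randn(1, 3, 64, 64)                  # occluded-face domain
parts = disentangle(occluded).view(1, 6, 64, 64)
clean, occlusion = parts[:, :3], parts[:, 3:]         # clean-face and occlusion domains
recomposed = fuse(torch.cat([clean, occlusion], dim=1)).view(1, 3, 64, 64)
# A dual/cycle objective would tie `recomposed` back to `occluded`, and
# adversarial losses keep each domain's outputs realistic.
```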
Expression Conditional GAN for Facial Expression-to-Expression Translation
In this paper, we focus on the facial expression translation task and propose
a novel Expression Conditional GAN (ECGAN) which can learn the mapping from one
image domain to another based on an additional expression attribute. The
proposed ECGAN is a generic framework and is applicable to different expression
generation tasks, where a specific facial expression can be easily controlled
by the conditional attribute label. In addition, we introduce a novel face
mask loss to reduce the influence of background changes. Moreover, we propose
an entire
framework for facial expression generation and recognition in the wild, which
consists of two modules, i.e., generation and recognition. Finally, we evaluate
our framework on several public face datasets in which the subjects vary in
race, illumination, occlusion, pose, color, content, and background
conditions. Even though these datasets are very diverse, both the qualitative
and quantitative results demonstrate that our approach is able to generate
facial expressions accurately and robustly.
Comment: 5 pages, 5 figures, accepted to ICIP 201
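One plausible form of such a face mask loss, assuming a binary facial-region mask (e.g., from a face parser) that the abstract does not specify, is:

```python
# Hypothetical sketch of a face-mask loss (mask source and shapes are assumptions).
import torch

def face_mask_loss(generated, source, face_mask):
    """L1 penalty on pixels outside the face region, discouraging background changes."""
    background = 1.0 - face_mask               # 1 where there is no face
    return (background * (generated - source)).abs().mean()

source = torch.rand(1, 3, 128, 128)            # input face image
generated = torch.rand(1, 3, 128, 128)         # expression-translated output
mask = torch.zeros(1, 1, 128, 128)
mask[:, :, 32:96, 32:96] = 1.0                 # toy face region, e.g. from a face parser
loss = face_mask_loss(generated, source, mask)
```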
Adversarial Learning of Disentangled and Generalizable Representations for Visual Attributes
Recently, a multitude of methods for image-to-image translation have
demonstrated impressive results on problems such as multi-domain or
multi-attribute transfer. The vast majority of such works leverage the
strengths of adversarial learning and deep convolutional autoencoders to
achieve realistic results by capturing the target data distribution well.
Nevertheless, the most prominent representatives of this class of methods do
not facilitate semantic structure in the latent space, and usually rely on
binary domain labels for test-time transfer. This leads to rigid models, unable
to capture the variance of each domain label. In this light, we propose a novel
adversarial learning method that (i) facilitates the emergence of latent
structure by semantically disentangling sources of variation, and (ii)
encourages learning generalizable, continuous, and transferable latent codes
that enable flexible attribute mixing. This is achieved by introducing a novel
loss function that encourages representations to result in uniformly
distributed class posteriors for disentangled attributes. In tandem with an
algorithm for inducing generalizable properties, the resulting representations
can be utilized for a variety of tasks such as intensity-preserving
multi-attribute image translation and synthesis, without requiring labelled
test data. We demonstrate the merits of the proposed method by a set of
qualitative and quantitative experiments on popular databases such as MultiPIE,
RaFD, and BU-3DFE, where our method outperforms other, state-of-the-art methods
in tasks such as intensity-preserving multi-attribute transfer and synthesis.
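The described loss can be sketched as cross-entropy against the uniform distribution; the shapes and classifier setup are assumptions.

```python
# Hypothetical sketch of a uniformity loss over class posteriors (shapes assumed).
import torch
import torch.nn.functional as F

def uniform_posterior_loss(logits):
    """Cross-entropy of softmax(logits) against the uniform distribution,
    pushing the classifier's posterior toward 1/K for every class."""
    log_probs = F.log_softmax(logits, dim=1)
    return -log_probs.mean(dim=1).mean()

# Logits of a classifier predicting attribute B from a representation that is
# meant to encode only attribute A; minimizing the loss removes B's traces.
logits = torch.randn(8, 5)  # batch of 8; attribute B has 5 classes
loss = uniform_posterior_loss(logits)
```

Driving these posteriors to uniform is what disentangles the representation: it carries no usable evidence about the other attribute.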