Interpretable Transformations with Encoder-Decoder Networks
Deep feature spaces have the capacity to encode complex transformations of
their input data. However, understanding the relative feature-space
relationship between two transformed encoded images is difficult. For instance,
what is the relative feature space relationship between two rotated images?
What is decoded when we interpolate in feature space? Ideally, we want to
disentangle confounding factors, such as pose, appearance, and illumination,
from object identity. Disentangling these is difficult because they interact in
very nonlinear ways. We propose a simple method to construct a deep feature
space, with explicitly disentangled representations of several known
transformations. A person or algorithm can then manipulate the disentangled
representation, for example, to re-render an image with explicit control over
parameterized degrees of freedom. The feature space is constructed using a
transforming encoder-decoder network with a custom feature transform layer,
acting on the hidden representations. We demonstrate the advantages of explicit
disentangling on a variety of datasets and transformations, and as an aid for
traditional tasks, such as classification.
Comment: Accepted at ICCV 2017
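To make the feature transform layer concrete, here is a minimal PyTorch sketch of the idea (class and variable names are illustrative assumptions, not the authors' code): a layer that applies an explicit 2D rotation to pairs of latent channels, so that a known image-space rotation becomes a linear operation on the hidden representation.

import torch
import torch.nn as nn

class FeatureTransformLayer(nn.Module):
    """Rotates pairs of latent channels by a given angle, making an
    image-space rotation an explicit, linear map in feature space."""
    def forward(self, z, theta):
        # z: (B, D) latent vectors, D even; theta: (B,) angles in radians
        B, D = z.shape
        z = z.view(B, D // 2, 2)                     # group channels into pairs
        c, s = torch.cos(theta), torch.sin(theta)
        rot = torch.stack([torch.stack([c, -s], -1),
                           torch.stack([s,  c], -1)], dim=-2)   # (B, 2, 2)
        z = torch.einsum('bij,bkj->bki', rot, z)     # rotate every pair
        return z.reshape(B, D)

Training could then pair an image with a rotated copy of it: encode the first image, transform its latent code by the known relative angle with this layer, decode, and penalize reconstruction error against the second image.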
Age Progression/Regression by Conditional Adversarial Autoencoder
"If I provide you a face image of mine (without telling you the actual age
when I took the picture) and a large amount of face images that I crawled
(containing labeled faces of different ages but not necessarily paired), can
you show me what I would look like when I am 80 or what I was like when I was
5?" The answer is probably a "No." Most existing face aging works attempt to
learn the transformation between age groups and thus would require the paired
samples as well as the labeled query image. In this paper, we look at the
problem from a generative modeling perspective, such that no paired samples
are required. In addition, given an unlabeled image, the generative model can
directly produce an image with the desired age attribute. We propose a
conditional adversarial autoencoder (CAAE) that learns a face manifold;
traversing this manifold, smooth age progression and regression can be
realized simultaneously. In CAAE,
the face is first mapped to a latent vector through a convolutional encoder,
and then the vector is projected to the face manifold conditional on age
through a deconvolutional generator. The latent vector preserves personalized
face features (i.e., personality) and the age condition controls progression
vs. regression. Two adversarial networks are imposed on the encoder and
generator, respectively, forcing the model to generate more photo-realistic faces.
Experimental results demonstrate the appealing performance and flexibility of
the proposed framework through comparisons with the state of the art and ground truth.
Comment: Accepted by the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
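As a rough illustration of the CAAE conditioning pathway, the following PyTorch sketch (layer sizes and names are assumptions, not the published architecture; the two adversarial discriminators are omitted) shows how the identity code z and a one-hot age label are combined before deconvolutional generation.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, z_dim=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),     # 64x64 -> 32x32
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),   # 32x32 -> 16x16
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, z_dim),          # identity code z
        )
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Projects z, conditioned on a one-hot age vector, back to a face."""
    def __init__(self, z_dim=50, n_ages=10):
        super().__init__()
        self.fc = nn.Linear(z_dim + n_ages, 128 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),    # 32 -> 64
        )
    def forward(self, z, age_onehot):
        h = self.fc(torch.cat([z, age_onehot], dim=1))
        return self.net(h.view(-1, 128, 16, 16))

# Age editing: encode once, then decode under any target age bin, e.g.
#   z = Encoder()(x); aged = Generator()(z, torch.eye(10)[[8]])

Because z preserves identity while the age condition is supplied separately, sweeping the age vector traverses the manifold for a single person.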
Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz
The reconstruction of dense 3D models of face geometry and appearance from a
single image is highly challenging and ill-posed. To constrain the problem,
many approaches rely on strong priors, such as parametric face models learned
from limited 3D scan data. However, such priors restrict generalization, since
they cannot capture the true diversity of facial geometry, skin reflectance
and illumination. To
alleviate this problem, we present the first approach that jointly learns 1) a
regressor for face shape, expression, reflectance and illumination on the basis
of 2) a concurrently learned parametric face model. Our multi-level face model
combines the advantage of 3D Morphable Models for regularization with the
out-of-space generalization of a learned corrective space. We train end-to-end
on in-the-wild images without dense annotations by fusing a convolutional
encoder with a differentiable expert-designed renderer and a self-supervised
training loss, both defined at multiple detail levels. Our approach compares
favorably to the state-of-the-art in terms of reconstruction quality, better
generalizes to real-world faces, and runs at over 250 Hz.
Comment: CVPR 2018 (Oral). Project webpage: https://gvv.mpi-inf.mpg.de/projects/FML
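The self-supervised training loss can be sketched as follows; "regressor" and "render" are hypothetical stand-ins for the learned parameter regressor and the differentiable expert-designed renderer, not the authors' API, and only a single detail level is shown.

import torch

def self_supervised_loss(image, regressor, render, w_reg=1e-3):
    # regressor: image -> dict of model parameters (shape, expression,
    # reflectance, illumination, pose); render: parameters -> image.
    params = regressor(image)
    rendered = render(params)
    # Photometric term: the re-rendered face should match the input,
    # so no dense annotations are required.
    photo = (rendered - image).abs().mean()
    # Regularizer keeps coefficients near the (learned) model prior.
    reg = sum(p.pow(2).mean() for p in params.values())
    return photo + w_reg * reg

In the paper this kind of loss is defined at multiple detail levels, so both the coarse morphable-model fit and the learned corrective space receive gradients from image evidence alone.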