Reconstruction of 3D faces by shape estimation and texture interpolation
This paper aims to address the ill-posed problem of reconstructing 3D faces from single 2D face images. An
extended Tikhonov regularization method is combined with the standard 3D morphable model in order to
reconstruct the 3D face shapes from a small set of 2D facial points. Further, by interpolating the input 2D
texture with the model texture and warping the interpolated texture to the reconstructed face shapes, 3D face
reconstruction is achieved. For the texture warping, the 2D face deformation has been learned from the model
texture using a set of facial landmarks. Our experimental results demonstrate the robustness of the proposed approach
with respect to the reconstruction of realistic 3D face shapes.
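As a rough illustration of how a regularized fit to sparse 2D points can work, the sketch below solves a standard (generalized) Tikhonov least-squares problem for morphable-model shape coefficients. It is not the extended method of the paper: the function names, the per-component standard deviations `sigma`, and the weight `lam` are assumptions made for this example, and the basis rows are assumed to be already projected to the 2D landmark positions.

```python
import numpy as np

def fit_shape_coefficients(basis_2d, mean_2d, observed_2d, sigma, lam=1.0):
    """Minimize ||basis_2d @ a - (observed_2d - mean_2d)||^2 + lam * ||a / sigma||^2.

    basis_2d    : (2 * n_points, k) model basis restricted to the 2D landmarks (assumed)
    mean_2d     : (2 * n_points,)   mean landmark positions
    observed_2d : (2 * n_points,)   detected landmark positions
    sigma       : (k,)              per-component standard deviations (assumed prior)
    """
    residual = observed_2d - mean_2d
    # Normal equations of the regularized least-squares problem.
    lhs = basis_2d.T @ basis_2d + lam * np.diag(1.0 / sigma**2)
    rhs = basis_2d.T @ residual
    return np.linalg.solve(lhs, rhs)

def reconstruct_shape(mean_3d, basis_3d, coeffs):
    """Apply the fitted coefficients to the full 3D model: mean + basis @ coeffs."""
    return mean_3d + basis_3d @ coeffs
```

A larger `lam` pulls the coefficients toward the mean face, which is what keeps the problem well-posed when only a few landmarks are available.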
Expressive Body Capture: 3D Hands, Face, and Body from a Single Image
To facilitate the analysis of human actions, interactions and emotions, we
compute a 3D model of human body pose, hand pose, and facial expression from a
single monocular image. To achieve this, we use thousands of 3D scans to train
a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with
fully articulated hands and an expressive face. Learning to regress the
parameters of SMPL-X directly from images is challenging without paired images
and 3D ground truth. Consequently, we follow the approach of SMPLify, which
estimates 2D features and then optimizes model parameters to fit the features.
We improve on SMPLify in several significant ways: (1) we detect 2D features
corresponding to the face, hands, and feet and fit the full SMPL-X model to
these; (2) we train a new neural network pose prior using a large MoCap
dataset; (3) we define a new interpenetration penalty that is both fast and
accurate; (4) we automatically detect gender and the appropriate body models
(male, female, or neutral); (5) our PyTorch implementation achieves a speedup
of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to
both controlled images and images in the wild. We evaluate 3D accuracy on a new
curated dataset comprising 100 images with pseudo ground-truth. This is a step
towards automatic expressive human capture from monocular RGB data. The models,
code, and data are available for research purposes at
https://smpl-x.is.tue.mpg.de. Comment: To appear in CVPR 201
Visual ageing of human faces in three dimensions using morphable models and projection to latent structures
We present an approach to synthesising the effects of ageing on human face images using three-dimensional modelling. We extract a set of three-dimensional face models from a set of two-dimensional face images by fitting a Morphable Model. We propose a method to age these face models using Partial Least Squares to extract from the dataset those factors most related to ageing. These ageing-related factors are used to train an individually weighted linear model. We show that this is an effective means of producing an aged face image and compare it with two other linear methods for ageing face models. This is demonstrated both quantitatively and with perceptual evaluation using human raters.
MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
In this work we propose a novel model-based deep convolutional autoencoder
that addresses the highly challenging problem of reconstructing a 3D human face
from a single in-the-wild color image. To this end, we combine a convolutional
encoder network with an expert-designed generative model that serves as
decoder. The core innovation is our new differentiable parametric decoder that
encapsulates image formation analytically based on a generative model. Our
decoder takes as input a code vector with exactly defined semantic meaning that
encodes detailed face pose, shape, expression, skin reflectance and scene
illumination. Due to this new way of combining CNN-based with model-based face
reconstruction, the CNN-based encoder learns to extract semantically meaningful
parameters from a single monocular input image. For the first time, a CNN
encoder and an expert-designed generative model can be trained end-to-end in an
unsupervised manner, which renders training on very large (unlabeled) real
world data feasible. The obtained reconstructions compare favorably to current
state-of-the-art approaches in terms of quality and richness of representation.
Comment: International Conference on Computer Vision (ICCV) 2017 (Oral), 13 page
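To make the idea of a model-based decoder with a semantically defined code vector more concrete, here is a minimal sketch. It only covers geometry: a linear face model plus a rigid pose and a pinhole projection. Skin reflectance and illumination are left out, and the code layout, the single rotation angle, and all names are assumptions made for this example rather than the decoder described in the abstract.

```python
import numpy as np

def rotation_z(angle):
    """Rotation about the camera z-axis (a stand-in for a full 3D rotation)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def decode_points(code, mean, id_basis, exp_basis, focal=1.0):
    """Map a code vector to projected 2D points.

    Assumed code layout: [identity coeffs | expression coeffs | angle | tx, ty, tz].
    mean is (3 * n,), id_basis is (3 * n, n_id), exp_basis is (3 * n, n_exp).
    """
    n_id, n_exp = id_basis.shape[1], exp_basis.shape[1]
    alpha = code[:n_id]
    delta = code[n_id:n_id + n_exp]
    angle = code[n_id + n_exp]
    translation = code[n_id + n_exp + 1:n_id + n_exp + 4]
    # Linear face model: mean shape plus identity and expression offsets.
    verts = (mean + id_basis @ alpha + exp_basis @ delta).reshape(-1, 3)
    # Rigid pose, then pinhole projection onto the image plane.
    cam = verts @ rotation_z(angle).T + translation
    return focal * cam[:, :2] / cam[:, 2:3]
```

Every step is a differentiable function of the code, so in an autodiff framework an image-space loss on the output could be backpropagated to an encoder that predicts the code.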
FML: Face Model Learning from Videos
Monocular image-based 3D reconstruction of faces is a long-standing problem
in computer vision. Since image data is a 2D projection of a 3D face, the
resulting depth ambiguity makes the problem ill-posed. Most existing methods
rely on data-driven priors that are built from limited 3D face scans. In
contrast, we propose multi-frame video-based self-supervised training of a deep
network that (i) learns a face identity model both in shape and appearance
while (ii) jointly learning to reconstruct 3D faces. Our face model is learned
using only corpora of in-the-wild video clips collected from the Internet. This
virtually endless source of training data enables learning of a highly general
3D face model. In order to achieve this, we propose a novel multi-frame
consistency loss that ensures consistent shape and appearance across multiple
frames of a subject's face, thus minimizing depth ambiguity. At test time we
can use an arbitrary number of frames, so that we can perform both monocular
and multi-frame reconstruction. Comment: CVPR 2019 (Oral). Video: https://www.youtube.com/watch?v=SG2BwxCw0lQ,
Project Page: https://gvv.mpi-inf.mpg.de/projects/FML19
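One simple way to picture a multi-frame consistency constraint is to pool per-frame identity estimates into a single shared code and penalize each frame's deviation from it. The sketch below does exactly that. It is an illustrative stand-in rather than the loss defined in the paper, and the function names and the mean-pooling choice are assumptions.

```python
import numpy as np

def shared_identity(per_frame_codes):
    """Pool per-frame identity codes of shape (n_frames, k) into one shared code."""
    return per_frame_codes.mean(axis=0)

def consistency_penalty(per_frame_codes):
    """Mean squared distance of each frame's code from the shared code."""
    shared = shared_identity(per_frame_codes)
    return float(np.mean(np.sum((per_frame_codes - shared) ** 2, axis=1)))
```

Because the pooling is a mean over the frame axis, the same functions accept any number of frames, including a single one, in which case the penalty is zero.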
A Decoupled 3D Facial Shape Model by Adversarial Training
Data-driven generative 3D face models are used to compactly encode facial
shape data into meaningful parametric representations. A desirable property of
these models is their ability to effectively decouple natural sources of
variation, in particular identity and expression. While factorized
representations have been proposed for that purpose, they are still limited in
the variability they can capture and may present modeling artifacts when
applied to tasks such as expression transfer. In this work, we explore a new
direction with Generative Adversarial Networks and show that they contribute to
better face modeling performance, especially in decoupling natural factors,
while also achieving more diverse samples. To train the model we introduce a
novel architecture that combines a 3D generator with a 2D discriminator that
leverages conventional CNNs, where the two components are bridged by a geometry
mapping layer. We further present a training scheme, based on auxiliary
classifiers, to explicitly disentangle identity and expression attributes.
Through quantitative and qualitative results on standard face datasets, we
illustrate the benefits of our model and demonstrate that it outperforms
competing state-of-the-art methods in terms of decoupling and diversity. Comment: camera-ready version for ICCV'1