486 research outputs found
3D Face Tracking and Texture Fusion in the Wild
We present a fully automatic approach to real-time 3D face reconstruction
from monocular in-the-wild videos. With the use of a cascaded-regressor based
face tracking and a 3D Morphable Face Model shape fitting, we obtain a
semi-dense 3D face shape. We further use the texture information from multiple
frames to build a holistic 3D face representation from the video frames. Our
system is able to capture facial expressions and does not require any
person-specific training. We demonstrate the robustness of our approach on the
challenging 300 Videos in the Wild (300-VW) dataset. Our real-time fitting
framework is available as an open source library at http://4dface.org
Unsupervised Training for 3D Morphable Model Regression
We present a method for training a regression network from image pixels to 3D
morphable model coordinates using only unlabeled photographs. The training loss
is based on features from a facial recognition network, computed on-the-fly by
rendering the predicted faces with a differentiable renderer. To make training
from features feasible and avoid network fooling effects, we introduce three
objectives: a batch distribution loss that encourages the output distribution
to match the distribution of the morphable model, a loopback loss that ensures
the network can correctly reinterpret its own output, and a multi-view identity
loss that compares the features of the predicted 3D face and the input
photograph from multiple viewing angles. We train a regression network using
these objectives, a set of unlabeled photographs, and the morphable model
itself, and demonstrate state-of-the-art results.Comment: CVPR 2018 version with supplemental material
(http://openaccess.thecvf.com/content_cvpr_2018/html/Genova_Unsupervised_Training_for_CVPR_2018_paper.html
3D Face Reconstruction from Light Field Images: A Model-free Approach
Reconstructing 3D facial geometry from a single RGB image has recently
instigated wide research interest. However, it is still an ill-posed problem
and most methods rely on prior models hence undermining the accuracy of the
recovered 3D faces. In this paper, we exploit the Epipolar Plane Images (EPI)
obtained from light field cameras and learn CNN models that recover horizontal
and vertical 3D facial curves from the respective horizontal and vertical EPIs.
Our 3D face reconstruction network (FaceLFnet) comprises a densely connected
architecture to learn accurate 3D facial curves from low resolution EPIs. To
train the proposed FaceLFnets from scratch, we synthesize photo-realistic light
field images from 3D facial scans. The curve by curve 3D face estimation
approach allows the networks to learn from only 14K images of 80 identities,
which still comprises over 11 Million EPIs/curves. The estimated facial curves
are merged into a single pointcloud to which a surface is fitted to get the
final 3D face. Our method is model-free, requires only a few training samples
to learn FaceLFnet and can reconstruct 3D faces with high accuracy from single
light field images under varying poses, expressions and lighting conditions.
Comparison on the BU-3DFE and BU-4DFE datasets show that our method reduces
reconstruction errors by over 20% compared to recent state of the art
FML: Face Model Learning from Videos
Monocular image-based 3D reconstruction of faces is a long-standing problem
in computer vision. Since image data is a 2D projection of a 3D face, the
resulting depth ambiguity makes the problem ill-posed. Most existing methods
rely on data-driven priors that are built from limited 3D face scans. In
contrast, we propose multi-frame video-based self-supervised training of a deep
network that (i) learns a face identity model both in shape and appearance
while (ii) jointly learning to reconstruct 3D faces. Our face model is learned
using only corpora of in-the-wild video clips collected from the Internet. This
virtually endless source of training data enables learning of a highly general
3D face model. In order to achieve this, we propose a novel multi-frame
consistency loss that ensures consistent shape and appearance across multiple
frames of a subject's face, thus minimizing depth ambiguity. At test time we
can use an arbitrary number of frames, so that we can perform both monocular as
well as multi-frame reconstruction.Comment: CVPR 2019 (Oral). Video: https://www.youtube.com/watch?v=SG2BwxCw0lQ,
Project Page: https://gvv.mpi-inf.mpg.de/projects/FML19
Multilinear Wavelets: A Statistical Shape Space for Human Faces
We present a statistical model for D human faces in varying expression,
which decomposes the surface of the face using a wavelet transform, and learns
many localized, decorrelated multilinear models on the resulting coefficients.
Using this model we are able to reconstruct faces from noisy and occluded D
face scans, and facial motion sequences. Accurate reconstruction of face shape
is important for applications such as tele-presence and gaming. The localized
and multi-scale nature of our model allows for recovery of fine-scale detail
while retaining robustness to severe noise and occlusion, and is
computationally efficient and scalable. We validate these properties
experimentally on challenging data in the form of static scans and motion
sequences. We show that in comparison to a global multilinear model, our model
better preserves fine detail and is computationally faster, while in comparison
to a localized PCA model, our model better handles variation in expression, is
faster, and allows us to fix identity parameters for a given subject.Comment: 10 pages, 7 figures; accepted to ECCV 201
Combining Dense Nonrigid Structure from Motion and 3D Morphable Models for Monocular 4D Face Reconstruction
This is the author accepted manuscript. The final version is available from ACM via the DOI in this record Monocular 4D face reconstruction is a challenging problem, especially in the case that the input video is captured under unconstrained conditions, i.e. "in the wild". The majority of the state-of-the-art approaches build upon 3D Morphable Modelling (3DMM), which has been proven to be more robust than model-free approaches such as Shape from Shading (SfS) or Structure from Motion (SfM). While offering visually plausible shape reconstruction results that resemble real faces, 3DMMs adhere to the model space learned from exemplar faces during the training phase, often yielding facial reconstructions that are excessively smooth and look too similar even across captured faces with completely different facial characteristics. This is due to the fact that 3DMMs are typically used as hard constraints on the reconstructed 3D shape. To overcome these limitations, in this paper we propose to combine 3DMMs with Dense Nonrigid Structure from Motion (DNSM), which is much less robust but has the potential of reconstructing fine details and capturing the subject-specific facial characteristics of every input. We effectively combine the best of both worlds by introducing a novel dense variational framework, which we solve efficiently by designing a convex optimisation strategy. In contrast to previous methods, we incorporate 3DMM as a soft constraint, penalizing both departure of reconstructed faces from the 3DMM subspace and variation of the identity component of the 3DMM over different frames of the input video. As demonstrated in qualitative and quantitative experiments, our method is robust, accurately estimates the 3D facial shape over time and outperforms other state-of-the-art methods of 4D face reconstruction
- …