141,511 research outputs found
CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images
With the powerfulness of convolution neural networks (CNN), CNN based face
reconstruction has recently shown promising performance in reconstructing
detailed face shape from 2D face images. The success of CNN-based methods
relies on a large number of labeled data. The state-of-the-art synthesizes such
data using a coarse morphable face model, which however has difficulty to
generate detailed photo-realistic images of faces (with wrinkles). This paper
presents a novel face data generation method. Specifically, we render a large
number of photo-realistic face images with different attributes based on
inverse rendering. Furthermore, we construct a fine-detailed face image dataset
by transferring different scales of details from one image to another. We also
construct a large number of video-type adjacent frame pairs by simulating the
distribution of real video data. With these nicely constructed datasets, we
propose a coarse-to-fine learning framework consisting of three convolutional
networks. The networks are trained for real-time detailed 3D face
reconstruction from monocular video as well as from a single image. Extensive
experimental results demonstrate that our framework can produce high-quality
reconstruction but with much less computation time compared to the
state-of-the-art. Moreover, our method is robust to pose, expression and
lighting due to the diversity of data.Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine
Intelligence, 201
Self-supervised Learning of Detailed 3D Face Reconstruction
In this paper, we present an end-to-end learning framework for detailed 3D
face reconstruction from a single image. Our approach uses a 3DMM-based coarse
model and a displacement map in UV-space to represent a 3D face. Unlike
previous work addressing the problem, our learning framework does not require
supervision of surrogate ground-truth 3D models computed with traditional
approaches. Instead, we utilize the input image itself as supervision during
learning. In the first stage, we combine a photometric loss and a facial
perceptual loss between the input face and the rendered face, to regress a
3DMM-based coarse model. In the second stage, both the input image and the
regressed texture of the coarse model are unwrapped into UV-space, and then
sent through an image-toimage translation network to predict a displacement map
in UVspace. The displacement map and the coarse model are used to render a
final detailed face, which again can be compared with the original input image
to serve as a photometric loss for the second stage. The advantage of learning
displacement map in UV-space is that face alignment can be explicitly done
during the unwrapping, thus facial details are easier to learn from large
amount of data. Extensive experiments demonstrate the superiority of the
proposed method over previous work.Comment: Accepted by IEEE Transactions on Image Processing (TIP
MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
In this work we propose a novel model-based deep convolutional autoencoder
that addresses the highly challenging problem of reconstructing a 3D human face
from a single in-the-wild color image. To this end, we combine a convolutional
encoder network with an expert-designed generative model that serves as
decoder. The core innovation is our new differentiable parametric decoder that
encapsulates image formation analytically based on a generative model. Our
decoder takes as input a code vector with exactly defined semantic meaning that
encodes detailed face pose, shape, expression, skin reflectance and scene
illumination. Due to this new way of combining CNN-based with model-based face
reconstruction, the CNN-based encoder learns to extract semantically meaningful
parameters from a single monocular input image. For the first time, a CNN
encoder and an expert-designed generative model can be trained end-to-end in an
unsupervised manner, which renders training on very large (unlabeled) real
world data feasible. The obtained reconstructions compare favorably to current
state-of-the-art approaches in terms of quality and richness of representation.Comment: International Conference on Computer Vision (ICCV) 2017 (Oral), 13
page
Extreme 3D Face Reconstruction: Seeing Through Occlusions
Existing single view, 3D face reconstruction methods can produce beautifully
detailed 3D results, but typically only for near frontal, unobstructed
viewpoints. We describe a system designed to provide detailed 3D
reconstructions of faces viewed under extreme conditions, out of plane
rotations, and occlusions. Motivated by the concept of bump mapping, we propose
a layered approach which decouples estimation of a global shape from its
mid-level details (e.g., wrinkles). We estimate a coarse 3D face shape which
acts as a foundation and then separately layer this foundation with details
represented by a bump map. We show how a deep convolutional encoder-decoder can
be used to estimate such bump maps. We further show how this approach naturally
extends to generate plausible details for occluded facial regions. We test our
approach and its components extensively, quantitatively demonstrating the
invariance of our estimated facial details. We further provide numerous
qualitative examples showing that our method produces detailed 3D face shapes
in viewing conditions where existing state of the art often break down.Comment: Accepted to CVPR'18. Previously titled: "Extreme 3D Face
Reconstruction: Looking Past Occlusions
Tex2Shape: Detailed Full Human Body Geometry From a Single Image
We present a simple yet effective method to infer detailed full human body shape from only a single photograph. Our model can infer full-body shape including face, hair, and clothing including wrinkles at interactive frame-rates. Results feature details even on parts that are occluded in the input image. Our main idea is to turn shape regression into an aligned image-to-image translation problem. The input to our method is a partial texture map of the visible region obtained from off-the-shelf methods. From a partial texture, we estimate detailed normal and vector displacement maps, which can be applied to a low-resolution smooth body model to add detail and clothing. Despite being trained purely with synthetic data, our model generalizes well to real-world photographs. Numerous results demonstrate the versatility and robustness of our method
Tex2Shape: Detailed Full Human Body Geometry From a Single Image
We present a simple yet effective method to infer detailed full human body
shape from only a single photograph. Our model can infer full-body shape
including face, hair, and clothing including wrinkles at interactive
frame-rates. Results feature details even on parts that are occluded in the
input image. Our main idea is to turn shape regression into an aligned
image-to-image translation problem. The input to our method is a partial
texture map of the visible region obtained from off-the-shelf methods. From a
partial texture, we estimate detailed normal and vector displacement maps,
which can be applied to a low-resolution smooth body model to add detail and
clothing. Despite being trained purely with synthetic data, our model
generalizes well to real-world photographs. Numerous results demonstrate the
versatility and robustness of our method
3D Face Reconstruction by Learning from Synthetic Data
Fast and robust three-dimensional reconstruction of facial geometric
structure from a single image is a challenging task with numerous applications.
Here, we introduce a learning-based approach for reconstructing a
three-dimensional face from a single image. Recent face recovery methods rely
on accurate localization of key characteristic points. In contrast, the
proposed approach is based on a Convolutional-Neural-Network (CNN) which
extracts the face geometry directly from its image. Although such deep
architectures outperform other models in complex computer vision problems,
training them properly requires a large dataset of annotated examples. In the
case of three-dimensional faces, currently, there are no large volume data
sets, while acquiring such big-data is a tedious task. As an alternative, we
propose to generate random, yet nearly photo-realistic, facial images for which
the geometric form is known. The suggested model successfully recovers facial
shapes from real images, even for faces with extreme expressions and under
various lighting conditions.Comment: The first two authors contributed equally to this wor
Towards High-Fidelity 3D Face Reconstruction from In-the-Wild Images Using Graph Convolutional Networks
3D Morphable Model (3DMM) based methods have achieved great success in
recovering 3D face shapes from single-view images. However, the facial textures
recovered by such methods lack the fidelity as exhibited in the input images.
Recent work demonstrates high-quality facial texture recovering with generative
networks trained from a large-scale database of high-resolution UV maps of face
textures, which is hard to prepare and not publicly available. In this paper,
we introduce a method to reconstruct 3D facial shapes with high-fidelity
textures from single-view images in-the-wild, without the need to capture a
large-scale face texture database. The main idea is to refine the initial
texture generated by a 3DMM based method with facial details from the input
image. To this end, we propose to use graph convolutional networks to
reconstruct the detailed colors for the mesh vertices instead of reconstructing
the UV map. Experiments show that our method can generate high-quality results
and outperforms state-of-the-art methods in both qualitative and quantitative
comparisons.Comment: Accepted to CVPR 2020. The source code is available at
https://github.com/FuxiCV/3D-Face-GCN
FML: Face Model Learning from Videos
Monocular image-based 3D reconstruction of faces is a long-standing problem
in computer vision. Since image data is a 2D projection of a 3D face, the
resulting depth ambiguity makes the problem ill-posed. Most existing methods
rely on data-driven priors that are built from limited 3D face scans. In
contrast, we propose multi-frame video-based self-supervised training of a deep
network that (i) learns a face identity model both in shape and appearance
while (ii) jointly learning to reconstruct 3D faces. Our face model is learned
using only corpora of in-the-wild video clips collected from the Internet. This
virtually endless source of training data enables learning of a highly general
3D face model. In order to achieve this, we propose a novel multi-frame
consistency loss that ensures consistent shape and appearance across multiple
frames of a subject's face, thus minimizing depth ambiguity. At test time we
can use an arbitrary number of frames, so that we can perform both monocular as
well as multi-frame reconstruction.Comment: CVPR 2019 (Oral). Video: https://www.youtube.com/watch?v=SG2BwxCw0lQ,
Project Page: https://gvv.mpi-inf.mpg.de/projects/FML19
- …