2,313 research outputs found
Dense 3D Face Decoding over 2500FPS: Joint Texture & Shape Convolutional Mesh Decoders
3D Morphable Models (3DMMs) are statistical models that represent facial
texture and shape variations using a set of linear bases, learned in
particular via Principal Component Analysis (PCA). 3DMMs have been used as
statistical priors for
reconstructing 3D faces from images by solving non-linear least-squares
optimization problems. More recently, 3DMMs have been used as generative models
for training non-linear mappings (i.e., regressors) from images to the
parameters of
the models via Deep Convolutional Neural Networks (DCNNs). Nevertheless, all of
the above methods use either fully connected layers or 2D convolutions on
parametric unwrapped UV spaces leading to large networks with many parameters.
In this paper, we present the first, to the best of our knowledge, non-linear
3DMMs by learning joint texture and shape auto-encoders using direct mesh
convolutions. We demonstrate how these auto-encoders can be used to train very
light-weight models that perform Coloured Mesh Decoding (CMD) in-the-wild at a
speed of over 2500 FPS.
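To make the decoding idea concrete, here is a minimal sketch of a joint shape-and-texture mesh decoder built from simple graph convolutions. All names are hypothetical, and the layer (a vertex-wise linear map plus a normalized-adjacency aggregation) merely stands in for the paper's actual mesh-convolution and upsampling operators.

```python
# Minimal sketch of a joint shape+texture mesh decoder (hypothetical names;
# the paper's exact mesh-convolution operator and upsampling are not reproduced).
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """x' = W_s x + A_hat (W_n x), with A_hat a normalized mesh adjacency."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_hat):  # x: (B, V, C), a_hat: (V, V)
        return self.w_self(x) + torch.einsum('vw,bwc->bvc', a_hat, self.w_neigh(x))

class ColouredMeshDecoder(nn.Module):
    def __init__(self, latent_dim, num_verts, hidden=64):
        super().__init__()
        self.fc = nn.Linear(latent_dim, num_verts * hidden)
        self.num_verts, self.hidden = num_verts, hidden
        self.gc1 = GraphConv(hidden, hidden)
        self.shape_head = GraphConv(hidden, 3)   # xyz per vertex
        self.colour_head = GraphConv(hidden, 3)  # rgb per vertex

    def forward(self, z, a_hat):
        h = self.fc(z).view(-1, self.num_verts, self.hidden)
        h = torch.relu(self.gc1(h, a_hat))
        return self.shape_head(h, a_hat), self.colour_head(h, a_hat)
```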
MeshCNN: A Network with an Edge
Polygonal meshes provide an efficient representation for 3D shapes. They
explicitly capture both shape surface and topology, and leverage non-uniformity
to represent large flat regions as well as sharp, intricate features. This
non-uniformity and irregularity, however, inhibits mesh analysis efforts using
neural networks that combine convolution and pooling operations. In this paper,
we utilize the unique properties of the mesh for a direct analysis of 3D shapes
using MeshCNN, a convolutional neural network designed specifically for
triangular meshes. Analogous to classic CNNs, MeshCNN combines specialized
convolution and pooling layers that operate on the mesh edges, by leveraging
their intrinsic geodesic connections. Convolutions are applied on edges and the
four edges of their incident triangles, and pooling is applied via an edge
collapse operation that retains surface topology, thereby, generating new mesh
connectivity for the subsequent convolutions. MeshCNN learns which edges to
collapse, thus forming a task-driven process where the network exposes and
expands the important features while discarding the redundant ones. We
demonstrate the effectiveness of our task-driven pooling on various learning
tasks applied to 3D meshes.
Comment: For a two-minute explanation video see https://bit.ly/meshcnnvideo
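As a rough sketch of the edge-based convolution described above, the snippet below gathers, for each edge, the four edges of its two incident triangles and combines them through order-invariant symmetric terms before a shared linear map. The `nbr_idx` neighbor table and its layout are illustrative assumptions, not MeshCNN's actual implementation.

```python
# Sketch of an edge-based mesh convolution in the spirit of MeshCNN:
# each edge aggregates the four edges of its two incident triangles via
# order-invariant symmetric combinations before a shared linear layer.
# `nbr_idx` is a precomputed (E, 4) neighbor table (hypothetical layout).
import torch
import torch.nn as nn

class EdgeConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # 5 inputs per edge: the edge itself + 4 symmetric neighbor features
        self.lin = nn.Linear(5 * in_ch, out_ch)

    def forward(self, x, nbr_idx):  # x: (E, C), nbr_idx: (E, 4) long
        a, b, c, d = (x[nbr_idx[:, k]] for k in range(4))
        feats = torch.cat([x, a + c, (a - c).abs(), b + d, (b - d).abs()], dim=-1)
        return self.lin(feats)
```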
Deformable Shape Completion with Graph Convolutional Autoencoders
The availability of affordable and portable depth sensors has made scanning
objects and people simpler than ever. However, dealing with occlusions and
missing parts is still a significant challenge. The problem of reconstructing a
(possibly non-rigidly moving) 3D object from a single or multiple partial scans
has received increasing attention in recent years. In this work, we propose a
novel learning-based method for the completion of partial shapes. Unlike the
majority of existing approaches, our method focuses on objects that can undergo
non-rigid deformations. The core of our method is a variational autoencoder
with graph convolutional operations that learns a latent space for complete
realistic shapes. At inference, we optimize to find the representation in this
latent space that best fits the generated shape to the known partial input. The
completed shape exhibits a realistic appearance on the unknown part. We show
promising results towards the completion of synthetic and real scans of human
body and face meshes exhibiting different styles of articulation and
partiality.
Comment: CVPR 2018
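The inference step lends itself to a short sketch: with the trained decoder frozen, one searches the latent space for the code whose decoded mesh best fits the observed vertices. The snippet assumes the partial scan is already registered to the template's vertex order (the paper also handles the case without known correspondence); `decoder`, `known_mask`, and the hyperparameters are placeholders.

```python
# Sketch of inference-time latent optimization for partial-shape completion:
# keep the trained graph-convolutional decoder fixed and optimize the latent
# code so the decoded full mesh matches the known vertices of the scan.
import torch

def complete_shape(decoder, partial_verts, known_mask, latent_dim=128, steps=500):
    z = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        pred = decoder(z)  # (1, V, 3) full mesh in template vertex order
        # Fit only the observed region; the decoder's prior fills the rest.
        loss = ((pred[0][known_mask] - partial_verts[known_mask]) ** 2).mean()
        loss.backward()
        opt.step()
    return decoder(z).detach()
```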
Self-supervised CNN for Unconstrained 3D Facial Performance Capture from an RGB-D Camera
We present a novel method for real-time 3D facial performance capture with
consumer-level RGB-D sensors. Our capturing system is targeted at robust and
stable 3D face capturing in the wild, in which the RGB-D facial data contain
noise, imperfection and occlusion, and often exhibit high variability in
motion, pose, expression and lighting conditions, thus posing great challenges.
The technical contribution is a self-supervised deep learning framework, which
is trained directly from raw RGB-D data. The key novelties include: (1)
learning both the core tensor and the parameters for refining our parametric
face model; (2) using vertex displacement and UV map for learning surface
detail; (3) designing the loss function by incorporating temporal coherence and
same identity constraints based on pairs of RGB-D images and utilizing sparse
norms, in addition to the conventional terms for photo-consistency, feature
similarity, regularization as well as geometry consistency; and (4) augmenting
the training data set in new ways. The method is demonstrated in a live setup
that runs in real-time on a smartphone and an RGB-D sensor. Extensive
experiments show that our method is robust to severe occlusion, fast motion,
large rotation, exaggerated facial expressions and diverse lighting
conditions.
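A hedged sketch of the style of composite loss such a framework could combine follows; the terms mirror the list above (photo-consistency, temporal coherence, same-identity constraints, a sparse norm on detail displacements), but the weights and exact formulation are illustrative assumptions, not the paper's.

```python
# Illustrative composite self-supervised loss over a pair of RGB-D frames
# (hypothetical weights and terms; the paper's formulation differs in detail).
import torch

def capture_loss(render_a, frame_a, params_a, params_b, id_a, id_b, disp):
    photo = (render_a - frame_a).abs().mean()       # photo-consistency
    temporal = (params_a - params_b).pow(2).mean()  # coherence across frames
    identity = (id_a - id_b).pow(2).mean()          # same person, same identity code
    sparse = disp.abs().mean()                      # sparse vertex displacements
    return photo + 0.1 * temporal + 0.1 * identity + 0.01 * sparse
```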
MobileFace: 3D Face Reconstruction with Efficient CNN Regression
Estimation of facial shapes plays a central role for face transfer and
animation. Accurate 3D face reconstruction, however, often deploys iterative
and costly methods preventing real-time applications. In this work we design a
compact and fast CNN model enabling real-time face reconstruction on mobile
devices. For this purpose, we first study more traditional but slow morphable
face models and use them to automatically annotate a large set of images for
CNN training. We then investigate a class of efficient MobileNet CNNs and adapt
such models for the task of shape regression. Our evaluation on three datasets
demonstrates significant improvements in the speed and the size of our model
while maintaining state-of-the-art reconstruction accuracy.
Comment: ECCV Workshops (PeopleCap) 2018
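As an illustration of adapting an efficient CNN to shape regression, the snippet below swaps the classifier of torchvision's MobileNetV2 for a 3DMM-parameter regression head. The paper trains its own compact variant; the layer names follow torchvision and the parameter count is a common 3DMM choice, both assumptions.

```python
# Sketch: repurpose an off-the-shelf MobileNetV2 as a 3DMM shape regressor.
import torch.nn as nn
from torchvision.models import mobilenet_v2

def make_shape_regressor(num_params=199):  # 199 is a typical 3DMM size, assumed
    net = mobilenet_v2(weights=None)
    # Replace the 1000-way classifier with a regression head over 3DMM parameters.
    net.classifier[1] = nn.Linear(net.last_channel, num_params)
    return net
```

Training would then regress the automatically annotated morphable-model parameters with a standard L2 loss.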
FML: Face Model Learning from Videos
Monocular image-based 3D reconstruction of faces is a long-standing problem
in computer vision. Since image data is a 2D projection of a 3D face, the
resulting depth ambiguity makes the problem ill-posed. Most existing methods
rely on data-driven priors that are built from limited 3D face scans. In
contrast, we propose multi-frame video-based self-supervised training of a deep
network that (i) learns a face identity model both in shape and appearance
while (ii) jointly learning to reconstruct 3D faces. Our face model is learned
using only corpora of in-the-wild video clips collected from the Internet. This
virtually endless source of training data enables learning of a highly general
3D face model. In order to achieve this, we propose a novel multi-frame
consistency loss that ensures consistent shape and appearance across multiple
frames of a subject's face, thus minimizing depth ambiguity. At test time we
can use an arbitrary number of frames, so that we can perform both monocular as
well as multi-frame reconstruction.
Comment: CVPR 2019 (Oral). Video: https://www.youtube.com/watch?v=SG2BwxCw0lQ,
Project Page: https://gvv.mpi-inf.mpg.de/projects/FML19
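One simple way to express a multi-frame consistency constraint is to pull the per-frame identity codes of a clip toward their mean while leaving expression and pose per-frame; the sketch below is an illustrative stand-in, not the paper's exact loss.

```python
# Sketch of a multi-frame identity-consistency term: identity codes predicted
# from different frames of the same clip should agree (illustrative only).
import torch

def multi_frame_consistency(id_codes):  # (F, D): one identity code per frame
    mean_id = id_codes.mean(dim=0, keepdim=True)
    return (id_codes - mean_id).pow(2).mean()
```

At test time, averaging the per-frame codes in the same way naturally supports an arbitrary number of frames.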
Single Image 3D Hand Reconstruction with Mesh Convolutions
Monocular 3D reconstruction of deformable objects, such as human body parts,
has been typically approached by predicting parameters of heavyweight linear
models. In this paper, we demonstrate an alternative solution that is based on
the idea of encoding images into a latent non-linear representation of meshes.
The prior on 3D hand shapes is learned by training an autoencoder with
intrinsic graph convolutions performed in the spectral domain. The pre-trained
decoder acts as a non-linear statistical deformable model. The latent
parameters that reconstruct the shape and articulated pose of hands in the
image are predicted using an image encoder. We show that our system
reconstructs plausible meshes and operates in real-time. We evaluate the
quality of the mesh reconstructions produced by the decoder on a new dataset
and show latent space interpolation results. Our code, data, and models will be
made publicly available.
Comment: Proceedings of the British Machine Vision Conference (BMVC 2019)
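The intrinsic graph convolutions mentioned above are typically realized in the spectral domain via a Chebyshev polynomial of the rescaled mesh Laplacian; a minimal dense version is sketched below (real meshes would use sparse operators, and the order K is an assumption).

```python
# Sketch of a spectral graph convolution via the Chebyshev recurrence
# T_k = 2 L T_{k-1} - T_{k-2}, applied to vertex features on a mesh.
import torch
import torch.nn as nn

class ChebConv(nn.Module):
    def __init__(self, in_ch, out_ch, K=6):  # K >= 2 assumed
        super().__init__()
        self.lins = nn.ModuleList(nn.Linear(in_ch, out_ch, bias=False)
                                  for _ in range(K))
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x, lap):       # x: (V, C), lap: (V, V) rescaled Laplacian
        t_prev, t_curr = x, lap @ x  # T_0(L)x = x, T_1(L)x = Lx
        out = self.lins[0](t_prev) + self.lins[1](t_curr)
        for lin in self.lins[2:]:
            t_prev, t_curr = t_curr, 2 * lap @ t_curr - t_prev
            out = out + lin(t_curr)
        return out + self.bias
```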
Dense Face Alignment
Face alignment is a classic problem in the computer vision field. Previous
works mostly focus on sparse alignment with a limited number of facial landmark
points, i.e., facial landmark detection. In this paper, for the first time, we
aim at providing a very dense 3D alignment for large-pose face images. To
achieve this, we train a CNN to estimate the 3D face shape, which not only
aligns limited facial landmarks but also fits face contours and SIFT feature
points. Moreover, we address the bottleneck of training a CNN with multiple
datasets, which use different landmark markups (e.g., 5, 34, or 68 points).
Experimental results show that our method not only provides high-quality,
dense 3D face fitting but also outperforms the state-of-the-art facial landmark
detection methods on challenging datasets. Our model runs in real time during
testing.
Comment: To appear in ICCV 2017 Workshop
Globally Tuned Cascade Pose Regression via Back Propagation with Application in 2D Face Pose Estimation and Heart Segmentation in 3D CT Images
Recently, a successful pose estimation algorithm, called Cascade Pose
Regression (CPR), was proposed in the literature. Trained on pose-indexed
features, CPR is a regressor ensemble similar to boosting. In this paper
we show how CPR can be represented as a Neural Network. Specifically, we adopt
a Graph Transformer Network (GTN) representation and accordingly train CPR with
Back Propagation (BP), which permits global tuning. In contrast, previous CPR
literature used only layer-wise training without any subsequent fine-tuning. We
empirically show that global training with BP outperforms layer-wise
(pre-)training. Our CPR-GTN adopts a Multi-Layer Perceptron as the regressor,
which uses sparse connections to learn local image feature representations.
We tested the proposed CPR-GTN on 2D face pose estimation problem as in
previous CPR literature. Besides, we also investigated the possibility of
extending CPR-GTN to 3D pose estimation through experiments on a 3D Computed
Tomography dataset for heart segmentation.
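The gist of representing CPR as a network is to unroll the cascade so that every stage's pose update is differentiable and all stages train jointly by back-propagation. The sketch below fixes the features per image for brevity, whereas true pose-indexed features would be re-extracted at each stage; the widths and stage count are placeholder choices, not the paper's GTN.

```python
# Sketch: cascade pose regression unrolled as a network so that gradients
# flow through all refinement stages (end-to-end, not stage-by-stage).
import torch
import torch.nn as nn

class CascadeRegressor(nn.Module):
    def __init__(self, feat_dim, pose_dim, stages=5):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim + pose_dim, 128),
                          nn.ReLU(),
                          nn.Linear(128, pose_dim))
            for _ in range(stages))

    def forward(self, feats, pose0):
        pose = pose0
        for stage in self.stages:  # each stage predicts an additive pose update
            pose = pose + stage(torch.cat([feats, pose], dim=-1))
        return pose                # back-propagation tunes all stages globally
```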
OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas
Work on depth estimation has so far focused only on projective images,
ignoring 360 content, which is now produced increasingly often and easily.
We show that monocular depth estimation models trained on traditional images
produce sub-optimal results on omnidirectional images, showcasing the need for
training directly on 360 datasets, which however, are hard to acquire. In this
work, we circumvent the challenges associated with acquiring high quality 360
datasets with ground truth depth annotations, by re-using recently released
large scale 3D datasets and re-purposing them to 360 via rendering. This
dataset, which is considerably larger than similar projective datasets, is
publicly offered to the community to enable future research in this direction.
We use this dataset to learn in an end-to-end fashion the task of depth
estimation from 360 images. We show promising results in our synthesized data
as well as in unseen realistic images.
Comment: Pre-print to appear in ECCV18
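The re-purposing step rests on standard equirectangular geometry: each pixel of a 360 panorama maps to a longitude/latitude pair and hence to a unit ray along which a 3D scene can be rendered (or depth defined). A small sketch of that mapping:

```python
# Standard equirectangular pixel-to-ray mapping used when rendering 3D
# scenes to 360 panoramas (pixel centers; y is the up axis).
import numpy as np

def equirect_rays(height, width):
    """Unit ray direction for every pixel of an equirectangular image."""
    lon = (np.arange(width) + 0.5) / width * 2 * np.pi - np.pi    # [-pi, pi)
    lat = np.pi / 2 - (np.arange(height) + 0.5) / height * np.pi  # (pi/2, -pi/2)
    lon, lat = np.meshgrid(lon, lat)                              # (H, W) grids
    return np.stack([np.cos(lat) * np.sin(lon),                   # x
                     np.sin(lat),                                 # y (up)
                     np.cos(lat) * np.cos(lon)], axis=-1)         # z
```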