MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
In this work we propose a novel model-based deep convolutional autoencoder
that addresses the highly challenging problem of reconstructing a 3D human face
from a single in-the-wild color image. To this end, we combine a convolutional
encoder network with an expert-designed generative model that serves as
decoder. The core innovation is our new differentiable parametric decoder that
encapsulates image formation analytically based on a generative model. Our
decoder takes as input a code vector with exactly defined semantic meaning that
encodes detailed face pose, shape, expression, skin reflectance and scene
illumination. Due to this new way of combining CNN-based with model-based face
reconstruction, the CNN-based encoder learns to extract semantically meaningful
parameters from a single monocular input image. For the first time, a CNN
encoder and an expert-designed generative model can be trained end-to-end in an
unsupervised manner, which renders training on very large (unlabeled) real
world data feasible. The obtained reconstructions compare favorably to current
state-of-the-art approaches in terms of quality and richness of representation.
Comment: International Conference on Computer Vision (ICCV) 2017 (Oral), 13 pages
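The analysis-by-synthesis idea above — a trainable encoder regressing a code vector that a fixed, analytically differentiable generative model decodes back into an image — can be sketched in miniature. The toy below stands in a linear model for the paper's 3DMM-plus-renderer decoder and a linear map for its CNN encoder; all names, dimensions, and the training loop are illustrative assumptions, not MoFA's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "expert-designed" generative model standing in for the paper's
# 3DMM + differentiable renderer: image = mean + basis @ code.
D, K = 64, 5                       # pixels, code dimension (illustrative)
mean = rng.normal(size=D)
basis = rng.normal(size=(D, K))

def decode(code):
    """Fixed analytic decoder; each code entry has a defined meaning."""
    return mean + basis @ code

# Unlabeled "training images" lying on the model's image manifold.
imgs = np.stack([decode(rng.normal(size=K)) for _ in range(200)])

# Trainable linear encoder standing in for MoFA's CNN encoder.
W = np.zeros((K, D))
lr = 2e-5
for _ in range(5000):
    codes = imgs @ W.T                                 # encode the batch
    residual = (mean + codes @ basis.T) - imgs         # photometric error
    grad_W = (residual @ basis).T @ imgs / len(imgs)   # grad through decoder
    W -= lr * grad_W                                   # unsupervised step

# Held-out reconstruction: encode, then decode through the fixed model.
test_img = decode(rng.normal(size=K))
rel_err = (np.linalg.norm(decode(W @ test_img) - test_img)
           / np.linalg.norm(test_img))
```

No labels are used anywhere: the only supervision is the photometric reconstruction error, which is the property that makes training on large unlabeled collections feasible.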
Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition
Two approaches are proposed for cross-pose face recognition, one is based on
the 3D reconstruction of facial components and the other is based on the deep
Convolutional Neural Network (CNN). Unlike most 3D approaches that consider
holistic faces, the proposed approach considers 3D facial components. It
segments a 2D gallery face into components, reconstructs the 3D surface for
each component, and recognizes a probe face by component features. The
segmentation is based on the landmarks located by a hierarchical algorithm that
combines the Faster R-CNN for face detection and the Reduced Tree Structured
Model for landmark localization. The core part of the CNN-based approach is a
revised VGG network. We study the performances with different settings on the
training set, including the synthesized data from 3D reconstruction, the
real-life data from an in-the-wild database, and both types of data combined.
We investigate the performances of the network when it is employed as a
classifier or designed as a feature extractor. The two recognition approaches
and the fast landmark localization are evaluated in extensive experiments, and
compared to state-of-the-art methods to demonstrate their efficacy.
Comment: 14 pages, 12 figures, 4 tables
Relightable Neural Human Assets from Multi-view Gradient Illuminations
Human modeling and relighting are two fundamental problems in computer vision
and graphics, where high-quality datasets can largely facilitate related
research. However, most existing human datasets only provide multi-view human
images captured under the same illumination. Although valuable for modeling
tasks, they are not readily used in relighting problems. To promote research in
both fields, in this paper, we present UltraStage, a new 3D human dataset that
contains more than 2,000 high-quality human assets captured under both
multi-view and multi-illumination settings. Specifically, for each example, we
provide 32 surrounding views illuminated with one white light and two gradient
illuminations. In addition to regular multi-view images, gradient illuminations
help recover detailed surface normal and spatially-varying material maps,
enabling various relighting applications. Inspired by recent advances in neural
representation, we further interpret each example into a neural human asset
which allows novel view synthesis under arbitrary lighting conditions. We show
our neural human assets can achieve extremely high capture performance and are
capable of representing fine details such as facial wrinkles and cloth folds.
We also validate UltraStage in single image relighting tasks, training neural
networks with virtual relighted data from neural assets and demonstrating
realistic rendering improvements over prior art. UltraStage will be publicly
available to the community to stimulate significant future developments in
various human modeling and rendering tasks. The dataset is available at
https://miaoing.github.io/RNHA.
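As a sketch of why gradient illuminations expose surface detail: under the classical spherical-gradient identity (Ma et al.-style active illumination, a simplified three-axis variant rather than UltraStage's actual two-gradient processing), a Lambertian pixel's response to a linear gradient pattern along axis i is proportional to albedo · (n_i + 1)/2, so dividing by the full-on (white-light) image and remapping recovers the normal. The toy below is synthetic and ignores specular reflectance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground-truth unit normals for a toy 4x4 "image" (H, W, 3).
n_true = rng.normal(size=(4, 4, 3))
n_true /= np.linalg.norm(n_true, axis=-1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, size=(4, 4, 1))

# A Lambertian point under constant (full-on) spherical illumination
# integrates to albedo; under a linear gradient along axis i it
# integrates to albedo * (n_i + 1) / 2  (spherical-gradient identity).
full = albedo * np.ones((4, 4, 1))
grads = albedo * (n_true + 1.0) / 2.0   # three gradient-lit images

# Normal recovery: divide out albedo with the full-on image, then remap.
n_est = 2.0 * grads / full - 1.0
n_est /= np.linalg.norm(n_est, axis=-1, keepdims=True)
max_err = np.abs(n_est - n_true).max()
```

Dividing by the full-on image cancels the spatially-varying albedo per pixel, which is why the same captures also yield material maps.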
Differentiable Display Photometric Stereo
Photometric stereo leverages variations in illumination conditions to
reconstruct per-pixel surface normals. The concept of display photometric
stereo, which employs a conventional monitor as an illumination source, has the
potential to overcome limitations often encountered in bulky and
difficult-to-use conventional setups. In this paper, we introduce
Differentiable Display Photometric Stereo (DDPS), a method designed to achieve
high-fidelity normal reconstruction using an off-the-shelf monitor and camera.
DDPS addresses a critical yet often neglected challenge in photometric stereo:
the optimization of display patterns for enhanced normal reconstruction. We
present a differentiable framework that couples basis-illumination image
formation with a photometric-stereo reconstruction method. This facilitates the
learning of display patterns that lead to high-quality normal reconstruction
through automatic differentiation. Addressing the synthetic-real domain gap
inherent in end-to-end optimization, we propose the use of a real-world
photometric-stereo training dataset composed of 3D-printed objects. Moreover,
to mitigate the ill-posedness of photometric stereo, we exploit the linearly
polarized light emitted from the monitor to optically separate diffuse and
specular reflections in the captured images. We demonstrate that DDPS allows
for learning display patterns optimized for a target configuration and is
robust to initialization. We assess DDPS on 3D-printed objects with
ground-truth normals and diverse real-world objects, validating that DDPS
enables effective photometric-stereo reconstruction.
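Reconstructing per-pixel normals from illumination variations, as described above, is classically posed as a least-squares problem: stack the k light directions into a matrix L, observe intensities I, and solve L b = I per pixel for b = albedo · n. The synthetic sketch below assumes a Lambertian surface with no shadows or interreflections; DDPS itself goes further by learning the display patterns end-to-end rather than fixing them.

```python
import numpy as np

rng = np.random.default_rng(2)

# k = 6 unit light directions (e.g. six patterns shown on the monitor).
L = rng.normal(size=(6, 3))
L /= np.linalg.norm(L, axis=1, keepdims=True)

# Synthetic Lambertian observations for p = 100 pixels:
# I = L @ (albedo * n), ignoring shadows and specular reflection.
n_true = rng.normal(size=(3, 100))
n_true /= np.linalg.norm(n_true, axis=0, keepdims=True)
albedo = rng.uniform(0.3, 1.0, size=(1, 100))
I = L @ (albedo * n_true)

# Least-squares photometric stereo: solve L b = I for b = albedo * n,
# then split b into its direction (normal) and magnitude (albedo).
b, *_ = np.linalg.lstsq(L, I, rcond=None)
n_est = b / np.linalg.norm(b, axis=0, keepdims=True)
ang_err = np.degrees(
    np.arccos(np.clip((n_est * n_true).sum(axis=0), -1.0, 1.0))).max()
```

With three or more non-coplanar lights the system is overdetermined, which is exactly the redundancy that makes the choice of illumination patterns worth optimizing.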
Polarized 3D: High-Quality Depth Sensing with Polarization Cues
Coarse depth maps can be enhanced by using the shape information from
polarization cues. We propose a framework to combine surface normals from
polarization (hereafter polarization normals) with an aligned depth map.
Polarization normals have not been used for depth enhancement before, because
they suffer from physics-based artifacts such as azimuthal ambiguity,
refractive distortion, and fronto-parallel signal degradation. Our framework
overcomes these key challenges, allowing the benefits of polarization to be
used to enhance depth maps. Our results demonstrate improvement with respect
to state-of-the-art 3D reconstruction techniques.
Funding: Charles Stark Draper Laboratory (Doctoral Fellowship); Singapore Ministry of Education (Academic Research Foundation MOE2013-T2-1-159); Singapore National Research Foundation (Singapore University of Technology and Design)
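The azimuthal ambiguity mentioned above follows directly from the polarizer intensity model I(θ) = (I_un/2)(1 + ρ cos 2(θ − φ)): the azimuth φ enters only through 2φ, so φ and φ + π produce identical measurements. The numeric sketch below uses illustrative angles and values, not the paper's capture setup.

```python
import numpy as np

def polarizer_intensities(i_un, rho, phi, angles):
    """Intensity through a linear polarizer at the given angles:
    I(theta) = (i_un/2) * (1 + rho * cos(2*theta - 2*phi))."""
    return (i_un / 2.0) * (1.0 + rho * np.cos(2.0 * angles - 2.0 * phi))

angles = np.deg2rad([0.0, 45.0, 90.0])       # three polarizer rotations
i_un, rho, phi = 1.0, 0.6, np.deg2rad(30.0)  # unpolarized intensity, DoP, azimuth
I0, I45, I90 = polarizer_intensities(i_un, rho, phi, angles)

# Recover Stokes-style components, then the azimuth, from three shots.
s1 = I0 - I90                  # proportional to rho * cos(2*phi)
s2 = 2.0 * I45 - I0 - I90      # proportional to rho * sin(2*phi)
phi_est = 0.5 * np.arctan2(s2, s1)

# The pi-ambiguity: phi and phi + pi yield identical measurements, so the
# normal's azimuth is only known up to a 180-degree flip.
flipped = polarizer_intensities(i_un, rho, phi + np.pi, angles)
```

This is why an aligned coarse depth map is valuable: even a noisy depth prior is enough to disambiguate the 180-degree flip per pixel.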