Two-stage Convolutional Part Heatmap Regression for the 1st 3D Face Alignment in the Wild (3DFAW) Challenge
This paper describes our submission to the 1st 3D Face Alignment in the Wild
(3DFAW) Challenge. Our method builds upon the idea of convolutional part
heatmap regression [1], extending it for 3D face alignment. Our method
decomposes the problem into two parts: (a) X,Y (2D) estimation and (b) Z
(depth) estimation. At the first stage, our method estimates the X,Y
coordinates of the facial landmarks by producing a set of 2D heatmaps, one for
each landmark, using convolutional part heatmap regression. Then, these
heatmaps, alongside the input RGB image, are used as input to a very deep
subnetwork trained via residual learning for regressing the Z coordinate. Our
method ranked 1st in the 3DFAW Challenge, surpassing the second best result by
more than 22%.
Comment: Winner of the 3D Face Alignment in the Wild (3DFAW) Challenge, ECCV 201
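The two-stage decomposition described above can be illustrated in a few lines. The sketch below shows only the output side of stage (a): decoding X,Y landmark coordinates from per-landmark 2D heatmaps via argmax, a standard decoding step for part heatmap regression. The convolutional subnetworks themselves are omitted, and the helper name `decode_xy` is hypothetical, not from the paper.

```python
import numpy as np

def decode_xy(heatmaps):
    """Stage (a) decoding: recover (x, y) landmark coordinates from
    per-landmark 2D heatmaps by taking the argmax of each map.

    heatmaps: array of shape (L, H, W), one heatmap per landmark.
    Returns an (L, 2) array of (x, y) pixel coordinates.
    """
    L, H, W = heatmaps.shape
    flat = heatmaps.reshape(L, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat, (H, W))
    return np.stack([xs, ys], axis=1)

# Toy example: three 8x8 heatmaps with known peaks.
hm = np.zeros((3, 8, 8))
hm[0, 2, 5] = 1.0   # landmark 0 peak at (x=5, y=2)
hm[1, 7, 0] = 1.0   # landmark 1 peak at (x=0, y=7)
hm[2, 4, 4] = 1.0   # landmark 2 peak at (x=4, y=4)
xy = decode_xy(hm)
print(xy.tolist())  # [[5, 2], [0, 7], [4, 4]]
```

In the paper's pipeline these heatmaps, stacked with the RGB image, would then feed the stage (b) subnetwork that regresses Z.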
The 2nd 3D Face Alignment In The Wild Challenge (3DFAW-video): Dense Reconstruction From Video
3D face alignment approaches have strong advantages over 2D with respect to representational power and robustness to illumination and pose. Over the past few years, a number of research groups have made rapid advances in dense 3D alignment from 2D video and obtained impressive results. How these various methods compare is relatively unknown. Previous benchmarks addressed sparse 3D alignment and single image 3D reconstruction. No commonly accepted evaluation protocol exists for dense 3D face reconstruction from video with which to compare them. The 2nd 3D Face Alignment in the Wild from Videos (3DFAW-Video) Challenge extends the previous 3DFAW 2016 competition to the estimation of dense 3D facial structure from video. It presented a new large corpus of profile-to-profile face videos recorded under different imaging conditions and annotated with corresponding high-resolution 3D ground truth meshes. In this paper we outline the evaluation protocol, the data used, and the results. 3DFAW-Video is to be held in conjunction with the 2019 International Conference on Computer Vision, in Seoul, Korea.
Evaluation of dense 3D reconstruction from 2D face images in the wild
This paper investigates the evaluation of dense 3D face reconstruction from a single 2D image in the wild. To this end, we organise a competition that provides a new benchmark dataset containing 2000 2D facial images of 135 subjects as well as their 3D ground truth face scans. In contrast to previous competitions or challenges, the aim of this new benchmark dataset is to evaluate the accuracy of a 3D dense face reconstruction algorithm using real, accurate and high-resolution 3D ground truth face scans. In addition to the dataset, we provide a standard protocol as well as a Python script for the evaluation. Lastly, we report the results obtained by three state-of-the-art 3D face reconstruction systems on the new benchmark dataset. The competition is organised in conjunction with the 2018 13th IEEE Conference on Automatic Face & Gesture Recognition.
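To give a sense of what such an evaluation protocol measures, the sketch below computes a simple nearest-neighbor reconstruction error normalized by a reference distance (e.g. interocular distance). The function name and the exact metric are assumptions for illustration, not the competition's official Python evaluation script.

```python
import numpy as np

def dense_reconstruction_error(pred, gt, norm_dist):
    """Illustrative per-vertex error for a dense 3D reconstruction:
    for each predicted vertex, the distance to its nearest
    ground-truth point, averaged and normalized by norm_dist.

    pred: (N, 3) predicted vertices; gt: (M, 3) ground-truth scan points.
    """
    # Brute-force pairwise distances (fine for toy inputs; real
    # protocols use KD-trees or point-to-mesh distances).
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    return d.min(axis=1).mean() / norm_dist

gt = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]])
pred = gt + np.array([0.1, 0.0, 0.0])   # uniform 0.1 offset along x
err = dense_reconstruction_error(pred, gt, norm_dist=1.0)
print(round(err, 6))  # 0.1
```

A point-to-mesh variant (distance to the ground-truth surface rather than to scan points) is the more common choice when dense meshes are available.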
Multi-view 3D face reconstruction in the wild using siamese networks
In this work, we present a novel learning based approach to reconstruct 3D faces from a single or multiple images. Our method uses a simple yet powerful architecture based on siamese neural networks that helps to extract relevant features from each view while keeping the models small. Instead of minimizing multiple objectives, we propose to simultaneously learn the 3D shape and the individual camera poses by using a single term loss based on the reprojection error, which generalizes from one to multiple views. This allows us to globally optimize the whole scene without having to tune any hyperparameters and to achieve low reprojection errors, which are important for further texture generation. Finally, we train our model on a large scale dataset with more than 6,000 facial scans. We report competitive results in the 3DFAW 2019 challenge, showing the effectiveness of our method.
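The single reprojection-error term that generalizes from one to multiple views can be sketched as follows. An orthographic camera is assumed here for brevity (the paper's actual camera model may differ), and all names are illustrative.

```python
import numpy as np

def reprojection_loss(points3d, rotations, translations, observed_2d):
    """Single-term loss: mean squared reprojection error of a shared
    3D point set across V views, averaged over views.

    points3d: (N, 3); rotations: (V, 3, 3); translations: (V, 3);
    observed_2d: (V, N, 2) detected 2D landmarks per view.
    """
    total = 0.0
    V = rotations.shape[0]
    for R, t, obs in zip(rotations, translations, observed_2d):
        cam = points3d @ R.T + t        # transform into camera frame
        proj = cam[:, :2]               # orthographic projection: drop depth
        total += np.mean((proj - obs) ** 2)
    return total / V

pts = np.array([[0.0, 0, 1], [1, 0, 1]])
R = np.eye(3)[None]                     # one identity-pose view
t = np.zeros((1, 3))
obs = pts[None, :, :2]                  # perfect observations
loss = reprojection_loss(pts, R, t, obs)
print(loss)  # 0.0
```

Because shape and per-view poses enter the same scalar loss, both can be optimized jointly without balancing multiple objective weights, which is the design point the abstract highlights.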
Implicit Shape and Appearance Priors for Few-Shot Full Head Reconstruction
Recent advancements in learning techniques that employ coordinate-based
neural representations have yielded remarkable results in multi-view 3D
reconstruction tasks. However, these approaches often require a substantial
number of input views (typically several tens) and computationally intensive
optimization procedures to achieve their effectiveness. In this paper, we
address these limitations specifically for the problem of few-shot full 3D head
reconstruction. We accomplish this by incorporating a probabilistic shape and
appearance prior into coordinate-based representations, enabling faster
convergence and improved generalization when working with only a few input
images (even as few as a single image). During testing, we leverage this prior
to guide the fitting process of a signed distance function using a
differentiable renderer. By incorporating the statistical prior alongside
parallelizable ray tracing and dynamic caching strategies, we achieve an
efficient and accurate approach to few-shot full 3D head reconstruction.
Moreover, we extend the H3DS dataset, which now comprises 60 high-resolution 3D
full head scans and their corresponding posed images and masks, which we use
for evaluation purposes. By leveraging this dataset, we demonstrate the
remarkable capabilities of our approach in achieving state-of-the-art results
in geometry reconstruction while being an order of magnitude faster than
previous approaches.
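The idea of guiding a signed distance function fit with a statistical prior can be illustrated with a deliberately tiny stand-in: a sphere SDF with one free parameter and a quadratic prior term, in place of the paper's learned neural prior and differentiable renderer. Everything below is a toy analogue, not the actual method.

```python
import numpy as np

def fit_sdf_with_prior(surface_pts, r_prior, weight, steps=200, lr=0.1):
    """Toy prior-guided SDF fitting: the shape is a sphere of radius r,
    the observations are surface points (where the SDF should vanish),
    and a quadratic prior pulls r toward r_prior. Minimizes
    mean sdf(p)^2 + weight * (r - r_prior)^2 by gradient descent.
    """
    r = r_prior                           # initialize from the prior
    d = np.linalg.norm(surface_pts, axis=1)
    for _ in range(steps):
        sdf = d - r                       # sphere SDF at the samples
        grad = -2 * sdf.mean() + 2 * weight * (r - r_prior)
        r -= lr * grad
    return r

pts = np.array([[2.0, 0, 0], [0, 2, 0], [0, 0, 2]])  # sphere of radius 2
r = fit_sdf_with_prior(pts, r_prior=1.0, weight=0.1)
print(round(r, 3))  # ~1.909: data pulls r toward 2, prior toward 1
```

The closed-form optimum here is (2 + w) / (1 + w) with w = 0.1, showing how the prior weight trades off data fit against the prior mean, the same tension the full method balances with its learned shape prior.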
H3D-Net: Few-shot high-fidelity 3D head reconstruction
Recent learning approaches that implicitly represent surface geometry using coordinate-based neural representations have shown impressive results in the problem of multi-view 3D reconstruction. The effectiveness of these techniques is, however, subject to the availability of a large number (several tens) of input views of the scene, and computationally demanding optimizations. In this paper, we tackle these limitations for the specific problem of few-shot full 3D head reconstruction, by endowing coordinate-based representations with a probabilistic shape prior that enables faster convergence and better generalization when using few input images (down to three). First, we learn a shape model of 3D heads from thousands of incomplete raw scans using implicit representations. At test time, we jointly overfit two coordinate-based neural networks to the scene, one modeling the geometry and another estimating the surface radiance, using implicit differentiable rendering. We devise a two-stage optimization strategy in which the learned prior is used to initialize and constrain the geometry during an initial optimization phase. Then, the prior is unfrozen and fine-tuned to the scene. By doing this, we achieve high-fidelity head reconstructions, including hair and shoulders, with a high level of detail that consistently outperforms both state-of-the-art 3D Morphable Model methods in the few-shot scenario, and non-parametric methods when large sets of views are available. This work has been partially funded by the Spanish government with the projects MoHuCo PID2020-120049RBI00, DeeLight PID2020-117142GB-I00 and Maria de Maeztu Seal of Excellence MDM-2016-0656, and by the Government of Catalonia under 2017 DI 028.
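The two-stage optimization strategy (prior frozen to initialize and constrain, then unfrozen and fine-tuned) can be mimicked with a toy scalar "decoder": w plays the role of the learned prior's weights and z the per-scene latent code. All names and the model itself are hypothetical stand-ins for the actual coordinate-based networks.

```python
import numpy as np

def two_stage_fit(target, w_prior=2.0, steps=100, lr=0.05):
    """Toy two-stage fitting of decoder y = w * z to one observation.

    Stage 1 freezes the prior weights w and optimizes only the latent
    code z; stage 2 unfreezes w and fine-tunes both jointly, mirroring
    the freeze-then-fine-tune schedule described in the abstract.
    """
    z, w = 0.0, w_prior
    for _ in range(steps):               # stage 1: prior frozen
        err = w * z - target
        z -= lr * 2 * err * w            # gradient step on z only
    for _ in range(steps):               # stage 2: prior unfrozen
        err = w * z - target
        z -= lr * 2 * err * w            # joint gradient steps
        w -= lr * 2 * err * z
    return z, w

z, w = two_stage_fit(target=6.0)
print(round(w * z, 4))  # reconstruction matches the target
```

The design point is that stage 1 keeps the solution on the prior's "manifold" (here, a fixed w) so that stage 2 starts from a well-constrained initialization instead of overfitting from scratch.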