Markov Chain Monte Carlo for Automated Face Image Analysis
We present a novel fully probabilistic method to interpret a single face image with the 3D Morphable Model. The new method is based on Bayesian inference and makes use of unreliable image-based information. Rather than searching for a single optimal solution, we infer the posterior distribution of the model parameters given the target image. The method is a stochastic sampling algorithm with a propose-and-verify architecture based on the Metropolis–Hastings algorithm. The stochastic method can robustly integrate unreliable information and therefore does not rely on feed-forward initialization. The integrative concept is based on two ideas: a separation of proposal moves and their verification with the model (Data-Driven Markov Chain Monte Carlo), and filtering with the Metropolis acceptance rule. It does not need gradients and is less prone to local optima than standard fitters. We also introduce a new collective likelihood which models the average difference between the model and the target image rather than individual pixel differences. The average value shows a natural tendency towards a normal distribution, even when the individual pixel-wise difference is not Gaussian. We employ the new fitting method to calculate posterior models of 3D face reconstructions from single real-world images. A direct application of the algorithm with the 3D Morphable Model leads us to a fully automatic face recognition system with competitive performance on the Multi-PIE database without any database adaptation.
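The propose-and-verify loop with a collective likelihood can be sketched in a toy setting. A linear map stands in for the 3D Morphable Model renderer here, and the dimensions, step size, and noise level are assumed values for illustration, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "renderer": a linear map from parameters to an image, standing in for
# the 3D Morphable Model rendering (hypothetical stand-in, not the real model).
D, P = 64, 4                      # pixels, parameters
A = rng.normal(size=(D, P))
theta_true = rng.normal(size=P)
target = A @ theta_true           # synthetic "target image"

def log_collective_likelihood(theta, sigma=0.1):
    """Collective likelihood: model the AVERAGE pixel difference as Gaussian,
    rather than modeling each individual pixel difference."""
    avg_diff = np.mean(np.abs(A @ theta - target))
    return -0.5 * (avg_diff / sigma) ** 2

def metropolis_hastings(n_samples=5000, step=0.1):
    theta = np.zeros(P)
    ll = log_collective_likelihood(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + step * rng.normal(size=P)   # propose a move ...
        ll_prop = log_collective_likelihood(proposal)  # ... verify with the model
        if np.log(rng.uniform()) < ll_prop - ll:       # Metropolis acceptance rule
            theta, ll = proposal, ll_prop
        samples.append(theta)
    return np.array(samples)

samples = metropolis_hastings()
# posterior summary from the second half of the chain (first half as burn-in)
posterior_mean = samples[len(samples) // 2:].mean(axis=0)
```

Note that no gradients of the renderer are required: the likelihood is only ever evaluated, never differentiated, which is what allows unreliable proposals to be filtered by the acceptance rule.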
Informed MCMC with Bayesian Neural Networks for Facial Image Analysis
Computer vision tasks are difficult because of the large variability in the
data that is induced by changes in light, background, partial occlusion as well
as the varying pose, texture, and shape of objects. Generative approaches to
computer vision allow us to overcome this difficulty by explicitly modeling the
physical image formation process. Using generative object models, the analysis
of an observed image is performed via Bayesian inference of the posterior
distribution. This conceptually simple approach tends to fail in practice
because of several difficulties stemming from sampling the posterior
distribution: high-dimensionality and multi-modality of the posterior
distribution as well as expensive simulation of the rendering process. The main
difficulty of sampling approaches in a computer vision context is choosing the
proposal distribution accurately so that maxima of the posterior are explored
early and the algorithm quickly converges to a valid image interpretation. In
this work, we propose to use a Bayesian Neural Network for estimating an image
dependent proposal distribution. Compared to a standard Gaussian random walk
proposal, this accelerates the sampler in finding regions of the posterior with
high value. In this way, we can significantly reduce the number of samples
needed to perform facial image analysis. Comment: Accepted to the Bayesian Deep Learning Workshop at NeurIPS 201
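The effect of an informed proposal can be illustrated on a one-dimensional toy posterior. The "network prediction" below is just an assumed Gaussian standing in for the Bayesian Neural Network's output; the sketch's main point is the Metropolis-Hastings correction needed when the proposal is image-dependent (an independence proposal) rather than a symmetric random walk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed network output: a Gaussian over one parameter, predicted from the
# image (hypothetical values, no real network involved).
pred_mean, pred_std = 0.8, 0.3

def log_posterior(theta):
    # toy unimodal posterior centred near the (unknown) true value 1.0
    return -0.5 * ((theta - 1.0) / 0.2) ** 2

def log_q(theta):
    # log-density of the informed proposal (needed for the MH correction;
    # normalization constants cancel in the ratio)
    return -0.5 * ((theta - pred_mean) / pred_std) ** 2

def informed_sampler(n=4000):
    theta, samples = 0.0, []
    for _ in range(n):
        prop = rng.normal(pred_mean, pred_std)   # image-dependent proposal
        # independence-proposal acceptance ratio includes q in both directions
        log_alpha = (log_posterior(prop) - log_posterior(theta)
                     + log_q(theta) - log_q(prop))
        if np.log(rng.uniform()) < log_alpha:
            theta = prop
        samples.append(theta)
    return np.array(samples)

samples = informed_sampler()
```

Because every proposal is drawn near the predicted high-posterior region, the chain reaches good interpretations in far fewer steps than a random walk started from an uninformed point would.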
Morphable Face Models - An Open Framework
In this paper, we present a novel open-source pipeline for face registration
based on Gaussian processes as well as an application to face image analysis.
Non-rigid registration of faces is significant for many applications in
computer vision, such as the construction of 3D Morphable face models (3DMMs).
Gaussian Process Morphable Models (GPMMs) unify a variety of non-rigid
deformation models, with B-splines and PCA models as examples. GPMMs separate
problem-specific requirements from the registration algorithm by incorporating
domain-specific adaptions as a prior model. The novelties of this paper are the
following: (i) We present a strategy and modeling technique for face
registration that considers symmetry, multi-scale and spatially-varying
details. The registration is applied to neutral faces and facial expressions.
(ii) We release an open-source software framework for registration and
model-building, demonstrated on the publicly available BU3D-FE database. The
released pipeline also contains an implementation of an Analysis-by-Synthesis
model adaption of 2D face images, tested on the Multi-PIE and LFW database.
This enables the community to reproduce, evaluate and compare the individual
steps of registration to model-building and 3D/2D model fitting. (iii) Along
with the framework release, we publish a new version of the Basel Face Model
(BFM-2017) with an improved age distribution and an additional facial
expression model.
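The kernel-combination idea behind GPMMs can be sketched in 1D: a multi-scale deformation prior is simply a sum of kernels with different length scales, from which deformation fields are sampled. The length scales and amplitudes below are assumed values for illustration, not those of the released model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Domain: 50 points on [0, 1], standing in for points on a face surface.
x = np.linspace(0.0, 1.0, 50)

def se_kernel(x, scale, amplitude):
    """Squared-exponential covariance over a set of 1D points."""
    d = x[:, None] - x[None, :]
    return amplitude ** 2 * np.exp(-0.5 * (d / scale) ** 2)

# Multi-scale prior: a smooth, large-amplitude component for coarse shape
# plus a short-scale, low-amplitude component for fine detail (assumed values).
K = se_kernel(x, scale=0.3, amplitude=1.0) + se_kernel(x, scale=0.05, amplitude=0.2)

# Sample a deformation field from the Gaussian process prior via Cholesky
# (small jitter on the diagonal for numerical stability).
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))
deformation = L @ rng.normal(size=len(x))
```

Swapping in a different kernel (e.g. one mirrored for symmetry, or spatially weighted) changes the prior without touching the registration algorithm, which is the separation the abstract describes.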
A Closest Point Proposal for MCMC-based Probabilistic Surface Registration
We propose to view non-rigid surface registration as a probabilistic
inference problem. Given a target surface, we estimate the posterior
distribution of surface registrations. We demonstrate how the posterior
distribution can be used to build shape models that generalize better and show
how to visualize the uncertainty in the established correspondence.
Furthermore, in a reconstruction task, we show how to estimate the posterior
distribution of missing data without assuming a fixed point-to-point
correspondence.
We introduce the closest-point proposal for the Metropolis-Hastings
algorithm. Our proposal overcomes the slow convergence of random-walk
strategies. As the algorithm decouples inference from modeling
the posterior using a propose-and-verify scheme, we show how to choose
different distance measures for the likelihood model.
All presented results are fully reproducible using publicly available data
and our open-source implementation of the registration framework.
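A minimal sketch of a closest-point-informed proposal inside Metropolis-Hastings, on a toy translation-only registration of 2D point clouds. For simplicity, the acceptance step below treats the proposal as approximately symmetric; the paper's method handles the proposal density properly, and registers full non-rigid surfaces rather than a translation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy problem: the model point cloud is offset from the target by an unknown
# 2D translation t (illustrative stand-in for non-rigid surface registration).
model = rng.uniform(size=(30, 2))
t_true = np.array([0.4, -0.2])
target = model + t_true

def log_likelihood(t, sigma=0.05):
    """Closest-point distance likelihood: no fixed correspondence assumed."""
    moved = model + t
    d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    return -0.5 * d2.min(axis=1).sum() / sigma ** 2

def closest_point_proposal(t):
    """Informed move: nudge t toward the closest-point residual, plus noise."""
    moved = model + t
    d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    nearest = target[d2.argmin(axis=1)]
    return t + (nearest - moved).mean(axis=0) + 0.01 * rng.normal(size=2)

t = np.zeros(2)
ll = log_likelihood(t)
for _ in range(200):
    prop = closest_point_proposal(t)   # propose (ICP-informed) ...
    ll_prop = log_likelihood(prop)     # ... verify against the likelihood
    if np.log(rng.uniform()) < ll_prop - ll:   # simplified acceptance
        t, ll = prop, ll_prop
```

The informed step moves the chain toward good registrations in a handful of iterations, while the accept/reject step keeps the result a sample from (approximately) the posterior rather than a single ICP point estimate.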
CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images
With the power of convolutional neural networks (CNNs), CNN-based face
reconstruction has recently shown promising performance in reconstructing
detailed face shape from 2D face images. The success of CNN-based methods
relies on a large amount of labeled data. The state of the art synthesizes such
data using a coarse morphable face model, which, however, has difficulty
generating detailed, photo-realistic images of faces (e.g., with wrinkles). This paper
presents a novel face data generation method. Specifically, we render a large
number of photo-realistic face images with different attributes based on
inverse rendering. Furthermore, we construct a fine-detailed face image dataset
by transferring different scales of details from one image to another. We also
construct a large number of video-type adjacent frame pairs by simulating the
distribution of real video data. With these nicely constructed datasets, we
propose a coarse-to-fine learning framework consisting of three convolutional
networks. The networks are trained for real-time detailed 3D face
reconstruction from monocular video as well as from a single image. Extensive
experimental results demonstrate that our framework can produce high-quality
reconstruction but with much less computation time compared to the
state-of-the-art. Moreover, our method is robust to pose, expression and
lighting due to the diversity of data. Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine
Intelligence, 201
Photo-Realistic Facial Details Synthesis from Single Image
We present a single-image 3D face synthesis technique that can handle
challenging facial expressions while recovering fine geometric details. Our
technique employs expression analysis for proxy face geometry generation and
combines supervised and unsupervised learning for facial detail synthesis. On
proxy generation, we conduct emotion prediction to determine a new
expression-informed proxy. On detail synthesis, we present a Deep Facial Detail
Net (DFDN) based on Conditional Generative Adversarial Net (CGAN) that employs
both geometry and appearance loss functions. For geometry, we capture 366
high-quality 3D scans from 122 different subjects under 3 facial expressions.
For appearance, we use additional 20K in-the-wild face images and apply
image-based rendering to accommodate lighting variations. Comprehensive
experiments demonstrate that our framework can produce high-quality 3D faces
with realistic details under challenging facial expressions.
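The two-term objective described above can be sketched as follows. The particular loss forms (squared error on geometry, absolute error on appearance) and the weights are assumptions for illustration; they are not the paper's exact DFDN losses, which also include an adversarial term from the CGAN.

```python
import numpy as np

rng = np.random.default_rng(5)

def combined_loss(pred_depth, gt_depth, pred_img, gt_img, w_geo=1.0, w_app=0.1):
    """Detail-synthesis objective combining a geometry term and an appearance
    term (assumed forms and weights, for illustration only)."""
    geometry = np.mean((pred_depth - gt_depth) ** 2)   # geometry (depth map) loss
    appearance = np.mean(np.abs(pred_img - gt_img))    # appearance (photometric) loss
    return w_geo * geometry + w_app * appearance

# Toy check on random "depth maps" and "images".
gt_d = rng.uniform(size=(32, 32))
gt_i = rng.uniform(size=(32, 32, 3))
loss_perfect = combined_loss(gt_d, gt_d, gt_i, gt_i)       # zero at the target
loss_noisy = combined_loss(gt_d + 0.1, gt_d, gt_i, gt_i)   # grows with depth error
```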
Reconstruction of three-dimensional facial geometric features related to fetal alcohol syndrome using adult surrogates
Fetal alcohol syndrome (FAS) is a condition caused by prenatal alcohol exposure. The diagnosis of FAS is based on the presence of central nervous system impairments, evidence of growth abnormalities and abnormal facial features. Direct anthropometry has traditionally been used to obtain facial data to assess the FAS facial features. Research efforts have focused on indirect anthropometry such as 3D surface imaging systems to collect facial data for facial analysis. However, 3D surface imaging systems are costly. As an alternative, approaches for 3D reconstruction from a single 2D image of the face using a 3D morphable model (3DMM) were explored in this research study. The research project was accomplished in several steps. 3D facial data were obtained from the publicly available BU-3DFE database, developed by the State University of New York. The 3D face scans in the training set were landmarked by different observers. The reliability and precision in selecting 3D landmarks were evaluated. The intraclass correlation coefficients for intra- and inter-observer reliability were greater than 0.95. The average intra-observer error was 0.26 mm and the average inter-observer error was 0.89 mm. A rigid registration was performed on the 3D face scans in the training set. Following rigid registration, a dense point-to-point correspondence across a set of aligned face scans was computed using the Gaussian process model fitting approach. A 3DMM of the face was constructed from the fully registered 3D face scans. The constructed 3DMM of the face was evaluated based on generalization, specificity, and compactness. The quantitative evaluations show that the constructed 3DMM achieves reliable results. 3D face reconstructions from single 2D images were estimated based on the 3DMM. The Metropolis–Hastings algorithm was used to fit the 3DMM features to 2D image features to generate the 3D face reconstruction.
Finally, the geometric accuracy of the reconstructed 3D faces was evaluated based on ground-truth 3D face scans. The average root mean square error for the surface-to-surface comparisons between the reconstructed faces and the ground-truth face scans was 2.99 mm. In conclusion, a framework to estimate 3D face reconstructions from single 2D facial images was developed and the reconstruction errors were evaluated. The geometric accuracy of the 3D face reconstructions was comparable to that found in the literature. However, future work should consider minimizing reconstruction errors to acceptable clinical standards in order for the framework to be useful for 3D-from-2D reconstruction in general, and also for developing FAS applications. Finally, future work should consider estimating a 3D face using multi-view 2D images to increase the information available for 3D-from-2D reconstruction.
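The surface-to-surface RMSE used for this evaluation can be sketched on toy point clouds standing in for meshes; a real evaluation would typically use point-to-surface distances on registered meshes, so this is a simplified vertex-to-nearest-vertex version.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-ins: a "ground-truth scan" of 200 vertices (in mm) and a
# "reconstruction" perturbed by a few millimetres of noise.
ground_truth = rng.uniform(size=(200, 3)) * 100.0
reconstruction = ground_truth + rng.normal(scale=2.0, size=(200, 3))

def surface_rmse(recon, gt):
    """RMS of each reconstructed vertex's distance to its nearest
    ground-truth vertex (vertex-to-nearest-vertex simplification)."""
    d2 = ((recon[:, None, :] - gt[None, :, :]) ** 2).sum(-1)
    closest = np.sqrt(d2.min(axis=1))
    return float(np.sqrt((closest ** 2).mean()))

rmse = surface_rmse(reconstruction, ground_truth)   # in mm
```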
Evaluating 3D human face reconstruction from a frontal 2D image, focusing on facial regions associated with foetal alcohol syndrome
Foetal alcohol syndrome (FAS) is a preventable condition caused by maternal alcohol consumption during pregnancy. The FAS facial phenotype is an important factor for diagnosis, alongside central nervous system impairments and growth abnormalities. Current methods for analysing the FAS facial phenotype rely on 3D facial image data, obtained from costly and complex surface scanning devices. An alternative is to use 2D images, which are easy to acquire with a digital camera or smart phone. However, 2D images lack the geometric accuracy required for accurate facial shape analysis. Our research offers a solution through the reconstruction of 3D human faces from single or multiple 2D images. We have developed a framework for evaluating 3D human face reconstruction from a single-input 2D image using a 3D face model for potential use in FAS assessment. We first built a generative morphable model of the face from a database of registered 3D face scans with diverse skin tones. Then we applied this model to reconstruct 3D face surfaces from single frontal images using a model-driven sampling algorithm. The accuracy of the predicted 3D face shapes was evaluated in terms of surface reconstruction error and the accuracy of FAS-relevant landmark locations and distances. Results show an average root mean square error of 2.62 mm. Our framework has the potential to estimate 3D landmark positions for parts of the face associated with the FAS facial phenotype. Future work aims to improve on the accuracy and adapt the approach for use in clinical settings.
Significance:
Our study presents a framework for constructing and evaluating a 3D face model from 2D face scans and evaluating the accuracy of 3D face shape predictions from single images. The results indicate low generalisation error and comparability to other studies. The reconstructions also provide insight into specific regions of the face relevant to FAS diagnosis. The proposed approach presents a potential cost-effective and easily accessible imaging tool for FAS screening, yet its clinical application needs further research.
What computational model provides the best explanation of face representations in the primate brain?
Understanding how the brain represents the identity of complex objects is a central challenge of visual neuroscience. The principles governing object processing have been extensively studied in the macaque face patch system, a sub-network of inferotemporal (IT) cortex specialized for face processing (Tsao et al., 2006). A previous study reported that single face patch neurons encode axes of a generative model called the “active appearance” model (Chang and Tsao, 2017), which transforms 50-d feature vectors separately representing facial shape and facial texture into facial images (Cootes et al., 2001; Edwards et al., 1998). However, it remains unclear whether this model constitutes the best model for explaining face cell responses. Here, we recorded responses of cells in the most anterior face patch, AM, to a large set of real face images and compared a large number of models for explaining the neural responses. We found that the active appearance model explained responses better than any other model except CORnet-Z, a feedforward deep neural network trained on general (non-face) object classification, whose performance it tied on some face image sets and exceeded on others. Surprisingly, deep neural networks trained specifically on facial identification did not explain the neural responses well. A major reason is that, unlike the neurons, units in these networks are only weakly modulated by face-related factors unrelated to facial identification, such as illumination.
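The model-comparison logic described above (mapping each candidate model's features to neural responses and scoring held-out explained variance) can be sketched with synthetic data. The 50-d feature dimensionality echoes the codes mentioned in the abstract, but the linear encoding model, ridge penalty, and all data below are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic stand-ins: candidate-model features for 300 images and the
# responses of 20 "neurons" generated from them plus noise.
n_imgs, n_feat, n_cells = 300, 50, 20
features = rng.normal(size=(n_imgs, n_feat))      # e.g. 50-d model codes
W_true = rng.normal(size=(n_feat, n_cells))
responses = features @ W_true + 0.5 * rng.normal(size=(n_imgs, n_cells))

train, test = slice(0, 200), slice(200, 300)
lam = 1.0   # ridge penalty (assumed)

# Closed-form ridge regression fit on the training images.
A = features[train].T @ features[train] + lam * np.eye(n_feat)
W = np.linalg.solve(A, features[train].T @ responses[train])

# Explained variance on held-out images: the comparison score for one model.
pred = features[test] @ W
resid = ((responses[test] - pred) ** 2).sum()
total = ((responses[test] - responses[test].mean(axis=0)) ** 2).sum()
explained_variance = 1.0 - resid / total
```

Repeating this fit with features from each candidate model (active appearance codes, deep network layer activations, etc.) and ranking the held-out scores is the essence of the comparison.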