Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression
We present techniques for improving performance driven facial animation,
emotion recognition, and facial key-point or landmark prediction using learned
identity-invariant representations. Established approaches to these problems
can work well if sufficient examples and labels for a particular identity are
available and factors of variation are highly controlled. However, labeled
examples of facial expressions, emotions and key-points for new individuals are
difficult and costly to obtain. In this paper we improve the ability of
techniques to generalize to new and unseen individuals by explicitly modeling
previously seen variations related to identity and expression. We use a
weakly-supervised approach in which identity labels are used to learn the
different factors of variation linked to identity separately from factors
related to expression. We show how probabilistic modeling of these sources of
variation allows one to learn identity-invariant representations for
expressions which can then be used to identity-normalize various procedures for
facial expression analysis and animation control. We also show how to extend
the widely used techniques of active appearance models and constrained local
models through replacing the underlying point distribution models which are
typically constructed using principal component analysis with
identity-expression factorized representations. We present a wide variety of
experiments in which we consistently improve performance on emotion
recognition, markerless performance-driven facial animation and facial
key-point tracking.
Comment: to appear in Image and Vision Computing Journal (IMAVIS).
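The abstract leaves the model unspecified, but the core idea of replacing a single PCA point distribution model with an identity-expression factorized one can be pictured with a minimal sketch: estimate an identity subspace from per-identity mean shapes and an expression subspace from within-identity residuals. This is a simplified stand-in for the paper's probabilistic factorization, and all array names are hypothetical.

```python
import numpy as np

def factorized_pdm(landmarks, identities, n_id=10, n_expr=10):
    """Toy identity/expression factorization of a point distribution model.

    landmarks  : (N, 2K) flattened landmark coordinates for N face samples
    identities : (N,) identity label per sample
    """
    mean = landmarks.mean(axis=0)
    centered = landmarks - mean
    ids = np.unique(identities)

    # Identity factor: variation between per-identity mean shapes
    id_means = np.stack([centered[identities == i].mean(axis=0) for i in ids])

    # Expression factor: within-identity residuals after removing identity
    lookup = {i: m for i, m in zip(ids, id_means)}
    residuals = np.stack([x - lookup[i] for x, i in zip(centered, identities)])

    # PCA (via SVD) on each factor yields the two sets of basis vectors
    _, _, v_id = np.linalg.svd(id_means, full_matrices=False)
    _, _, v_expr = np.linalg.svd(residuals, full_matrices=False)
    return mean, v_id[:n_id], v_expr[:n_expr]
```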
Automatic 3D facial expression recognition using geometric and textured feature fusion
3D facial expression recognition has gained growing interest from the affective computing community because it eliminates issues such as pose variations and illumination changes that affect 2D imaging. Many applications can benefit from this research, such as medical applications involving the detection of pain and psychological effects in patients, and the human-computer interaction tasks performed by today's intelligent systems. In this paper, we investigate 3D facial expression recognition by applying a range of feature extraction methods to the 2D textured images and 3D geometric data, fusing the two domains to increase overall performance. A one-vs-all multi-class SVM classifier has been adopted to recognize the expressions Angry, Disgust, Fear, Happy, Neutral, Sad and Surprise from the BU-3DFE and Bosphorus databases. The proposed approach shows an increase in performance when the features are fused together.
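As a rough illustration of the pipeline described here, feature-level fusion of the two domains followed by a one-vs-all SVM, the following sketch uses scikit-learn; the feature files and their dimensions are hypothetical placeholders for whichever descriptors are extracted from the textured images and geometric data.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical pre-extracted descriptors for the same N face scans
X_2d = np.load("texture_features.npy")    # (N, D1) from the 2D textured images
X_3d = np.load("geometry_features.npy")   # (N, D2) from the 3D geometric data
y = np.load("expression_labels.npy")      # 7 classes: Angry ... Surprise

# Feature-level fusion: concatenate the two domains into one vector per face
X = np.hstack([X_2d, X_3d])

# One-vs-all multi-class SVM over the fused features
clf = make_pipeline(StandardScaler(), OneVsRestClassifier(SVC(kernel="rbf")))
clf.fit(X, y)
```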
3D Face Reconstruction and Emotion Analytics with Part-Based Morphable Models
3D face reconstruction and facial expression analytics using 3D facial data are new and active research topics in computer graphics and computer vision. In this proposal, we first review the background for emotion analytics using 3D morphable face models, including geometry feature-based methods, statistical model-based methods and more advanced deep learning-based methods. Then, we introduce a novel 3D face modeling and reconstruction solution that robustly and accurately acquires 3D face models from a couple of images captured by a single smartphone camera. Two selfie photos of a subject, taken from the front and side, are used to guide our Non-Negative Matrix Factorization (NMF) induced part-based face model to iteratively reconstruct an initial 3D face of the subject. An iterative detail updating method is then applied to the initial 3D face to reconstruct facial details by optimizing lighting parameters and local depths. Our iterative 3D face reconstruction method permits fully automatic registration of a part-based face representation to the acquired face data and uses the detailed 2D/3D features to build a high-quality 3D face model. The NMF part-based face representation, learned from a 3D face database, facilitates alternating between effective global fitting and adaptive local detail fitting. Our system is flexible and allows users to conduct the capture in any uncontrolled environment. We demonstrate the capability of our method by allowing users to capture and reconstruct their 3D faces by themselves.
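A minimal sketch of the part-based representation at the heart of this pipeline: NMF factorizes a database of (non-negative) face vectors into localized parts, whose coefficients can then drive the kind of alternating global/local fitting described above. The data file and component count are assumptions, and scikit-learn's NMF stands in for the thesis's induced part-based model.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical 3D face database: one flattened vertex array per scan,
# shifted to be non-negative as NMF requires
faces = np.load("face_db_vertices.npy")      # (n_faces, 3 * n_vertices)
faces = faces - faces.min()

# NMF tends to produce a part-based basis: each component activates a
# localized face region, supporting adaptive local detail fitting
model = NMF(n_components=50, init="nndsvd", max_iter=500)
coeffs = model.fit_transform(faces)          # per-face part coefficients
parts = model.components_                    # part-based face basis

# A face is approximated by a non-negative combination of the parts
approx = coeffs[0] @ parts
```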
Based on the 3D face model reconstruction, we can analyze the facial expression and the related emotion in 3D space. We present a novel approach to analyzing facial expressions from images and a quantitative information visualization scheme for exploring this type of visual data. From the result reconstructed using the NMF part-based morphable 3D face model, basis parameters and a displacement map are extracted as features for facial emotion analysis and visualization. Based upon these features, two Support Vector Regressions (SVRs) are trained to determine the fuzzy Valence-Arousal (VA) values that quantify the emotions. The continuously changing emotion status can be intuitively analyzed by visualizing the VA values in VA-space. Our emotion analysis and visualization system, based on the 3D NMF morphable face model, detects expressions robustly across various head poses, face sizes and lighting conditions, and is fully automatic in computing the VA values from images or video sequences with various facial expressions. To evaluate our novel method, we test our system on publicly available databases and evaluate the emotion analysis and visualization results. We also apply our method to quantifying emotion changes during motivational interviews. These experiments and applications demonstrate the effectiveness and accuracy of our method.
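The two-regressor design can be sketched as follows: one SVR per affect dimension, trained on the extracted features (NMF basis parameters plus displacement-map features). The feature and label files, as well as the kernel settings, are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical features: NMF basis parameters concatenated with a
# flattened displacement map, one row per face image
X = np.load("emotion_features.npy")      # (N, D)
valence = np.load("valence.npy")         # (N,) continuous labels
arousal = np.load("arousal.npy")         # (N,)

# One SVR per affect dimension, mirroring the two-SVR design
svr_v = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, valence)
svr_a = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, arousal)

# Each frame maps to a point in VA-space, which can then be visualized
va = np.stack([svr_v.predict(X), svr_a.predict(X)], axis=1)
```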
In order to improve the expression recognition accuracy, we present a facial expression recognition approach with a 3D Mesh Convolutional Neural Network (3DMCNN) and a visual analytics guided 3DMCNN design and optimization scheme. The geometric properties of the surface are computed from the 3D face model of a subject with facial expressions. Instead of using a regular Convolutional Neural Network (CNN) to learn intensities of the facial images, we convolve the geometric properties on the surface of the 3D model using the 3DMCNN. We design a geodesic distance-based convolution method to overcome the difficulties arising from the irregular sampling of the face surface mesh. We further present an interactive visual analytics scheme for designing and modifying the networks, analyzing the learned features, and clustering similar nodes in the 3DMCNN. By removing low-activity nodes in the network, the performance of the network is greatly improved. We compare our method with the regular CNN-based method by interactively visualizing each layer of the networks, and analyze the effectiveness of our method by studying representative cases. Testing on public datasets, our method achieves a higher recognition accuracy than traditional image-based CNNs and other 3D CNNs. The presented framework, including the 3DMCNN and the interactive visual analytics of the CNN, can be extended to other applications.
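The geodesic distance-based convolution can be pictured with a toy sketch: each vertex aggregates features from its k geodesically nearest neighbors, with a distance falloff standing in for a learned spatial weighting. This is a simplified illustration under stated assumptions (a precomputed dense geodesic distance matrix, rank-ordered neighbor kernels), not the thesis's actual 3DMCNN layer.

```python
import numpy as np

def geodesic_conv(features, geo_dist, kernels, k=8, sigma=1.0):
    """Toy geodesic convolution over an irregular face mesh.

    features : (V, C) per-vertex descriptors (e.g., curvature values)
    geo_dist : (V, V) precomputed geodesic distance matrix (assumed given)
    kernels  : (k, C, C_out) one learnable kernel slice per neighbor rank
    """
    V = features.shape[0]
    out = np.zeros((V, kernels.shape[2]))
    for v in range(V):
        nbrs = np.argsort(geo_dist[v])[:k]          # k geodesically nearest vertices
        falloff = np.exp(-geo_dist[v, nbrs] ** 2 / (2 * sigma ** 2))
        for rank in range(k):
            out[v] += falloff[rank] * (features[nbrs[rank]] @ kernels[rank])
    return out

# Illustrative call with random data standing in for a real mesh
V, C, C_out = 200, 4, 8
feats = np.random.rand(V, C)
dists = np.random.rand(V, V)
dists = (dists + dists.T) / 2          # symmetric stand-in for geodesic distances
np.fill_diagonal(dists, 0.0)
out = geodesic_conv(feats, dists, np.random.randn(8, C, C_out))
```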
Multilinear methods for disentangling variations with applications to facial analysis
Several factors contribute to the appearance of an object in a visual scene, including pose,
illumination, and deformation, among others. Each factor accounts for a source of variability
in the data. It is assumed that the multiplicative interactions of these factors emulate the
entangled variability, giving rise to the rich structure of visual object appearance. Disentangling
such unobserved factors from visual data is a challenging task, especially when the data have
been captured in uncontrolled recording conditions (also referred to as “in-the-wild”) and label
information is not available. The work presented in this thesis focuses on disentangling the
variations contained in visual data, in particular applied to 2D and 3D faces. The motivation
behind this work lies in recent developments in the field, such as (i) the creation of large visual databases for face analysis, (ii) the need to extract information without the use of labels, and (iii) the need to deploy systems under demanding, real-world conditions.
In the first part of this thesis, we present a method to synthesise plausible 3D expressions
that preserve the identity of a target subject. This method is supervised as the model uses
labels, in this case 3D facial meshes of people performing a defined set of facial expressions, to
learn. The ability to synthesise an entire facial rig from a single neutral expression has a large range of applications both in computer graphics and computer vision, ranging from the efficient and cost-effective creation of CG characters to scalable data generation for machine learning purposes. Unlike previous methods based on multilinear models, the proposed approach is capable of extrapolating well outside the sample pool, which allows it to accurately reproduce
the identity of the target subject and create artefact-free expression shapes while requiring
only a small input dataset. We introduce global-local multilinear models that leverage the
strengths of expression-specific and identity-specific local models combined with coarse motion
estimations from a global model. The expression-specific and identity-specific local models
are built from different slices of the patch-wise local multilinear model. Experimental results
show that we achieve high-quality, identity-preserving facial expression synthesis results that
outperform existing methods both quantitatively and qualitatively.
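The identity/expression separation underlying these multilinear models can be illustrated with a toy bilinear version: a core tensor couples an identity coefficient vector with an expression coefficient vector to produce a mesh. The global-local, patch-wise construction in the thesis is considerably richer; the shapes and random core below are purely illustrative.

```python
import numpy as np

# Toy bilinear face model: a core tensor couples identity and expression
n_verts, n_id, n_expr = 5000, 30, 25
core = np.random.randn(3 * n_verts, n_id, n_expr) * 0.01  # stand-in for a learned core

def synthesize(id_coeffs, expr_coeffs):
    """Mesh = core contracted with identity and expression coefficients."""
    return np.einsum("vie,i,e->v", core, id_coeffs, expr_coeffs)

# Identity-preserving expression synthesis: fix the subject's identity
# coefficients and sweep the expression coefficients
subject = np.eye(n_id)[0]
neutral = synthesize(subject, np.eye(n_expr)[0])
smile = synthesize(subject, np.eye(n_expr)[3])
```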
In the second part of this thesis, we investigate how the modes of variations from visual data
can be extracted. Our assumption is that visual data has an underlying structure consisting of
factors of variation and their interactions. Finding this structure and its factors is important: not only would it help us better understand visual data, but once obtained, the factors can be edited for use in various applications. Shape from Shading and expression transfer are just two
of the potential applications. To extract the factors of variation, several supervised methods
have been proposed but they require both labels regarding the modes of variations and the same
number of samples under all modes of variations. Therefore, their applicability is limited to
well-organised data, usually captured in well-controlled conditions. We propose a novel general
multilinear matrix decomposition method that discovers the multilinear structure of possibly
incomplete sets of visual data in an unsupervised setting. We demonstrate the applicability of the
proposed method in several computer vision tasks, including Shape from Shading (SfS) (in the
wild and with occlusion removal), expression transfer, and estimation of surface normals from
images captured in the wild.
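As a point of reference for the unsupervised multilinear decomposition, a truncated higher-order SVD (HOSVD) recovers per-mode factor matrices and a core tensor from a fully observed data tensor; the thesis method generalizes this to incomplete, unorganized in-the-wild data. The sketch below implements plain HOSVD in NumPy with illustrative tensor sizes.

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T along the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD: a simple unsupervised multilinear decomposition.
    Returns a core tensor and one orthonormal factor matrix per mode."""
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = T
    for mode, U in enumerate(factors):
        # Contract the current mode of the core with U^T
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1),
                           0, mode)
    return core, factors

# Illustrative tensor: pixels x identity x illumination
data = np.random.rand(1024, 20, 15)
core, (pix_basis, id_factor, illum_factor) = hosvd(data, ranks=[50, 10, 5])
```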
Finally, leveraging the proposed unsupervised multilinear method as well as recent advances in
deep learning, we propose a weakly supervised deep learning method for disentangling multiple
latent factors of variation in face images captured in-the-wild. To this end, we propose a deep
latent variable model, where we model the multiplicative interactions of multiple latent factors
of variation explicitly as a multilinear structure. We demonstrate that the proposed approach
indeed learns disentangled representations of facial expressions and pose, which can be used in
various applications, including face editing, as well as 3D face reconstruction and classification
of facial expression, identity and pose.
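A sketch of the multiplicative-interaction idea in a deep decoder, assuming PyTorch and illustrative latent sizes: three latent factors (identity, expression, pose) are mixed through a learnable core tensor before being decoded to an image. This is a toy stand-in for the weakly supervised model, not its actual architecture.

```python
import torch
import torch.nn as nn

class MultilinearDecoder(nn.Module):
    """Toy decoder mixing latent factors multiplicatively via a core tensor."""
    def __init__(self, d_id=32, d_expr=16, d_pose=8, d_hidden=256, d_out=64 * 64):
        super().__init__()
        self.core = nn.Parameter(torch.randn(d_id, d_expr, d_pose, d_hidden) * 0.01)
        self.to_image = nn.Linear(d_hidden, d_out)

    def forward(self, z_id, z_expr, z_pose):
        # Explicit multilinear interaction of the three latent factors
        h = torch.einsum("bi,be,bp,ieph->bh", z_id, z_expr, z_pose, self.core)
        return self.to_image(torch.relu(h))

# Face editing sketch: same identity code, two different expression codes
dec = MultilinearDecoder()
z_id, z_e1, z_e2, z_p = (torch.randn(1, 32), torch.randn(1, 16),
                         torch.randn(1, 16), torch.randn(1, 8))
img_a, img_b = dec(z_id, z_e1, z_p), dec(z_id, z_e2, z_p)
```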
State of the Art in Face Recognition
Notwithstanding the tremendous effort to solve the face recognition problem, it is not yet possible to design a face recognition system with a potential close to human performance. New computer vision and pattern recognition approaches need to be investigated. Even new knowledge and perspectives from different fields, like psychology and neuroscience, must be incorporated into the current field of face recognition to design a robust face recognition system. Indeed, many more efforts are required to end up with a human-like face recognition system. This book makes an effort to reduce the gap between the current state of face recognition research and the future state.
Mean value coordinates–based caricature and expression synthesis
We present a novel method for caricature synthesis based on mean value coordinates (MVC). Our method can be applied to any single frontal face image to learn a specified caricature face pair for frontal and 3D caricature synthesis. The technique requires only one or a small number of exemplar pairs and a natural frontal face image training set, and the system can transfer the style of the exemplar pair across individuals. Further exaggeration can be performed in a controllable way. Our method is further applied to facial expression transfer, interpolation, and exaggeration, which are applications of expression editing. Additionally, we have extended our approach to 3D caricature synthesis based on the 3D version of MVC. With experiments we demonstrate that the transferred expressions are credible and the resulting caricatures can be characterized and recognized.
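Mean value coordinates themselves are easy to state concretely: for a point p inside a polygon with vertices v_i, Floater's weights w_i = (tan(α_{i-1}/2) + tan(α_i/2)) / ||v_i − p|| reproduce p as a weighted combination of the vertices. A minimal 2D sketch follows; the paper's synthesis pipeline builds on top of such coordinates rather than being shown here.

```python
import numpy as np

def mean_value_coords(p, poly):
    """Mean value coordinates of point p w.r.t. a 2D polygon (V, 2),
    using Floater's weights w_i = (tan(a_{i-1}/2) + tan(a_i/2)) / r_i."""
    d = poly - p                                   # vectors from p to vertices
    r = np.linalg.norm(d, axis=1)                  # distances r_i
    nxt = np.roll(d, -1, axis=0)
    # signed angle a_i between consecutive vertex directions, seen from p
    cross = d[:, 0] * nxt[:, 1] - d[:, 1] * nxt[:, 0]
    dot = (d * nxt).sum(axis=1)
    t = np.tan(np.arctan2(cross, dot) / 2.0)
    w = (np.roll(t, 1) + t) / r
    return w / w.sum()

# Linear precision: any interior point is reproduced by its coordinates
square = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
w = mean_value_coords(np.array([0.3, 0.6]), square)
assert np.allclose(w @ square, [0.3, 0.6])
```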
Joint optimization of manifold learning and sparse representations for face and gesture analysis
Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may dispense directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based upon dimensionality reduction and sparse representations that is shown to be robust in posed as well as natural conditions.
Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower-dimensional representations embedded in the higher-dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance-based facial attributes is studied, leading to robust facial pose and expression recognition models.
The parsimonious nature of sparse representations (SR) has successfully been exploited for the development of highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high-dimensional data can make these classifiers computationally demanding. Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where for example variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both issues of computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone for a new face and gesture framework called Manifold based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, MSR is expanded to include temporal dynamics.
The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. The combination of both concepts into a single objective function produces a problem that is neither convex nor directly solvable. This thesis studies this problem to introduce a new jointly optimized framework. This framework, termed LGE-KSVD, utilizes variants of Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier.
By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints K-SVD imparts on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, and human activity analysis, and with the addition of a concept called active difference signatures, robust gesture recognition is delivered from Kinect or similar depth cameras.
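The interplay of dimensionality reduction and sparse representation classification can be sketched in a few lines: project a query and a training dictionary into a reduced space, sparse-code the query with OMP, and classify by per-class reconstruction residual. PCA and OMP stand in here for the thesis's LGE variants and learned K-SVD dictionary; the joint optimization itself is not shown.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import orthogonal_mp

def src_classify(x, train, labels, pca, n_nonzero=20):
    """Sparse representation classification in a reduced feature space.
    Assigns x to the class whose dictionary atoms best reconstruct it."""
    D = pca.transform(train).T                 # (d, n_atoms) reduced dictionary
    D = D / np.linalg.norm(D, axis=0)          # unit-norm atoms
    q = pca.transform(x.reshape(1, -1)).ravel()
    coef = orthogonal_mp(D, q, n_nonzero_coefs=n_nonzero)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(q - D[:, labels == c] @ coef[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]

# Usage sketch: rows of `train` are training faces, `pca` is fit on them
# pca = PCA(n_components=100).fit(train)
# pred = src_classify(query_face, train, train_labels, pca)
```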