Inferring Facial and Body Language
Machine analysis of human facial and body language is a challenging topic in computer
vision, impacting on important applications such as human-computer interaction and visual
surveillance. In this thesis, we present research building towards computational frameworks
capable of automatically understanding facial expression and behavioural body language.
The thesis work commences with a thorough examination of issues surrounding facial
representation based on Local Binary Patterns (LBP). Extensive experiments with different
machine learning techniques demonstrate that LBP features are efficient and effective for
person-independent facial expression recognition, even in low-resolution settings. We then
present and evaluate a conditional mutual information based algorithm to efficiently learn the
most discriminative LBP features, and show the best recognition performance is obtained by
using SVM classifiers with the selected LBP features. However, the recognition is performed
on static images without exploiting temporal behaviors of facial expression.
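The pipeline described above can be sketched as follows, a minimal illustration assuming scikit-image and scikit-learn; the images and labels are synthetic stand-ins, and univariate mutual information is used as a simpler stand-in for the conditional-mutual-information criterion in the thesis.

```python
# Sketch of region-based LBP features + feature selection + SVM.
# Data are random placeholders, not real face images.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import SVC

def lbp_histogram(patch, P=8, R=1):
    """Uniform LBP codes pooled into a normalised histogram."""
    codes = local_binary_pattern(patch, P, R, method="uniform")
    n_bins = P + 2  # P+1 uniform patterns plus one "non-uniform" bin
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist

def describe(image, grid=4):
    """Concatenate per-region LBP histograms over a grid of face regions."""
    h, w = image.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            patch = image[i * h // grid:(i + 1) * h // grid,
                          j * w // grid:(j + 1) * w // grid]
            feats.append(lbp_histogram(patch))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
images = [(rng.random((32, 32)) * 255).astype(np.uint8) for _ in range(40)]
X = np.stack([describe(im) for im in images])   # 16 regions x 10 bins = 160 dims
y = rng.integers(0, 2, size=40)                  # two pseudo expression classes

# Univariate mutual information as a stand-in for conditional mutual information.
selector = SelectKBest(mutual_info_classif, k=32).fit(X, y)
clf = SVC(kernel="linear").fit(selector.transform(X), y)
print(clf.score(selector.transform(X), y))
```

The grid-of-regions step mirrors the standard LBP face representation: pooling codes per region keeps coarse spatial layout while staying robust at low resolution.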
Subsequently we present a method to capture and represent temporal dynamics of facial
expression by discovering the underlying low-dimensional manifold. Locality Preserving Projections
(LPP) is exploited to learn the expression manifold in the LBP based appearance
feature space. By deriving a universal discriminant expression subspace using a supervised
LPP, we can effectively align manifolds of different subjects on a generalised expression manifold.
Different linear subspace methods are comprehensively evaluated in expression subspace
learning. We formulate and evaluate a Bayesian framework for dynamic facial expression
recognition employing the derived manifold representation. However, the manifold representation
only addresses temporal correlations of the whole face image; it does not consider
spatial-temporal correlations among different facial regions. We then employ Canonical Correlation Analysis (CCA) to capture correlations among face
parts. To overcome the inherent limitations of classical CCA for image data, we introduce
and formalise a novel Matrix-based CCA (MCCA), which can better measure correlations in
2D image data. We show this technique can provide superior performance in regression and
recognition tasks, whilst requiring significantly fewer canonical factors. All the above work
focuses on facial expressions. However, the face is usually perceived not as an isolated object
but as an integrated part of the whole body, and the visual channel combining facial and
bodily expressions is most informative.
Finally we investigate two understudied problems in body language analysis, gait-based
gender discrimination and affective body gesture recognition. To effectively combine face
and body cues, CCA is adopted to establish the relationship between the two modalities, and
derive a semantic joint feature space for the feature-level fusion. Experiments on large data
sets demonstrate that our multimodal systems achieve superior performance in gender
discrimination and affective state analysis.
Research studentship of Queen Mary, the International Travel Grant of the Royal Academy of Engineering, and the Royal Society International Joint Project.
Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz
The reconstruction of dense 3D models of face geometry and appearance from a
single image is highly challenging and ill-posed. To constrain the problem,
many approaches rely on strong priors, such as parametric face models learned
from limited 3D scan data. However, prior models restrict generalization of the
true diversity in facial geometry, skin reflectance and illumination. To
alleviate this problem, we present the first approach that jointly learns 1) a
regressor for face shape, expression, reflectance and illumination on the basis
of 2) a concurrently learned parametric face model. Our multi-level face model
combines the advantage of 3D Morphable Models for regularization with the
out-of-space generalization of a learned corrective space. We train end-to-end
on in-the-wild images without dense annotations by fusing a convolutional
encoder with a differentiable expert-designed renderer and a self-supervised
training loss, both defined at multiple detail levels. Our approach compares
favorably to the state-of-the-art in terms of reconstruction quality, better
generalizes to real world faces, and runs at over 250 Hz.
Comment: CVPR 2018 (Oral). Project webpage:
https://gvv.mpi-inf.mpg.de/projects/FML
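The parametric-model layer such approaches build on can be sketched as a linear 3D Morphable Model: mean geometry plus identity and expression offset bases, with the regressor predicting the coefficients. The bases below are random placeholders, not a real learned model, and the dimensions are illustrative.

```python
# Linear 3DMM sketch: vertices = mean + shape_basis @ alpha + expr_basis @ delta.
import numpy as np

rng = np.random.default_rng(3)
n_vertices = 1000
mean_shape = rng.standard_normal(3 * n_vertices)            # flattened xyz mesh
shape_basis = rng.standard_normal((3 * n_vertices, 80))     # identity components
expr_basis = rng.standard_normal((3 * n_vertices, 64))      # expression components

def reconstruct(alpha, delta):
    """Linear 3DMM: mean geometry + identity offsets + expression offsets."""
    return (mean_shape + shape_basis @ alpha + expr_basis @ delta).reshape(-1, 3)

alpha = 0.1 * rng.standard_normal(80)   # identity coefficients (regressed)
delta = 0.1 * rng.standard_normal(64)   # expression coefficients (regressed)
vertices = reconstruct(alpha, delta)
print(vertices.shape)  # (1000, 3)
```

The multi-level idea in the paper adds a learned corrective space on top of this base model, so reconstructions are not limited to the span of the scan-derived bases.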