Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression
We present techniques for improving performance driven facial animation,
emotion recognition, and facial key-point or landmark prediction using learned
identity invariant representations. Established approaches to these problems
can work well if sufficient examples and labels for a particular identity are
available and factors of variation are highly controlled. However, labeled
examples of facial expressions, emotions and key-points for new individuals are
difficult and costly to obtain. In this paper we improve the ability of
techniques to generalize to new and unseen individuals by explicitly modeling
previously seen variations related to identity and expression. We use a
weakly-supervised approach in which identity labels are used to learn the
different factors of variation linked to identity separately from factors
related to expression. We show how probabilistic modeling of these sources of
variation allows one to learn identity-invariant representations for
expressions which can then be used to identity-normalize various procedures for
facial expression analysis and animation control. We also show how to extend
the widely used techniques of active appearance models and constrained local
models through replacing the underlying point distribution models which are
typically constructed using principal component analysis with
identity-expression factorized representations. We present a wide variety of
experiments in which we consistently improve performance on emotion
recognition, markerless performance-driven facial animation and facial
key-point tracking.
Comment: to appear in Image and Vision Computing Journal (IMAVIS).
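The identity-expression factorisation described above can be sketched in a weakly supervised way: identity labels are used to separate between-subject variation from within-subject (expression) residuals, and a separate PCA basis is fitted to each factor. The following numpy sketch follows that reading; the function and parameter names are hypothetical, not the paper's.

```python
import numpy as np

def factorize_shapes(shapes, identities, n_id=10, n_expr=10):
    """Split landmark-shape variation into identity and expression factors.

    shapes: (N, 2K) array of vectorised landmark configurations.
    identities: length-N array of identity labels (weak supervision).
    Returns identity basis, expression basis, and the global mean shape.
    """
    shapes = np.asarray(shapes, dtype=float)
    mean = shapes.mean(axis=0)
    centred = shapes - mean

    # Per-identity means capture variation *between* subjects.
    labels = np.unique(identities)
    id_means = np.stack([centred[identities == i].mean(axis=0)
                         for i in labels])
    # Residuals about each subject's own mean capture expression.
    expr_res = np.concatenate([centred[identities == i] - m
                               for i, m in zip(labels, id_means)])

    def pca_basis(X, k):
        # SVD of the centred data matrix yields principal directions.
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        return Vt[:k]

    return pca_basis(id_means, n_id), pca_basis(expr_res, n_expr), mean
```

Projecting a new face onto the expression basis alone gives an identity-normalised code that can feed expression classifiers or animation controls, which is the spirit of the factorized point-distribution models the paper proposes.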
A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"
Recently, technologies such as face detection, facial landmark localisation
and face recognition and verification have matured enough to provide effective
and efficient solutions for imagery captured under arbitrary conditions
(referred to as "in-the-wild"). This is partially attributed to the fact that
comprehensive "in-the-wild" benchmarks have been developed for face detection,
landmark localisation and recognition/verification. A very important technology
that has not been thoroughly evaluated yet is deformable face tracking
"in-the-wild". Until now, the performance has mainly been assessed
qualitatively by visually assessing the result of a deformable face tracking
technology on short videos. In this paper, we perform the first, to the best of
our knowledge, thorough evaluation of state-of-the-art deformable face tracking
pipelines using the recently introduced 300VW benchmark. We evaluate many
different architectures focusing mainly on the task of on-line deformable face
tracking. In particular, we compare the following general strategies: (a)
generic face detection plus generic facial landmark localisation, (b) generic
model free tracking plus generic facial landmark localisation, as well as (c)
hybrid approaches using state-of-the-art face detection, model free tracking
and facial landmark localisation technologies. Our evaluation reveals future
avenues for further research on the topic.
Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.
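The three evaluated strategies can be sketched as one dispatch loop; `detect`, `track`, and `fit_landmarks` below are stand-ins for any concrete face detector, model-free tracker, and landmark fitter, and the confidence-based re-detection rule is an illustrative reading of the hybrid approach, not the benchmark's exact protocol.

```python
def track_video(frames, detect, track, fit_landmarks,
                strategy="hybrid", conf_threshold=0.5):
    """Run one of three deformable-face-tracking strategies over frames.

    detect(frame) -> box; track(frame, prev_box) -> box;
    fit_landmarks(frame, box) -> (shape, confidence).
    """
    shapes, prev_box = [], None
    for frame in frames:
        if strategy == "detection" or prev_box is None:
            box = detect(frame)              # (a) generic detection per frame
        else:
            box = track(frame, prev_box)     # (b) propagate via model-free tracking
        shape, conf = fit_landmarks(frame, box)
        if strategy == "hybrid" and conf < conf_threshold:
            box = detect(frame)              # (c) hybrid: re-detect when the fit drifts
            shape, conf = fit_landmarks(frame, box)
        prev_box = box
        shapes.append(shape)
    return shapes
```

Swapping the three callables while keeping this loop fixed is what allows the many pipeline architectures to be compared on equal footing.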
Recognition of human activities and expressions in video sequences using shape context descriptor
The recognition of objects and classes of objects is of importance in the field of computer vision due to its applicability in areas such as video surveillance, medical imaging and retrieval of images and videos from large databases on the Internet. Effective recognition of object classes is still a challenge in vision; hence, there is much interest in improving the rate of recognition in order to keep up with the rising demands of the fields where these techniques are being applied. This thesis investigates the recognition of activities and expressions in video sequences using a new descriptor called the spatiotemporal shape context. The shape context is a well-known algorithm that describes the shape of an object based upon the mutual distribution of points in the contour of the object; however, it falls short when the distinctive property of an object is not just its shape but also its movement across frames in a video sequence. Since actions and expressions tend to have a motion component that enhances the capability of distinguishing them, the shape-based information from the shape context alone proves insufficient. This thesis proposes new 3D and 4D spatiotemporal shape context descriptors that incorporate into the original shape context changes in motion across frames. Results of classification of actions and expressions demonstrate that the spatiotemporal shape context is better than the original shape context at enhancing recognition of classes in the activity and expression domains.
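The 2D shape context underlying the proposed descriptors bins, for each contour point, the relative positions of all other points into a log-polar histogram; the 3D and 4D variants add temporal offsets as extra dimensions. A minimal 2D sketch follows; bin counts and the radial range are illustrative choices, not the thesis's exact parameters.

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12):
    """Log-polar shape-context histogram for each 2D contour point."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]       # pairwise offset vectors
    r = np.linalg.norm(diff, axis=-1)
    theta = np.arctan2(diff[..., 1], diff[..., 0])
    mean_r = r[r > 0].mean()                       # normalise scale by mean distance
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1) * mean_r
    hists = np.zeros((n, n_r, n_theta))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rb = np.searchsorted(r_edges, r[i, j]) - 1
            if 0 <= rb < n_r:                      # drop points outside radial range
                tb = int((theta[i, j] + np.pi) / (2 * np.pi) * n_theta) % n_theta
                hists[i, rb, tb] += 1
    return hists.reshape(n, -1)
```

Extending this to the spatiotemporal case amounts to binning offsets in (x, y, t) rather than (x, y), so that motion across frames contributes to each point's histogram.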
A novel facial action intensity detection system
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. October 2014.
Although a great deal of research has been done in the field of facial expression recognition, comparatively little progress has been made in detecting the intensity of facial actions. In facial expression recognition, the intensity of facial actions is a crucial aspect, since it provides more information about an individual's facial expression, such as the level of emotion in a face. Furthermore, an automated system that can detect the intensity of facial actions in an individual's face could enable many potential applications, from lie detection to smart classrooms. The approach presented here includes robust methods for face and facial feature extraction, and multiple machine learning methods for facial action intensity detection.
Photorealistic retrieval of occluded facial information using a performance-driven face model
Facial occlusions can cause both human observers and computer algorithms
to fail in a variety of important tasks such as facial action analysis and
expression classification. This is because the missing information is not
reconstructed accurately enough for the purpose of the task in hand. Most
current computer methods that are used to tackle this problem implement
complex three-dimensional polygonal face models that are generally time-consuming to produce and unsuitable for photorealistic reconstruction of missing facial features and behaviour.
In this thesis, an image-based approach is adopted to solve the occlusion
problem. A dynamic computer model of the face is used to retrieve the
occluded facial information from the driver faces. The model consists of a
set of orthogonal basis actions obtained by application of principal
component analysis (PCA) on image changes and motion fields extracted
from a sequence of natural facial motion (Cowe 2003). Examples of
occlusion affected facial behaviour can then be projected onto the model to
compute coefficients of the basis actions and thus produce photorealistic
performance-driven animations.
Visual inspection shows that the PCA face model recovers aspects of
expressions in those areas occluded in the driver sequence, but the expression is generally muted. To further investigate this finding, a database
of test sequences affected by a considerable set of artificial and natural
occlusions is created. A number of suitable metrics are developed to measure
the accuracy of the reconstructions. Regions of the face that are most
important for performance-driven mimicry and that seem to carry the best
information about global facial configurations are revealed using Bubbles,
thus in effect identifying facial areas that are most sensitive to occlusions.
Recovery of occluded facial information is enhanced by applying an
appropriate scaling factor to the respective coefficients of the basis actions
obtained by PCA. This method improves the reconstruction of the facial
actions emanating from the occluded areas of the face. However, due to the
fact that PCA produces bases that encode composite, correlated actions,
such an enhancement also tends to affect actions in non-occluded areas of
the face. To avoid this, more localised controls for facial actions are
produced using independent component analysis (ICA). Simple projection
of the data onto an ICA model is not viable due to the non-orthogonality of
the extracted bases. Thus occlusion-affected mimicry is first generated using
the PCA model and then enhanced by accordingly manipulating the
independent components that are subsequently extracted from the mimicry.
This combination of methods yields significant improvements and results in
photorealistic reconstructions of occluded facial actions.
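The PCA model and the coefficient-scaling enhancement can be sketched as follows: basis actions are principal directions of vectorised appearance changes, an occluded driver frame is projected onto them, and a gain above one on the recovered coefficients counters the muting effect noted above. The names and the single uniform gain are simplifications for illustration; the thesis scales the respective coefficients individually.

```python
import numpy as np

def fit_basis_actions(frames, k=5):
    """Learn orthonormal 'basis actions' by PCA on vectorised frames."""
    X = np.asarray(frames, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]                 # rows of Vt are orthonormal directions

def reconstruct(frame, mean, basis, gain=1.0):
    """Project a (possibly occluded) frame onto the basis and resynthesise.

    gain > 1 amplifies the recovered coefficients, compensating for the
    muted expressions produced by projecting occlusion-affected input.
    """
    coeffs = basis @ (np.asarray(frame, dtype=float) - mean)
    return mean + (gain * coeffs) @ basis
```

Because the PCA basis is orthogonal, projection is a single matrix product; the ICA-based refinement discussed above is needed precisely because independent components lack this orthogonality and cannot be recovered by simple projection.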
4DFAB: a large scale 4D facial expression database for biometric applications
The progress we are currently witnessing in many computer vision applications, including automatic face analysis, would not be made possible without tremendous efforts in collecting and annotating large scale visual databases. To this end, we propose 4DFAB, a new large scale database of dynamic high-resolution 3D faces (over 1,800,000 3D meshes). 4DFAB contains recordings of 180 subjects captured in four different sessions spanning a five-year period. It contains 4D videos of subjects displaying both spontaneous and posed facial behaviours. The database can be used for both face and facial expression recognition, as well as behavioural biometrics. It can also be used to learn very powerful blendshapes for parametrising facial behaviour. In this paper, we conduct several experiments and demonstrate the usefulness of the database for various applications. The database will be made publicly available for research purposes.
Statistical modelling for facial expression dynamics
PhD thesis.
Facial expressions are among the most powerful and fastest means of relaying emotions between humans.
The ability to capture, understand and mimic those emotions and their underlying dynamics
in the synthetic counterpart is a challenging task because of the complexity of human emotions, different
ways of conveying them, non-linearities caused by facial feature and head motion, and the
ever critical eye of the viewer. This thesis sets out to address some of the limitations of existing
techniques by investigating three components of expression modelling and parameterisation framework:
(1) Feature and expression manifold representation, (2) Pose estimation, and (3) Expression
dynamics modelling and their parameterisation for the purpose of driving a synthetic head avatar.
First, we introduce a hierarchical representation based on the Point Distribution Model (PDM).
Holistic representations imply that non-linearities caused by the motion of facial features, and intra-feature correlations, are implicitly embedded and hence have to be accounted for in the resulting
expression space. Also such representations require large training datasets to account for all possible
variations. To address those shortcomings, and to provide a basis for learning more subtle, localised
variations, our representation consists of a tree-like structure in which a holistic root component is decomposed into leaves containing the jaw outline, each of the eyes and eyebrows, and the mouth. Each
of the hierarchical components is modelled according to its intrinsic functionality, rather than the
final, holistic expression label.
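The tree-like representation can be sketched as a root point-distribution model over all landmarks plus independent leaf models per facial component. The 57-point layout and the component index ranges below are purely illustrative, not the thesis's actual annotation scheme.

```python
import numpy as np

# Hypothetical landmark grouping: each leaf covers one facial component.
COMPONENTS = {"jaw": slice(0, 17),
              "eyes_brows": slice(17, 37),
              "mouth": slice(37, 57)}

def build_hierarchical_pdm(shapes, k_root=8, k_leaf=4):
    """shapes: (N, 57, 2) landmark sets. Returns root and per-leaf PCA models."""
    X = np.asarray(shapes, dtype=float)

    def pca(data, k):
        flat = data.reshape(len(data), -1)
        mean = flat.mean(axis=0)
        _, _, Vt = np.linalg.svd(flat - mean, full_matrices=False)
        return mean, Vt[:k]

    root = pca(X, k_root)                        # holistic root component
    leaves = {name: pca(X[:, idx], k_leaf)       # localised leaf components
              for name, idx in COMPONENTS.items()}
    return root, leaves
```

Modelling each leaf separately is what lets subtle, localised variations be captured without demanding the very large training sets a purely holistic model would need.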
Secondly, we introduce a statistical approach for capturing an underlying low-dimension expression
manifold by utilising components of the previously defined hierarchical representation. As
Principal Component Analysis (PCA) based approaches cannot reliably capture variations caused by large facial feature changes because of their linear nature, the underlying dynamics manifold for each
of the hierarchical components is modelled using a Hierarchical Latent Variable Model (HLVM) approach.
Whilst retaining PCA properties, such a model introduces a probability density model which
can deal with missing or incomplete data and allows discovery of internal within cluster structures.
All of the model parameters and underlying density model are automatically estimated during the
training stage. We investigate the usefulness of such a model to larger and unseen datasets.
Thirdly, we extend the concept of HLVM model to pose estimation to address the non-linear
shape deformations and definition of the plausible pose space caused by large head motion. Since
our head rarely stays still, and its movements are intrinsically connected with the way we perceive
and understand the expressions, pose information is an integral part of their dynamics. The proposed
approach integrates into our existing hierarchical representation model. It is learned using a sparse and discretely sampled training dataset, and generalises to a larger and continuous view-sphere.
Finally, we introduce a framework that models and extracts expression dynamics. In existing frameworks, explicit definition of expression intensity and pose information is often overlooked,
although usually implicitly embedded in the underlying representation. We investigate modelling
of the expression dynamics based on use of static information only, and focus on its sufficiency
for the task at hand. We compare a rule-based method that utilises the existing latent structure and
provides a fusion of different components with holistic and Bayesian Network (BN) approaches. An
Active Appearance Model (AAM) based tracker is used to extract relevant information from input
sequences. Such information is subsequently used to define the parametric structure of the underlying
expression dynamics. We demonstrate that such information can be utilised to animate a synthetic
head avatar.
Submitted.