A dynamic texture based approach to recognition of facial actions and their temporal models
In this work, we propose a dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images (MHIs) and a novel method based on Nonrigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domains. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2 percent for the MHI method and 94.3 percent for the FFD method. The generalization performance of the FFD method was tested using the Cohn-Kanade database. Finally, we explored the performance on spontaneous expressions in the Sensitive Artificial Listener data set.
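To make the motion-representation step concrete, here is a minimal sketch of a Motion History Image pipeline with an orientation-histogram descriptor in Python/NumPy. The decay constant tau, the frame-difference threshold, and the normalization are illustrative assumptions, not the parameters used in the paper; a per-AU GentleBoost/HMM stage, as in the paper, would then consume the per-frame descriptors.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=15, diff_thresh=30):
    """Set pixels that moved to tau; decay all other pixels by one step."""
    motion = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16)) > diff_thresh
    return np.where(motion, tau, np.maximum(mhi - 1, 0))

def motion_orientation_histogram(mhi, bins=8):
    """Histogram of gradient orientations of the MHI, a rough analogue of
    the spatial motion orientation histogram descriptors in the paper."""
    gy, gx = np.gradient(mhi.astype(np.float32))
    magnitude = np.hypot(gx, gy)
    angles = np.arctan2(gy, gx)                      # range [-pi, pi]
    hist, _ = np.histogram(angles[magnitude > 1e-3], bins=bins,
                           range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)                 # normalized descriptor

# Typical use over a grayscale frame sequence:
#   mhi = np.zeros(frame.shape, dtype=np.int16)
#   for prev, cur in zip(frames, frames[1:]):
#       mhi = update_mhi(mhi, prev, cur)
#       feat = motion_orientation_histogram(mhi)   # one descriptor per frame
```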
Machine Analysis of Facial Expressions
Multi-scale cortical keypoints for realtime hand tracking and gesture recognition
Human-robot interaction is an interdisciplinary research area that aims to integrate human factors, cognitive psychology, and robot technology. The ultimate goal is the development of social robots. These robots are expected to work in human environments and to understand the behavior of persons through gestures and body movements. In this paper we present a biologically inspired, real-time framework for detecting and tracking hands. This framework is based on keypoints extracted from cortical V1 end-stopped cells. Detected keypoints and the cells' responses are used to classify the junction type. By combining annotated keypoints in a hierarchical, multi-scale tree structure, moving and deformable hands can be segregated, their movements can be obtained, and they can be tracked over time. By using hand templates with keypoints at only two scales, a hand's gestures can be recognized.
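As a rough illustration of the multi-scale keypoint idea, the sketch below detects keypoints over a Gaussian pyramid and maps them back to full resolution. It substitutes a generic corner detector (cv2.goodFeaturesToTrack) for the paper's cortical V1 end-stopped cell model, and it omits the junction-type classification and hierarchical tree grouping; all parameter values are assumptions.

```python
import cv2
import numpy as np

def multiscale_keypoints(gray, n_scales=3):
    """Detect keypoints over a Gaussian pyramid.
    Returns rows of (x, y, scale) in full-resolution coordinates."""
    points = []
    img = gray
    for s in range(n_scales):
        corners = cv2.goodFeaturesToTrack(img, maxCorners=100,
                                          qualityLevel=0.01, minDistance=5)
        if corners is not None:
            for (x, y) in corners.reshape(-1, 2):
                # Map coordinates back to the original image resolution.
                points.append((x * 2**s, y * 2**s, s))
        img = cv2.pyrDown(img)  # halve resolution for the next scale
    return np.array(points)
```

In the paper's scheme, hand templates with keypoints at two scales would then be matched against such (x, y, scale) sets; this sketch covers only the detection step.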
Facial Expression Analysis under Partial Occlusion: A Survey
Automatic machine-based Facial Expression Analysis (FEA) has made substantial progress in the past few decades, driven by its importance for applications in psychology, security, health, entertainment, and human-computer interaction. The vast majority of completed FEA studies are based on non-occluded faces collected in a controlled laboratory environment. Automatic expression recognition tolerant to partial occlusion remains less understood, particularly in real-world scenarios. In recent years, efforts investigating techniques to handle partial occlusion for FEA have increased, and the field is ready for a comprehensive review of these developments and of the state of the art. This survey provides such a review of recent advances in dataset creation, algorithm development, and investigations of the effects of occlusion critical for robust performance in FEA systems. It outlines existing challenges in overcoming partial occlusion and discusses possible opportunities for advancing the technology. To the best of our knowledge, it is the first FEA survey dedicated to occlusion and aimed at promoting better informed and benchmarked future work.

Comment: Authors' pre-print of the article accepted for publication in ACM Computing Surveys (accepted on 02-Nov-2017).
Out-of-plane action unit recognition using recurrent neural networks
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. Johannesburg, 2015.

The face is a fundamental tool for interpersonal communication and interaction between people. Humans use facial expressions to consciously or subconsciously express their emotional states, such as anger or surprise. As humans, we can easily identify changes in facial expressions even in complicated scenarios, but the task of facial expression recognition and analysis is complex and challenging for a computer. The automatic analysis of facial expressions by computers has applications in several scientific subjects such as psychology, neurology, pain assessment, lie detection, intelligent environments, psychiatry, and emotion and paralinguistic communication. We look at methods of facial expression recognition and, in particular, the recognition of the Facial Action Coding System's (FACS) Action Units (AUs). FACS encodes the movements of individual facial muscles from slight, instantaneous changes in facial appearance; contractions of specific facial muscles are mapped to a set of units called AUs. We make use of Speeded Up Robust Features (SURF) to extract keypoints from the face and use the SURF descriptors to create feature vectors. SURF provides smaller feature vectors than other commonly used feature extraction techniques, is comparable to or outperforms other methods with respect to distinctiveness, robustness, and repeatability, and is much faster than other feature detectors and descriptors. The SURF descriptor is scale and rotation invariant and is unaffected by small viewpoint or illumination changes. We use the SURF feature vectors to train a recurrent neural network (RNN) to recognize AUs from the Cohn-Kanade database. An RNN is able to handle temporal data received from image sequences in which an AU or a combination of AUs is shown to develop from a neutral face. We recognize AUs because they provide a more fine-grained means of measurement that is independent of age, ethnicity, gender, and differences in expression appearance. In addition to recognizing FACS AUs from the Cohn-Kanade database, we use our trained RNNs to recognize the development of pain in human subjects. We make use of the UNBC-McMaster pain database, which contains image sequences of people experiencing pain. In some cases, the pain results in their face moving out-of-plane or with some degree of in-plane movement. The temporal processing ability of RNNs can assist in classifying AUs where the face is occluded or not facing frontally for some part of the sequence. Results are promising when tested on the Cohn-Kanade database. We see higher overall recognition rates for upper face AUs than for lower face AUs. Since keypoints are extracted globally from the face in our system, local feature extraction could provide improved recognition results in future work. We also see satisfactory recognition results when tested on samples with out-of-plane head movement, demonstrating the temporal processing ability of RNNs.
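A minimal sketch of the pipeline the abstract describes: pool SURF descriptors per frame into a fixed-size vector and feed the per-frame vectors to an RNN that scores AUs. The abstract does not specify the RNN variant, the pooling, or any hyperparameters, so the GRU, the mean-pooling of the strongest keypoints, and the values below are assumptions; note also that SURF ships in opencv-contrib (cv2.xfeatures2d) and is patented, so it is disabled in some OpenCV builds (ORB would be a free drop-in alternative).

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

def frame_descriptor(gray, n_keypoints=32):
    """Pool SURF descriptors of the strongest keypoints into one 64-D vector."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kps, desc = surf.detectAndCompute(gray, None)
    if desc is None:
        return np.zeros(64, dtype=np.float32)   # no keypoints found
    order = np.argsort([-k.response for k in kps])[:n_keypoints]
    return desc[order].mean(axis=0)             # illustrative mean pooling

class AURecognizer(nn.Module):
    """GRU over per-frame descriptors; one sigmoid score per target AU."""
    def __init__(self, in_dim=64, hidden=128, n_aus=12):
        # n_aus is illustrative; set it to the number of AUs being recognized.
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_aus)

    def forward(self, seq):                      # seq: (batch, time, in_dim)
        out, _ = self.rnn(seq)
        return torch.sigmoid(self.head(out[:, -1]))  # scores at the final frame
```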
Discriminatively Trained Latent Ordinal Model for Video Classification
We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (e.g., the onset and offset phases for "smile", or running and jumping for "high jump"). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to approximately model the ordinal aspect of the videos. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video-based facial analysis datasets for prediction of expression, clinical pain, and intent in dyadic conversations, and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method.

Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150
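To illustrate the ordinal aspect, here is a toy scoring function that assigns K latent sub-event templates to frames under the ordering constraint t_1 < t_2 < ... < t_K via dynamic programming. It covers only inference for fixed templates; the paper's method also mines the sub-events and trains the templates discriminatively, and the dot-product frame scoring here is an assumption.

```python
import numpy as np

def ordered_subevent_score(frame_feats, templates):
    """frame_feats: (T, d) per-frame features; templates: (K, d) sub-events.
    Returns the best total score over assignments with t_1 < t_2 < ... < t_K."""
    scores = frame_feats @ templates.T           # (T, K) frame/sub-event scores
    T, K = scores.shape
    dp = np.full((T + 1, K + 1), -np.inf)
    dp[:, 0] = 0.0                               # zero sub-events placed: score 0
    for t in range(1, T + 1):
        for k in range(1, K + 1):
            skip = dp[t - 1, k]                               # frame t-1 unused
            take = dp[t - 1, k - 1] + scores[t - 1, k - 1]    # place sub-event k here
            dp[t, k] = max(skip, take)
    return dp[T, K]
```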