114 research outputs found

    Relative Facial Action Unit Detection

    This paper presents a subject-independent facial action unit (AU) detection method by introducing the concept of relative AU detection, for scenarios where the neutral face is not provided. We propose a new classification objective function which analyzes the temporal neighborhood of the current frame to decide whether the expression has recently increased, decreased or shown no change. This approach is a significant change from the conventional absolute method, which decides AU classification using the current frame alone, without an explicit comparison with its neighboring frames. Our proposed method improves robustness to individual differences such as face scale and shape, age-related wrinkles, and transitions among expressions (e.g., lower intensity of expressions). Our experiments on three publicly available datasets (Extended Cohn-Kanade (CK+), Bosphorus, and DISFA) show significant improvement of our approach over conventional absolute techniques. Keywords: facial action coding system (FACS); relative facial action unit detection; temporal information. Comment: Accepted at IEEE Winter Conference on Applications of Computer Vision, Steamboat Springs, Colorado, USA, 201
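    The relative-labelling idea can be illustrated with a short sketch: given per-frame AU intensity values, each frame is labelled as increasing, decreasing, or unchanged by comparing it against its recent temporal neighborhood. This is only an illustrative reconstruction, not the authors' code; the window size and the change threshold are assumptions.

```python
import numpy as np

def relative_labels(intensities, window=3, eps=0.5):
    """Label each frame as +1 (AU increased), -1 (decreased) or 0 (no change)
    by comparing it with the mean of the preceding temporal neighborhood.

    intensities : 1-D array of per-frame AU intensity values.
    window      : number of preceding frames forming the reference neighborhood (assumed).
    eps         : minimum change treated as a real increase/decrease (assumed).
    """
    labels = np.zeros(len(intensities), dtype=int)
    for t in range(window, len(intensities)):
        ref = intensities[t - window:t].mean()   # recent temporal context
        delta = intensities[t] - ref
        if delta > eps:
            labels[t] = 1                         # expression intensified
        elif delta < -eps:
            labels[t] = -1                        # expression relaxed
    return labels

# Example: a ramp up, a plateau, then a release.
print(relative_labels(np.array([0, 0, 1, 2, 3, 3, 3, 2, 1, 0], dtype=float)))
```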

    Learning Temporal Alignment Uncertainty for Efficient Event Detection

    In this paper we tackle the problem of efficient video event detection. We argue that linear detection functions should be preferred in this regard due to their scalability and efficiency during estimation and evaluation. A popular approach is to represent a sequence using a bag-of-words (BOW) representation because of (i) its fixed dimensionality irrespective of the sequence length, and (ii) its ability to compactly model the statistics of the sequence. A drawback of the BOW representation, however, is the intrinsic destruction of the temporal ordering information. In this paper we propose a new representation that leverages the uncertainty in relative temporal alignments between pairs of sequences while not destroying temporal ordering. Our representation, like BOW, is of a fixed dimensionality, making it easy to integrate with a linear detection function. Extensive experiments on the CK+, 6DMG, and UvA-NEMO databases show significant performance improvements across both isolated and continuous event detection tasks. Comment: Appeared in DICTA 2015, 8 pages
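    For context, a plain BOW encoding and a linear detection function can be sketched as below; the codebook size and the scoring rule are illustrative assumptions, and the paper's alignment-uncertainty term is not reproduced here.

```python
import numpy as np

def bow_encode(frame_features, codebook):
    """Encode a variable-length sequence of per-frame features as a fixed-length
    histogram over a visual codebook (rows of `codebook` are cluster centres)."""
    # Assign each frame to its nearest codeword.
    dists = ((frame_features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assignments = dists.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()          # normalise so sequence length does not matter

def linear_detector_score(hist, w, b):
    """Linear detection function: cheap to evaluate on long video streams."""
    return float(hist @ w + b)

# Toy usage with a random codebook and detector weights (illustrative only).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))      # 16 codewords, 8-D frame features (assumed sizes)
sequence = rng.normal(size=(120, 8))     # 120 frames
h = bow_encode(sequence, codebook)
print(linear_detector_score(h, rng.normal(size=16), 0.0))
```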

    Discriminant Multi-Label Manifold Embedding for Facial Action Unit Detection

    This article describes a system for participation in the Facial Expression Recognition and Analysis (FERA2015) sub-challenge for spontaneous action unit occurrence detection. The problem of AU detection is by nature a multi-label classification problem, a fact overlooked by most existing work. The correlation information between AUs has the potential to increase detection accuracy. We investigate the multi-label AU detection problem by embedding the data on low-dimensional manifolds which preserve multi-label correlation. For this, we apply the multi-label Discriminant Laplacian Embedding (DLE) method as an extension to our base system. The system uses SIFT features around a set of facial landmarks that is enhanced with additional non-salient points around transient facial features. Both the base system and the DLE extension perform better than the challenge baseline results for the two databases in the challenge, and achieve an average F1-measure close to 50% on the testing partition (9.9% higher than the baseline in the best case). The DLE extension proves useful for certain AUs, but also shows the need for more analysis to assess its benefits in general.
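    A rough sketch of the appearance-feature side of such a base system follows: SIFT descriptors computed at the facial landmark positions are concatenated into a single vector that can then feed per-AU classifiers. The landmark coordinates, patch scale and downstream classifier are assumptions, and the DLE embedding itself is not reproduced.

```python
import cv2

def landmark_sift_features(gray_image, landmarks, patch_size=32.0):
    """Compute one SIFT descriptor per facial landmark and concatenate them.

    gray_image : uint8 grayscale face image.
    landmarks  : iterable of (x, y) landmark coordinates (assumed given by a
                 separate landmark detector).
    patch_size : keypoint size controlling the descriptor support region (assumed).
    """
    sift = cv2.SIFT_create()
    keypoints = [cv2.KeyPoint(float(x), float(y), patch_size) for x, y in landmarks]
    keypoints, descriptors = sift.compute(gray_image, keypoints)
    return descriptors.reshape(-1)   # (n_landmarks * 128,) feature vector

# The concatenated vectors can then be fed to independent per-AU classifiers
# (e.g. one linear SVM per AU) as a simple feature-level, multi-label baseline;
# this is illustrative and not the DLE method described in the abstract.
```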

    Automatic analysis of facial actions: a survey

    As one of the most comprehensive and objective ways to describe facial expressions, the Facial Action Coding System (FACS) has recently received significant attention. Over the past 30 years, extensive research has been conducted by psychologists and neuroscientists on various aspects of facial expression analysis using FACS. Automating FACS coding would make this research faster and more widely applicable, opening up new avenues to understanding how we communicate through facial expressions. Such an automated process can also potentially increase the reliability, precision and temporal resolution of coding. This paper provides a comprehensive survey of research into machine analysis of facial actions. We systematically review all components of such systems: pre-processing, feature extraction and machine coding of facial actions. In addition, the existing FACS-coded facial expression databases are summarised. Finally, challenges that have to be addressed to make automatic facial action analysis applicable in real-life situations are extensively discussed. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the future of machine recognition of facial actions: what are the challenges and opportunities that researchers in the field face?

    Learning to combine local models for Facial Action Unit detection

    Current approaches to automatic analysis of facial action units (AUs) can differ in the way the face appearance is represented. Some works represent the whole face, dividing the bounding-box region into a regular grid and applying a feature descriptor to each subpatch. Alternatively, it is also common to consider local patches around the facial landmarks and apply appearance descriptors to each of them. Almost invariably, all the features from each of these patches are combined into a single feature vector, which is the input to the learning routine and to inference. This constitutes the so-called feature-level fusion strategy. However, it has recently been suggested that decision-level fusion might provide better results. This strategy trains a different classifier per region and then combines prediction scores linearly. In this work we extend this idea to model-level fusion, employing Artificial Neural Networks with an equivalent architecture. The resulting method has the advantage of learning the weights of the linear combination in a data-driven manner, and of jointly learning all the region-specific classifiers as well as the region-fusion weights. We show in an experiment that this architecture improves over two baselines representing typical feature-level fusion. Furthermore, we compare our method with the previously proposed linear decision-level region-fusion method on the challenging GEMEP-FERA database, showing superior performance.
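    The model-level fusion idea, one small network per facial region whose scores are combined by fusion weights learned jointly with the region classifiers, might look roughly like the following PyTorch sketch; the number of regions, feature dimensionality and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class RegionFusionNet(nn.Module):
    """One sub-network per facial region; a learned linear layer fuses the
    per-region scores, so region classifiers and fusion weights train jointly."""

    def __init__(self, n_regions, region_dim, hidden=64):
        super().__init__()
        self.region_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(region_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_regions)
        )
        self.fusion = nn.Linear(n_regions, 1)   # data-driven combination weights

    def forward(self, region_feats):            # region_feats: (batch, n_regions, region_dim)
        scores = torch.cat(
            [net(region_feats[:, i]) for i, net in enumerate(self.region_nets)], dim=1
        )                                        # (batch, n_regions) per-region scores
        return self.fusion(scores).squeeze(-1)   # fused AU activation score

model = RegionFusionNet(n_regions=9, region_dim=128)
dummy = torch.randn(4, 9, 128)
print(model(dummy).shape)   # torch.Size([4])
```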

    Automated and Real Time Subtle Facial Feature Tracker for Automatic Emotion Elicitation

    This thesis proposes a system for real-time detection of facial expressions that are subtle and exhibited in spontaneous real-world settings. The underlying framework of our system is the open-source implementation of the Active Appearance Model (AAM). Our algorithm operates by grouping the various points provided by the AAM into higher-level regions, constructing and updating a background statistical model of movement in each region, and testing whether current movement in a given region substantially exceeds the expected value of movement in that region (computed from the statistical model). Movements that exceed the expected value by some threshold and do not appear to be false alarms due to artifacts (e.g., lighting changes) are considered to be valid changes in facial expression. These changes are expected to be rough indicators of facial activity that can be complemented by context-driven predictors of emotion derived from spontaneous settings.
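    The per-region movement test described above can be sketched as a running background statistic per region with a threshold expressed in standard deviations; the grouping of AAM points into regions, the update rate and the threshold value are assumptions.

```python
import numpy as np

class RegionMovementDetector:
    """Maintain a running mean/variance of movement per facial region and flag
    frames whose movement substantially exceeds the expected value."""

    def __init__(self, n_regions, threshold_sigmas=3.0, alpha=0.05):
        self.mean = np.zeros(n_regions)
        self.var = np.ones(n_regions)
        self.k = threshold_sigmas        # how many std devs count as "substantial" (assumed)
        self.alpha = alpha               # background-model update rate (assumed)

    def update(self, movement):
        """movement: per-region movement magnitudes for the current frame,
        e.g. the mean displacement of that region's AAM points."""
        flagged = movement > self.mean + self.k * np.sqrt(self.var)
        # Slowly adapt the background model so gradual drift (e.g. lighting or
        # pose changes) is absorbed rather than flagged as an expression change.
        self.mean = (1 - self.alpha) * self.mean + self.alpha * movement
        self.var = (1 - self.alpha) * self.var + self.alpha * (movement - self.mean) ** 2
        return flagged
```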

    Dynamic deep learning for automatic facial expression recognition and its application in diagnosis of ADHD & ASD

    Neurodevelopmental conditions like Attention Deficit Hyperactivity Disorder (ADHD) and Autism Spectrum Disorder (ASD) impact a significant number of children and adults worldwide. Currently, the diagnosis of such conditions is carried out by experts, who employ standard questionnaires and look for certain behavioural markers through manual observation. Such methods are not only subjective, difficult to repeat and costly, but also extremely time consuming. However, the recent surge of research into automatic facial behaviour analysis and its varied applications could prove to be a potential way of tackling these diagnostic difficulties. Automatic facial expression recognition is one of the core components of this field, but it has always been challenging to perform accurately in an unconstrained environment. This thesis presents a dynamic deep learning framework for robust automatic facial expression recognition. It also proposes an approach to apply this method to facial behaviour analysis, which can help in the diagnosis of conditions like ADHD and ASD. The proposed facial expression algorithm uses a deep Convolutional Neural Network (CNN) to learn models of facial Action Units (AUs). It attempts to model three main distinguishing features of AUs jointly in a CNN: shape, appearance and short-term dynamics. Appearance is modelled through local image regions relevant to each AU, shape is encoded using binary masks computed from automatically detected facial landmarks, and dynamics are encoded by using a short sequence of images as input to the CNN. In addition, the method employs Bi-directional Long Short-Term Memory (BLSTM) recurrent neural networks for modelling long-term dynamics. The proposed approach is evaluated on a number of databases, showing state-of-the-art performance for both AU detection and intensity estimation tasks. The AU intensities estimated using this approach, along with other 3D face tracking data, are used for encoding facial behaviour. The encoded facial behaviour is applied to learning models which can help in the detection of ADHD and ASD. This approach was evaluated on the KOMAA database, which was specially collected for this purpose. Experimental results show that facial behaviour encoded in this way provides high discriminative power for the classification of people with these conditions. It is shown that the proposed system is a potentially useful, objective and time-saving contribution to the clinical diagnosis of ADHD and ASD.
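    A much-simplified sketch of this kind of architecture is shown below: a small CNN takes a short stack of frames from an AU-relevant region together with a binary landmark mask as input channels, and a bidirectional LSTM runs over the per-clip features for longer-range dynamics. All layer sizes, the clip length, the patch size and the number of AUs are assumptions, not the thesis architecture.

```python
import torch
import torch.nn as nn

class AUClipCNN(nn.Module):
    """CNN over (short frame stack + landmark mask) channels for one AU region."""
    def __init__(self, n_frames=3, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_frames + 1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(64 * 4 * 4, feat_dim)

    def forward(self, x):                 # x: (batch, n_frames + 1, H, W)
        return self.fc(self.conv(x).flatten(1))

class AUSequenceModel(nn.Module):
    """Bidirectional LSTM over per-clip CNN features for long-term AU dynamics."""
    def __init__(self, n_aus=12, feat_dim=128, hidden=64):
        super().__init__()
        self.cnn = AUClipCNN(feat_dim=feat_dim)
        self.blstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_aus)   # per-step AU intensity estimates

    def forward(self, clips):             # clips: (batch, T, n_frames + 1, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.blstm(feats)
        return self.head(out)             # (batch, T, n_aus)

model = AUSequenceModel()
print(model(torch.randn(2, 5, 4, 48, 48)).shape)   # torch.Size([2, 5, 12])
```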

    Spatio-temporal framework on facial expression recognition.

    This thesis presents an investigation into two topics that are important in facial expression recognition: how to employ the dynamic information from facial expression image sequences, and how to efficiently extract context and other relevant information from different facial regions. This involves the development of spatio-temporal frameworks for recognising facial expression. The thesis proposes three novel frameworks. The first framework uses sparse representation to extract features from patches of a face to improve recognition performance, applying part-based methods which are robust to image alignment. In addition, the use of sparse representation reduces the dimensionality of the features, improves their semantic meaning, and represents a face image more efficiently. Since a facial expression is a dynamic process, and that process contains information which describes the expression more effectively, it is important to capture such dynamic information in order to recognise facial expressions over an entire video sequence. Thus, the second framework uses two types of dynamic information to enhance recognition: a novel spatio-temporal descriptor based on PHOG (pyramid histogram of gradient) to represent changes in facial shape, and dense optical flow to estimate the movement (displacement) of facial landmarks. The framework views an image sequence as a spatio-temporal volume and uses temporal information to represent the dynamic movement of facial landmarks associated with a facial expression. Specifically, a spatial descriptor representing local shape is extended to the spatio-temporal domain to capture changes in the local shape of facial sub-regions (forehead, mouth, eyebrow and nose) along the temporal dimension. The optical flow descriptor is also employed to extract temporal information. The fusion of these two descriptors enhances the dynamic information and achieves better performance than either descriptor alone. The third framework also focuses on analysing the dynamics of facial expression sequences to represent spatio-temporal dynamic information (i.e., velocity). Two types of features are generated: a spatio-temporal shape representation to enhance the local spatial and dynamic information, and a dynamic appearance representation. In addition, an entropy-based method is introduced to provide the spatial relationship of different parts of a face by computing the entropy value of its different sub-regions.
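    The dense optical flow component used to estimate landmark displacement between consecutive frames can be illustrated with OpenCV's Farneback flow; the parameter values and the way displacements are read from the flow field are assumptions, and the PHOG-based spatio-temporal descriptor is not reproduced here.

```python
import cv2
import numpy as np

def landmark_displacements(prev_gray, next_gray, landmarks):
    """Estimate the motion of facial landmarks between two consecutive frames
    by sampling a dense Farneback optical-flow field at the landmark positions."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
    )
    # flow[y, x] holds the (dx, dy) displacement at pixel (x, y).
    return np.array([flow[int(y), int(x)] for x, y in landmarks])   # (n_landmarks, 2)

# Concatenating these per-frame displacement vectors over a sequence gives a
# simple dynamic-movement feature that can be fused with shape-based descriptors.
```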

    Spontaneous Facial Behavior Computing in Human Machine Interaction with Applications in Autism Treatment

    Digital devices and computing machines such as computers, hand-held devices and robots are becoming an important part of our daily life. To build affect-aware intelligent Human-Machine Interaction (HMI) systems, scientists and engineers have aimed to design interfaces which can emulate face-to-face communication. Such HMI systems are capable of detecting and responding to users' emotions and affective states. One of the main challenges in producing such an intelligent system is to design a machine which can automatically compute spontaneous human behaviors in real-life settings. Since humans' facial behaviors contain important non-verbal cues, this dissertation studies facial actions and behaviors in HMI systems. The two main objectives of this dissertation are: (1) capturing, annotating and computing spontaneous facial expressions in a Human-Computer Interaction (HCI) system and releasing a database that allows researchers to study the dynamics of facial muscle movements in both posed and spontaneous data; (2) developing and deploying a robot-based intervention protocol for autism therapeutic applications and modeling the facial behaviors of children with high-functioning autism in a real-world Human-Robot Interaction (HRI) system. Because of the lack of data for analyzing the dynamics of spontaneous facial expressions, my colleagues and I introduced and released a novel database called the Denver Intensity of Spontaneous Facial Actions (DISFA). DISFA describes facial expressions using the Facial Action Coding System (FACS), a gold-standard technique which annotates facial muscle movements in terms of a set of defined Action Units (AUs). This dissertation also introduces an automated system for recognizing DISFA's facial expressions and the dynamics of AUs in a single image or a sequence of facial images. Results illustrate that our automated system is capable of computing AU dynamics with high accuracy (overall reliability ICC = 0.77). In addition, this dissertation investigates and computes the dynamics and temporal patterns of both spontaneous and posed facial actions, which can be used to automatically infer the meaning of facial expressions. Another objective of this dissertation is to analyze and compute the facial behaviors (i.e., eye gaze and head orientation) of individuals in a real-world HRI system. Because children with Autism Spectrum Disorder (ASD) show interest toward technology, we designed and conducted a set of robot-based games to study and foster the socio-behavioral responses of children diagnosed with high-functioning ASD. Computing the gaze direction and head orientation patterns illustrates how individuals with ASD regulate their facial behaviors differently (compared to typically developing children) when interacting with a robot. In addition, studying the behavioral responses of participants during different phases of this study (i.e., baseline, intervention and follow-up) reveals that, overall, a robot-based therapy setting can be a viable approach for helping individuals with autism.
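    Agreement between automatically estimated and manually coded AU intensities is commonly reported with an intraclass correlation coefficient; a minimal ICC(3,1) computation for two raters (manual vs. automatic coding) might look like the sketch below. The exact ICC variant used in the dissertation is not stated in the abstract, so this is illustrative only.

```python
import numpy as np

def icc_3_1(ratings):
    """ICC(3,1): two-way mixed model, single-measure consistency.

    ratings : array of shape (n_items, n_raters), e.g. frames x (manual, automatic)
              AU intensity codes.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between-items
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between-raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Toy example: an automatic coding that closely tracks the manual codes.
manual = np.array([0, 1, 3, 4, 2, 0, 5, 1], dtype=float)
automatic = manual + np.array([0.2, -0.1, 0.3, -0.2, 0.1, 0.0, -0.3, 0.2])
print(round(icc_3_1(np.column_stack([manual, automatic])), 3))
```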