1,063 research outputs found

    Learning Temporal Alignment Uncertainty for Efficient Event Detection

    Full text link
    In this paper we tackle the problem of efficient video event detection. We argue that linear detection functions should be preferred in this regard due to their scalability and efficiency during estimation and evaluation. A popular approach in this regard is to represent a sequence using a bag of words (BOW) representation due to its: (i) fixed dimensionality irrespective of the sequence length, and (ii) its ability to compactly model the statistics in the sequence. A drawback to the BOW representation, however, is the intrinsic destruction of the temporal ordering information. In this paper we propose a new representation that leverages the uncertainty in relative temporal alignments between pairs of sequences while not destroying temporal ordering. Our representation, like BOW, is of a fixed dimensionality making it easily integrated with a linear detection function. Extensive experiments on CK+, 6DMG, and UvA-NEMO databases show significant performance improvements across both isolated and continuous event detection tasks.Comment: Appeared in DICTA 2015, 8 page

    Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression

    Full text link
    We present techniques for improving performance driven facial animation, emotion recognition, and facial key-point or landmark prediction using learned identity invariant representations. Established approaches to these problems can work well if sufficient examples and labels for a particular identity are available and factors of variation are highly controlled. However, labeled examples of facial expressions, emotions and key-points for new individuals are difficult and costly to obtain. In this paper we improve the ability of techniques to generalize to new and unseen individuals by explicitly modeling previously seen variations related to identity and expression. We use a weakly-supervised approach in which identity labels are used to learn the different factors of variation linked to identity separately from factors related to expression. We show how probabilistic modeling of these sources of variation allows one to learn identity-invariant representations for expressions which can then be used to identity-normalize various procedures for facial expression analysis and animation control. We also show how to extend the widely used techniques of active appearance models and constrained local models through replacing the underlying point distribution models which are typically constructed using principal component analysis with identity-expression factorized representations. We present a wide variety of experiments in which we consistently improve performance on emotion recognition, markerless performance-driven facial animation and facial key-point tracking.Comment: to appear in Image and Vision Computing Journal (IMAVIS

    Fully automatic facial action unit detection and temporal analysis

    No full text
    In this work we report on the progress of building a system that enables fully automated fast and robust facial expression recognition from face video. We analyse subtle changes in facial expression by recognizing facial muscle action units (AUs) and analysing their temporal behavior. By detecting AUs from face video we enable the analysis of various facial communicative signals including facial expressions of emotion, attitude and mood. For an input video picturing a facial expression we detect per frame whether any of 15 different AUs is activated, whether that facial action is in the onset, apex, or offset phase, and what the total duration of the activation in question is. We base this process upon a set of spatio-temporal features calculated from tracking data for 20 facial fiducial points. To detect these 20 points of interest in the first frame of an input face video, we utilize a fully automatic, facial point localization method that uses individual feature GentleBoost templates built from Gabor wavelet features. Then, we exploit a particle filtering scheme that uses factorized likelihoods and a novel observation model that combines a rigid and a morphological model to track the facial points. The AUs displayed in the input video and their temporal segments are recognized finally by Support Vector Machines trained on a subset of most informative spatio-temporal features selected by AdaBoost. For Cohn-Kanade and MMI databases, the proposed system classifies 15 AUs occurring alone or in combination with other AUs with a mean agreement rate of 90.2 % with human FACS coders

    Timing is everything: A spatio-temporal approach to the analysis of facial actions

    No full text
    This thesis presents a fully automatic facial expression analysis system based on the Facial Action Coding System (FACS). FACS is the best known and the most commonly used system to describe facial activity in terms of facial muscle actions (i.e., action units, AUs). We will present our research on the analysis of the morphological, spatio-temporal and behavioural aspects of facial expressions. In contrast with most other researchers in the field who use appearance based techniques, we use a geometric feature based approach. We will argue that that approach is more suitable for analysing facial expression temporal dynamics. Our system is capable of explicitly exploring the temporal aspects of facial expressions from an input colour video in terms of their onset (start), apex (peak) and offset (end). The fully automatic system presented here detects 20 facial points in the first frame and tracks them throughout the video. From the tracked points we compute geometry-based features which serve as the input to the remainder of our systems. The AU activation detection system uses GentleBoost feature selection and a Support Vector Machine (SVM) classifier to find which AUs were present in an expression. Temporal dynamics of active AUs are recognised by a hybrid GentleBoost-SVM-Hidden Markov model classifier. The system is capable of analysing 23 out of 27 existing AUs with high accuracy. The main contributions of the work presented in this thesis are the following: we have created a method for fully automatic AU analysis with state-of-the-art recognition results. We have proposed for the first time a method for recognition of the four temporal phases of an AU. We have build the largest comprehensive database of facial expressions to date. We also present for the first time in the literature two studies for automatic distinction between posed and spontaneous expressions

    Biologically vs. logic inspired encoding of facial actions and emotions in video

    No full text
    • …
    corecore