1,621 research outputs found

    Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories

    Get PDF
    In this paper, we propose a new approach for facial expression recognition using deep covariance descriptors. The solution is based on the idea of encoding local and global Deep Convolutional Neural Network (DCNN) features extracted from still images, in compact local and global covariance descriptors. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matrices. By conducting the classification of static facial expressions using Support Vector Machine (SVM) with a valid Gaussian kernel on the SPD manifold, we show that deep covariance descriptors are more effective than the standard classification with fully connected layers and softmax. Besides, we propose a completely new and original solution to model the temporal dynamic of facial expressions as deep trajectories on the SPD manifold. As an extension of the classification pipeline of covariance descriptors, we apply SVM with valid positive definite kernels derived from global alignment for deep covariance trajectories classification. By performing extensive experiments on the Oulu-CASIA, CK+, and SFEW datasets, we show that both the proposed static and dynamic approaches achieve state-of-the-art performance for facial expression recognition outperforming many recent approaches.Comment: A preliminary version of this work appeared in "Otberdout N, Kacem A, Daoudi M, Ballihi L, Berretti S. Deep Covariance Descriptors for Facial Expression Recognition, in British Machine Vision Conference 2018, BMVC 2018, Northumbria University, Newcastle, UK, September 3-6, 2018. ; 2018 :159." arXiv admin note: substantial text overlap with arXiv:1805.0386

    Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories

    Get PDF
    International audienceIn this paper, we propose a new approach for facial expression recognition using deep covariance descriptors. The solution is based on the idea of encoding local and global Deep Convolutional Neural Network (DCNN) features extracted from still images, in compact local and global covariance descriptors. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matrices. By conducting the classification of static facial expressions using Support Vector Machine (SVM) with a valid Gaussian kernel on the SPD manifold, we show that deep covariance descriptors are more effective than the standard classification with fully connected layers and softmax. Besides, we propose a completely new and original solution to model the temporal dynamic of facial expressions as deep trajectories on the SPD manifold. As an extension of the classification pipeline of covariance descriptors, we apply SVM with valid positive definite kernels derived from global alignment for deep covariance trajectories classification. By performing extensive experiments on the Oulu-CASIA, CK+, SFEW and AFEW datasets, we show that both the proposed static and dynamic approaches achieve state-of-the-art performance for facial expression recognition outperforming many recent approaches

    Deception Detection in Videos

    Full text link
    We present a system for covert automated deception detection in real-life courtroom trial videos. We study the importance of different modalities like vision, audio and text for this task. On the vision side, our system uses classifiers trained on low level video features which predict human micro-expressions. We show that predictions of high-level micro-expressions can be used as features for deception prediction. Surprisingly, IDT (Improved Dense Trajectory) features which have been widely used for action recognition, are also very good at predicting deception in videos. We fuse the score of classifiers trained on IDT features and high-level micro-expressions to improve performance. MFCC (Mel-frequency Cepstral Coefficients) features from the audio domain also provide a significant boost in performance, while information from transcripts is not very beneficial for our system. Using various classifiers, our automated system obtains an AUC of 0.877 (10-fold cross-validation) when evaluated on subjects which were not part of the training set. Even though state-of-the-art methods use human annotations of micro-expressions for deception detection, our fully automated approach outperforms them by 5%. When combined with human annotations of micro-expressions, our AUC improves to 0.922. We also present results of a user-study to analyze how well do average humans perform on this task, what modalities they use for deception detection and how they perform if only one modality is accessible. Our project page can be found at \url{https://doubaibai.github.io/DARE/}.Comment: AAAI 2018, project page: https://doubaibai.github.io/DARE

    Speech-driven Animation with Meaningful Behaviors

    Full text link
    Conversational agents (CAs) play an important role in human computer interaction. Creating believable movements for CAs is challenging, since the movements have to be meaningful and natural, reflecting the coupling between gestures and speech. Studies in the past have mainly relied on rule-based or data-driven approaches. Rule-based methods focus on creating meaningful behaviors conveying the underlying message, but the gestures cannot be easily synchronized with speech. Data-driven approaches, especially speech-driven models, can capture the relationship between speech and gestures. However, they create behaviors disregarding the meaning of the message. This study proposes to bridge the gap between these two approaches overcoming their limitations. The approach builds a dynamic Bayesian network (DBN), where a discrete variable is added to constrain the behaviors on the underlying constraint. The study implements and evaluates the approach with two constraints: discourse functions and prototypical behaviors. By constraining on the discourse functions (e.g., questions), the model learns the characteristic behaviors associated with a given discourse class learning the rules from the data. By constraining on prototypical behaviors (e.g., head nods), the approach can be embedded in a rule-based system as a behavior realizer creating trajectories that are timely synchronized with speech. The study proposes a DBN structure and a training approach that (1) models the cause-effect relationship between the constraint and the gestures, (2) initializes the state configuration models increasing the range of the generated behaviors, and (3) captures the differences in the behaviors across constraints by enforcing sparse transitions between shared and exclusive states per constraint. Objective and subjective evaluations demonstrate the benefits of the proposed approach over an unconstrained model.Comment: 13 pages, 12 figures, 5 table

    Facial Expression Recognition Based on Deep Learning Convolution Neural Network: A Review

    Get PDF
    Facial emotional processing is one of the most important activities in effective calculations, engagement with people and computers, machine vision, video game testing, and consumer research. Facial expressions are a form of nonverbal communication, as they reveal a person's inner feelings and emotions. Extensive attention to Facial Expression Recognition (FER) has recently been received as facial expressions are considered. As the fastest communication medium of any kind of information. Facial expression recognition gives a better understanding of a person's thoughts or views and analyzes them with the currently trending deep learning methods. Accuracy rate sharply compared to traditional state-of-the-art systems. This article provides a brief overview of the different FER fields of application and publicly accessible databases used in FER and studies the latest and current reviews in FER using Convolution Neural Network (CNN) algorithms. Finally, it is observed that everyone reached good results, especially in terms of accuracy, with different rates, and using different data sets, which impacts the results

    Automatic Detection of Self-Adaptors for Psychological Distress

    Get PDF
    Psychological distress is a significant and growing issue in society. Automatic detection, assessment, and analysis of such distress is an active area of research. Compared to modalities such as face, head, and vocal, research investigating the use of the body modality for these tasks is relatively sparse. This is, in part, due to the lack of available datasets and difficulty in automatically extracting useful body features. Recent advances in pose estimation and deep learning have enabled new approaches to this modality and domain. We propose a novel method to automatically detect self-adaptors and fidgeting, a subset of self-adaptors that has been shown to be correlated with psychological distress. We also propose a multi-modal approach that combines different feature representations using Multi-modal Deep Denoising Auto-Encoders and Improved Fisher Vector encoding. We also demonstrate that our proposed model, combining audio-visual features with automatically detected fidgeting behavioral cues, can successfully predict distress levels in a dataset labeled with self-reported anxiety and depression levels. To enable this research we introduce a new dataset containing full body videos for short interviews and self-reported distress labels.King's College, Cmabridg

    Fusion of Physiological and Behavioural Signals on SPD Manifolds with Application to Stress and Pain Detection

    Full text link
    Existing multimodal stress/pain recognition approaches generally extract features from different modalities independently and thus ignore cross-modality correlations. This paper proposes a novel geometric framework for multimodal stress/pain detection utilizing Symmetric Positive Definite (SPD) matrices as a representation that incorporates the correlation relationship of physiological and behavioural signals from covariance and cross-covariance. Considering the non-linearity of the Riemannian manifold of SPD matrices, well-known machine learning techniques are not suited to classify these matrices. Therefore, a tangent space mapping method is adopted to map the derived SPD matrix sequences to the vector sequences in the tangent space where the LSTM-based network can be applied for classification. The proposed framework has been evaluated on two public multimodal datasets, achieving both the state-of-the-art results for stress and pain detection tasks.Comment: International Conference on Systems, Man, and Cybernetics, IEEE SMC 2022, October 9-12, 202

    Looking at the Body: Automatic Analysis of Body Gestures and Self-Adaptors in Psychological Distress

    Get PDF
    Psychological distress is a significant and growing issue in society. Automatic detection, assessment, and analysis of such distress is an active area of research. Compared to modalities such as face, head, and vocal, research investigating the use of the body modality for these tasks is relatively sparse. This is, in part, due to the limited available datasets and difficulty in automatically extracting useful body features. Recent advances in pose estimation and deep learning have enabled new approaches to this modality and domain. To enable this research, we have collected and analyzed a new dataset containing full body videos for short interviews and self-reported distress labels. We propose a novel method to automatically detect self-adaptors and fidgeting, a subset of self-adaptors that has been shown to be correlated with psychological distress. We perform analysis on statistical body gestures and fidgeting features to explore how distress levels affect participants' behaviors. We then propose a multi-modal approach that combines different feature representations using Multi-modal Deep Denoising Auto-Encoders and Improved Fisher Vector Encoding. We demonstrate that our proposed model, combining audio-visual features with automatically detected fidgeting behavioral cues, can successfully predict distress levels in a dataset labeled with self-reported anxiety and depression levels
    corecore