Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories
In this paper, we propose a new approach for facial expression recognition
using deep covariance descriptors. The solution is based on the idea of
encoding local and global Deep Convolutional Neural Network (DCNN) features
extracted from still images, in compact local and global covariance
descriptors. The space geometry of the covariance matrices is that of Symmetric
Positive Definite (SPD) matrices. By conducting the classification of static
facial expressions using Support Vector Machine (SVM) with a valid Gaussian
kernel on the SPD manifold, we show that deep covariance descriptors are more
effective than the standard classification with fully connected layers and
softmax. In addition, we propose a new solution to model
the temporal dynamics of facial expressions as deep trajectories on the SPD
manifold. As an extension of the classification pipeline of covariance
descriptors, we apply SVM with valid positive definite kernels derived from
global alignment for deep covariance trajectories classification. By performing
extensive experiments on the Oulu-CASIA, CK+, SFEW, and AFEW datasets, we show
that both the proposed static and dynamic approaches achieve state-of-the-art
performance for facial expression recognition, outperforming many recent
approaches.
Comment: A preliminary version of this work appeared in "Otberdout N, Kacem A,
Daoudi M, Ballihi L, Berretti S. Deep Covariance Descriptors for Facial
Expression Recognition, in British Machine Vision Conference 2018, BMVC 2018,
Northumbria University, Newcastle, UK, September 3-6, 2018. ; 2018 :159."
arXiv admin note: substantial text overlap with arXiv:1805.0386
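The static pipeline described above (covariance descriptors of DCNN features, classified with a kernel SVM on the SPD manifold) can be sketched as follows. The log-Euclidean form of the Gaussian kernel is an assumption; the abstract states only that a valid kernel on the SPD manifold is used, and all function names here are illustrative.

```python
import numpy as np
from scipy.linalg import logm

def covariance_descriptor(features, eps=1e-6):
    """Encode a set of DCNN feature vectors of shape (n, d) as a d x d
    covariance matrix, regularized so it is strictly SPD."""
    c = np.cov(features, rowvar=False)
    return c + eps * np.eye(c.shape[0])

def log_euclidean_gaussian_kernel(X, Y, gamma=0.1):
    """Gaussian kernel on SPD matrices using the log-Euclidean distance:
    k(X, Y) = exp(-gamma * ||logm(X) - logm(Y)||_F^2)."""
    d2 = np.linalg.norm(logm(X) - logm(Y), 'fro') ** 2
    return np.exp(-gamma * d2)
```

A Gram matrix built from such a kernel can be fed to an SVM with a precomputed kernel.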
Deception Detection in Videos
We present a system for covert automated deception detection in real-life
courtroom trial videos. We study the importance of different modalities like
vision, audio and text for this task. On the vision side, our system uses
classifiers trained on low level video features which predict human
micro-expressions. We show that predictions of high-level micro-expressions can
be used as features for deception prediction. Surprisingly, IDT (Improved Dense
Trajectory) features which have been widely used for action recognition, are
also very good at predicting deception in videos. We fuse the score of
classifiers trained on IDT features and high-level micro-expressions to improve
performance. MFCC (Mel-frequency Cepstral Coefficients) features from the audio
domain also provide a significant boost in performance, while information from
transcripts is not very beneficial for our system. Using various classifiers,
our automated system obtains an AUC of 0.877 (10-fold cross-validation) when
evaluated on subjects which were not part of the training set. Even though
state-of-the-art methods use human annotations of micro-expressions for
deception detection, our fully automated approach outperforms them by 5%. When
combined with human annotations of micro-expressions, our AUC improves to
0.922. We also present results of a user study analyzing how well average
humans perform on this task, what modalities they use for deception detection,
and how they perform if only one modality is accessible. Our project page can
be found at \url{https://doubaibai.github.io/DARE/}.
Comment: AAAI 2018, project page: https://doubaibai.github.io/DARE
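The score fusion mentioned above can be illustrated by a weighted average of per-modality classifier scores (e.g., IDT, micro-expression, and MFCC classifiers); the actual fusion rule is not specified in the abstract, so the averaging and weights here are assumptions.

```python
import numpy as np

def late_fusion(scores, weights=None):
    """Combine per-modality deception scores by weighted averaging.
    With no weights given, modalities contribute equally."""
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        weights = np.ones(len(scores)) / len(scores)
    return float(np.dot(weights, scores))
```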
Speech-driven Animation with Meaningful Behaviors
Conversational agents (CAs) play an important role in human-computer
interaction. Creating believable movements for CAs is challenging, since the
movements have to be meaningful and natural, reflecting the coupling between
gestures and speech. Studies in the past have mainly relied on rule-based or
data-driven approaches. Rule-based methods focus on creating meaningful
behaviors conveying the underlying message, but the gestures cannot be easily
synchronized with speech. Data-driven approaches, especially speech-driven
models, can capture the relationship between speech and gestures. However, they
create behaviors disregarding the meaning of the message. This study proposes
to bridge the gap between these two approaches overcoming their limitations.
The approach builds a dynamic Bayesian network (DBN), where a discrete variable
is added to constrain the behaviors on the underlying constraint. The study
implements and evaluates the approach with two constraints: discourse functions
and prototypical behaviors. By constraining on the discourse functions (e.g.,
questions), the model learns the characteristic behaviors associated with a
given discourse class learning the rules from the data. By constraining on
prototypical behaviors (e.g., head nods), the approach can be embedded in a
rule-based system as a behavior realizer creating trajectories that are timely
synchronized with speech. The study proposes a DBN structure and a training
approach that (1) models the cause-effect relationship between the constraint
and the gestures, (2) initializes the state configuration models increasing the
range of the generated behaviors, and (3) captures the differences in the
behaviors across constraints by enforcing sparse transitions between shared and
exclusive states per constraint. Objective and subjective evaluations
demonstrate the benefits of the proposed approach over an unconstrained model.
Comment: 13 pages, 12 figures, 5 tables
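The constraint idea above can be illustrated with a toy sampler in which a discrete variable (e.g., a discourse function) selects the transition model of a simple state-space generator. This is a simplified illustration of the cause-effect structure, not the paper's actual DBN, and all names are hypothetical.

```python
import numpy as np

def sample_constrained_dbn(constraint, T, trans, emis, rng=None):
    """Sample a gesture-state sequence of length T from a toy DBN in
    which the discrete `constraint` selects the transition matrix."""
    rng = rng or np.random.default_rng(0)
    A = trans[constraint]          # transition matrix for this constraint
    n = A.shape[0]
    state, states, obs = 0, [], []
    for _ in range(T):
        states.append(state)
        obs.append(emis[state] + rng.normal(0, 0.1))  # noisy gesture value
        state = rng.choice(n, p=A[state])
    return states, obs
```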
Facial Expression Recognition Based on Deep Learning Convolution Neural Network: A Review
Facial emotion processing is one of the most important activities in affective computing, human-computer interaction, machine vision, video game testing, and consumer research. Facial expressions are a form of nonverbal communication, as they reveal a person's inner feelings and emotions. Facial Expression Recognition (FER) has recently received extensive attention, as facial expressions are considered one of the fastest media for communicating any kind of information. FER gives a better understanding of a person's thoughts or views; when analyzed with currently trending deep learning methods, recognition accuracy improves sharply compared to traditional state-of-the-art systems. This article provides a brief overview of the different FER fields of application and the publicly accessible databases used in FER, and surveys recent work on FER using Convolutional Neural Network (CNN) algorithms. Finally, it is observed that the reviewed approaches reach good results, especially in terms of accuracy, though the rates vary with the datasets used, which impacts comparability of the results.
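As a minimal illustration of the building blocks the reviewed CNN systems share, the following numpy sketch runs one convolution, a ReLU, global average pooling, and a softmax over hypothetical expression classes. Real FER models are far deeper and learned end to end; every shape and name here is illustrative.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid cross-correlation (what CNN libraries call convolution)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tiny_fer_forward(img, kernels, W):
    """Conv -> ReLU -> global average pool -> linear -> softmax,
    returning probabilities over expression classes."""
    feats = np.array([np.maximum(conv2d(img, k), 0).mean() for k in kernels])
    return softmax(W @ feats)
```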
Automatic Detection of Self-Adaptors for Psychological Distress
Psychological distress is a significant and growing
issue in society. Automatic detection, assessment, and analysis
of such distress is an active area of research. Compared to
modalities such as face, head, and vocal, research investigating
the use of the body modality for these tasks is relatively
sparse. This is, in part, due to the lack of available datasets
and difficulty in automatically extracting useful body features.
Recent advances in pose estimation and deep learning have
enabled new approaches to this modality and domain. We
propose a novel method to automatically detect self-adaptors
and fidgeting, a subset of self-adaptors that has been shown
to be correlated with psychological distress. We also propose
a multi-modal approach that combines different feature representations using Multi-modal Deep Denoising Auto-Encoders
and Improved Fisher Vector encoding. We also demonstrate
that our proposed model, combining audio-visual features with
automatically detected fidgeting behavioral cues, can successfully predict distress levels in a dataset labeled with self-reported anxiety and depression levels. To enable this research,
we introduce a new dataset containing full body videos for short
interviews and self-reported distress labels.
King's College, Cambridge
Fusion of Physiological and Behavioural Signals on SPD Manifolds with Application to Stress and Pain Detection
Existing multimodal stress/pain recognition approaches generally extract
features from different modalities independently and thus ignore cross-modality
correlations. This paper proposes a novel geometric framework for multimodal
stress/pain detection utilizing Symmetric Positive Definite (SPD) matrices as a
representation that incorporates the correlation relationship of physiological
and behavioural signals from covariance and cross-covariance. Considering the
non-linearity of the Riemannian manifold of SPD matrices, well-known machine
learning techniques are not suited to classify these matrices. Therefore, a
tangent space mapping method is adopted to map the derived SPD matrix sequences
to the vector sequences in the tangent space where the LSTM-based network can
be applied for classification. The proposed framework has been evaluated on two
public multimodal datasets, achieving state-of-the-art results on both the
stress and pain detection tasks.
Comment: International Conference on Systems, Man, and Cybernetics, IEEE SMC
2022, October 9-12, 2022
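The tangent space mapping step described above can be sketched as the standard log map at a reference SPD point, followed by upper-triangular vectorization so an LSTM can consume the sequence. The paper's exact mapping and reference point are not given in the abstract, so this form is an assumption (and it omits the sqrt(2) off-diagonal weighting some formulations use).

```python
import numpy as np
from scipy.linalg import inv, logm, sqrtm

def tangent_space_map(spd_seq, ref):
    """Map a sequence of SPD matrices to tangent vectors at `ref`
    (e.g., a mean SPD matrix), one row per time step."""
    r = inv(sqrtm(ref))
    vecs = []
    for S in spd_seq:
        T = logm(r @ S @ r)              # log map at the reference point
        iu = np.triu_indices(T.shape[0])
        vecs.append(np.real(T[iu]))      # upper-triangular vectorization
    return np.stack(vecs)
```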
Looking at the Body: Automatic Analysis of Body Gestures and Self-Adaptors in Psychological Distress
Psychological distress is a significant and growing issue in society.
Automatic detection, assessment, and analysis of such distress is an active
area of research. Compared to modalities such as face, head, and vocal,
research investigating the use of the body modality for these tasks is
relatively sparse. This is, in part, due to the limited available datasets and
difficulty in automatically extracting useful body features. Recent advances in
pose estimation and deep learning have enabled new approaches to this modality
and domain. To enable this research, we have collected and analyzed a new
dataset containing full body videos for short interviews and self-reported
distress labels. We propose a novel method to automatically detect
self-adaptors and fidgeting, a subset of self-adaptors that has been shown to
be correlated with psychological distress. We perform analysis on statistical
body gestures and fidgeting features to explore how distress levels affect
participants' behaviors. We then propose a multi-modal approach that combines
different feature representations using Multi-modal Deep Denoising
Auto-Encoders and Improved Fisher Vector Encoding. We demonstrate that our
proposed model, combining audio-visual features with automatically detected
fidgeting behavioral cues, can successfully predict distress levels in a
dataset labeled with self-reported anxiety and depression levels.
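One denoising auto-encoder update over a concatenated multimodal feature vector can be sketched as below. This is a single-layer toy version of the Multi-modal Deep Denoising Auto-Encoders named above; the shapes, noise level, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoising_autoencoder_step(x, W_enc, W_dec, noise_std=0.1, lr=0.01):
    """One squared-error training step: corrupt the input, encode with a
    tanh layer, reconstruct the *clean* input, and update both weights."""
    x_noisy = x + rng.normal(0, noise_std, x.shape)   # corruption
    h = np.tanh(W_enc @ x_noisy)                      # shared representation
    x_hat = W_dec @ h                                 # reconstruction
    err = x_hat - x
    g_dec = np.outer(err, h)                          # dL/dW_dec
    g_enc = np.outer((W_dec.T @ err) * (1 - h**2), x_noisy)  # dL/dW_enc
    return x_hat, W_enc - lr * g_enc, W_dec - lr * g_dec
```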