Gaussian process domain experts for model adaptation in facial behavior analysis
We present a novel approach to supervised domain adaptation that is based upon the probabilistic framework of Gaussian processes (GPs). Specifically, we introduce domain-specific GPs as local experts for facial expression classification from face images. The adaptation of the classifier is facilitated in a probabilistic fashion by conditioning the target expert on multiple source experts. Furthermore, in contrast to existing adaptation approaches, we also learn a target expert solely from the available target data. A single, confident classifier is then obtained by combining the predictions from multiple experts based on their confidence. Learning of the model is efficient and requires no retraining/reweighting of the source classifiers. We evaluate the proposed approach on two publicly available datasets for multi-class (MultiPIE) and multi-label (DISFA) facial expression classification. To this end, we perform adaptation of two contextual factors: where (view) and who (subject). We show in our experiments that the proposed approach consistently outperforms both source and target classifiers, while using as few as 30 target examples. It also outperforms the state-of-the-art approaches for supervised domain adaptation.
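The confidence-weighted combination of experts can be illustrated with a short sketch. The code below is not the paper's conditioning scheme; it assumes synthetic source/target data and fuses two independently trained GP regressors by inverse predictive variance (a product-of-experts style rule), roughly in the spirit of combining predictions according to each expert's confidence.

```python
# Hedged sketch: confidence-weighted fusion of GP "experts" on synthetic data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_src, y_src = rng.normal(size=(200, 5)), rng.normal(size=200)   # source domain (synthetic)
X_tgt, y_tgt = rng.normal(size=(30, 5)),  rng.normal(size=30)    # only 30 target examples

experts = []
for X, y in [(X_src, y_src), (X_tgt, y_tgt)]:
    gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-2).fit(X, y)
    experts.append(gp)

def combine(x):
    # Product-of-experts style fusion: weight each expert by its confidence
    # (inverse predictive variance), so uncertain experts contribute less.
    mus, stds = zip(*[gp.predict(x, return_std=True) for gp in experts])
    w = np.array([1.0 / (s ** 2 + 1e-9) for s in stds])
    return (w * np.array(mus)).sum(axis=0) / w.sum(axis=0)

print(combine(rng.normal(size=(3, 5))))
```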
Context-sensitive dynamic ordinal regression for intensity estimation of facial action units
Modeling the intensity of facial action units from spontaneously displayed facial expressions is challenging, mainly because of high variability in subject-specific facial expressiveness, head movements, illumination changes, etc. These factors make the target problem highly context-sensitive. However, existing methods usually ignore this context-sensitivity of the target problem. We propose a novel Conditional Ordinal Random Field (CORF) model for context-sensitive modeling of facial action unit intensity, where the W5+ (who, when, what, where, why and how) definition of context is used. While the proposed model is general enough to handle all six context questions, in this paper we focus on the context questions who (the observed subject), how (the changes in facial expressions), and when (the timing of facial expressions and their intensity). The context questions who and how are modeled by means of the newly introduced context-dependent covariate effects, and the context question when is modeled in terms of temporal correlation between the ordinal outputs, i.e., intensity levels of action units. We also introduce a weighted softmax-margin learning of CRFs from data with a skewed distribution of the intensity levels, which is commonly encountered in spontaneous facial data. The proposed model is evaluated on intensity estimation of pain and facial action units using two recently published datasets (UNBC Shoulder Pain and DISFA) of spontaneously displayed facial expressions. Our experiments show that the proposed model performs significantly better on the target tasks compared to the state-of-the-art approaches. Furthermore, compared to traditional learning of CRFs, we show that the proposed weighted learning results in more robust parameter estimation from the imbalanced intensity data.
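One ingredient highlighted above, learning from a skewed distribution of intensity levels, can be sketched in isolation. The snippet below is not the CORF model; it simply shows inverse-frequency sample weighting for imbalanced intensity labels with a plain multinomial classifier on synthetic features.

```python
# Minimal sketch (not the paper's CORF): inverse-frequency weighting for
# skewed AU-intensity labels, one idea the abstract highlights.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                       # synthetic frame features
y = rng.choice([0, 1, 2, 3, 4, 5], size=500,
               p=[0.6, 0.2, 0.1, 0.05, 0.03, 0.02])  # skewed intensity levels

# Weight each frame by the inverse frequency of its intensity level,
# so rare high intensities are not swamped during training.
counts = np.maximum(np.bincount(y, minlength=6).astype(float), 1.0)
sample_weight = 1.0 / counts[y]

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y, sample_weight=sample_weight)
print(clf.predict(X[:5]))
```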
Copula Ordinal Regression for Joint Estimation of Facial Action Unit Intensity
Joint modeling of the intensity of facial action units (AUs) from face images is challenging due to the large number of AUs (30+) and their intensity levels (6). This is in part due to the lack of suitable models that can efficiently handle such a large number of outputs/classes simultaneously, but also due to the lack of labelled target data. For this reason, the majority of methods proposed so far resort to independent classifiers for AU intensity. This is suboptimal for at least two reasons: the facial appearance of some AUs changes depending on the intensity of other AUs, and some AUs co-occur more often than others. Encoding this is expected to improve the estimation of target AU intensities, especially in the case of noisy image features, head-pose variations and imbalanced training data. To this end, we introduce a novel modeling framework, Copula Ordinal Regression (COR), that leverages the power of copula functions and CRFs to disentangle the probabilistic modeling of AU dependencies from the marginal modeling of AU intensity. Consequently, the COR model achieves joint learning and inference of the intensities of multiple AUs while remaining computationally tractable. We show on two challenging datasets of naturalistic facial expressions that the proposed approach consistently outperforms (i) independent modeling of AU intensities, and (ii) the state-of-the-art approach for the target task.
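The core idea of separating AU dependencies (the copula) from the per-AU intensity marginals can be illustrated with a toy Gaussian copula. The sketch below uses an assumed correlation and assumed marginal distributions for two AUs; it is an illustration of the copula idea only, not the COR model itself.

```python
# Toy Gaussian copula coupling two AU intensity marginals: the correlation
# matrix models dependence, the per-AU CDFs model the marginals separately.
import numpy as np
from scipy.stats import multivariate_normal, norm

corr = np.array([[1.0, 0.7], [0.7, 1.0]])            # assumed AU co-occurrence strength
z = multivariate_normal(mean=[0, 0], cov=corr).rvs(size=5000, random_state=0)
u = norm.cdf(z)                                       # uniform marginals, dependence preserved

# Map each uniform margin to a 6-level intensity via its own (assumed) marginal CDF.
marginal_cdf = np.array([0.5, 0.7, 0.85, 0.93, 0.98, 1.0])
intensities = np.searchsorted(marginal_cdf, u)        # shape (5000, 2), levels 0..5

# Empirical joint distribution of the two AUs' intensity levels.
joint = np.zeros((6, 6))
for a, b in intensities:
    joint[a, b] += 1
print(joint / joint.sum())
```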
Dyadic Speech-based Affect Recognition using DAMI-P2C Parent-child Multimodal Interaction Dataset
Automatic speech-based affect recognition of individuals in dyadic conversation is a challenging task, in part because of its heavy reliance on manual pre-processing. Traditional approaches frequently require hand-crafted speech features and segmentation of speaker turns. In this work, we design end-to-end deep learning methods to recognize each person's affective expression in an audio stream with two speakers, automatically discovering features and time regions relevant to the target speaker's affect. We integrate a local attention mechanism into the end-to-end architecture and compare the performance of three attention implementations: one mean pooling and two weighted pooling methods. Our results show that the proposed weighted-pooling attention solutions are able to learn to focus on the regions containing the target speaker's affective information and successfully extract the individual's valence and arousal intensity. Here we introduce and use a "dyadic affect in multimodal interaction - parent to child" (DAMI-P2C) dataset collected in a study of 34 families, where a parent and a child (3-7 years old) engage in reading storybooks together. In contrast to existing public datasets for affect recognition, each instance for both speakers in the DAMI-P2C dataset is annotated for perceived affect by three labelers. To encourage more research on the challenging task of multi-speaker affect sensing, we make the annotated DAMI-P2C dataset publicly available, including acoustic features of the dyads' raw audios, affect annotations, and a diverse set of developmental, social, and demographic profiles of each dyad. Accepted by the 2020 International Conference on Multimodal Interaction (ICMI'20).
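A minimal sketch of the weighted-pooling attention idea appears below. It is not the authors' architecture; the feature dimensionality, sequence length, and the two-dimensional valence/arousal head are all assumptions chosen only to make the mechanism concrete.

```python
# Attention-weighted pooling over a sequence of frame-level acoustic features.
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)       # one scalar relevance score per frame

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feat_dim) frame-level acoustic features
        weights = torch.softmax(self.score(frames), dim=1)    # emphasise informative regions
        return (weights * frames).sum(dim=1)                  # (batch, feat_dim) utterance embedding

pooled = AttentionPooling()(torch.randn(2, 100, 64))
head = nn.Linear(64, 2)                           # predict valence and arousal intensity
print(head(pooled).shape)                         # torch.Size([2, 2])
```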
Dynamic Facial Landmarking Selection for Emotion Recognition using Gaussian Processes
Facial features are the basis of the emotion recognition process and are widely used in affective computing systems. This emotional process is produced by dynamic changes in physiological signals and in the visual responses related to facial expressions. An important factor in this process is the shape information of a facial expression, represented as dynamically changing facial landmarks. In this paper we present a framework for dynamic facial landmarking selection based on facial expression analysis using Gaussian processes. We perform facial feature tracking based on Active Appearance Models for facial landmark detection, and then use Gaussian process ranking over the dynamic emotional sequences with the aim of establishing which landmarks are most relevant for emotional multivariate time-series recognition. The experimental results show that Gaussian processes can effectively fit an emotional time-series and that the log-likelihood-based ranking identifies the landmarks (mouth and eyebrow regions) that best represent a given facial expression sequence. Finally, we use the best-ranked landmarks in emotion recognition tasks, obtaining accurate performance on both acted and spontaneous emotional datasets.
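The ranking step can be sketched as fitting a GP to each landmark's trajectory and ordering the landmarks by the fitted log marginal likelihood. The code below uses synthetic trajectories and a standard GP regressor, not the paper's AAM tracking pipeline; the landmark names are placeholders.

```python
# Hedged sketch: rank landmark trajectories by GP log marginal likelihood.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)[:, None]              # frame times of one expression sequence
landmarks = {                                    # assumed per-landmark y-coordinates over time
    "mouth_corner": np.sin(4 * np.pi * t).ravel() + 0.05 * rng.normal(size=100),
    "eyebrow":      np.cos(2 * np.pi * t).ravel() + 0.05 * rng.normal(size=100),
    "nose_tip":     0.3 * rng.normal(size=100),  # mostly noise: should rank low
}

scores = {}
for name, traj in landmarks.items():
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(t, traj)
    scores[name] = gp.log_marginal_likelihood_value_

for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.1f}")
```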
Joint Facial Action Unit Detection and Feature Fusion: A Multi-conditional Learning Approach
Automated analysis of facial expressions can benefit many domains, from marketing to clinical diagnosis of neurodevelopmental disorders. Facial expressions are typically encoded as a combination of facial muscle activations, i.e., action units. Depending on context, these action units co-occur in specific patterns, and rarely in isolation. Yet, most existing methods for automatic action unit detection fail to exploit dependencies among them, and the corresponding facial features. To address this, we propose a novel multi-conditional latent variable model for simultaneous fusion of facial features and joint action unit detection. Specifically, the proposed model performs feature fusion in a generative fashion via a low-dimensional shared subspace, while simultaneously performing action unit detection using a discriminative classification approach. We show that by combining the merits of both approaches, the proposed methodology outperforms existing purely discriminative/generative methods for the target task. To reduce the number of parameters, and avoid overfitting, a novel Bayesian learning approach based on Monte Carlo sampling is proposed, to integrate out the shared subspace. We validate the proposed method on posed and spontaneous data from three publicly available datasets (CK+, DISFA and Shoulder-pain), and show that both feature fusion and joint learning of action units lead to improved performance compared to the state-of-the-art methods for the task.
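A rough sketch of "fuse two feature views through a shared low-dimensional subspace, then detect AUs discriminatively" is given below. It substitutes CCA for the paper's generative shared subspace and plain multi-label logistic regression for its multi-conditional model, with synthetic feature sets and AU labels.

```python
# Shared-subspace fusion (CCA stand-in) followed by multi-label AU detection.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
geom = rng.normal(size=(300, 20))                 # e.g. landmark-based features (assumed)
appear = rng.normal(size=(300, 50))               # e.g. appearance features (assumed)
aus = rng.integers(0, 2, size=(300, 5))           # binary labels for 5 action units

# Project both views into a shared low-dimensional subspace and concatenate.
cca = CCA(n_components=8).fit(geom, appear)
z = np.hstack(cca.transform(geom, appear))        # (300, 16) fused representation

clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(z, aus)
print(clf.predict(z[:3]))
```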
Neural Conditional Ordinal Random Fields for Agreement Level Estimation
We present a novel approach to automated estimation of agreement intensity levels from facial images. To this end, we employ the MAHNOB Mimicry database of subjects recorded during dyadic interactions, where the facial images are annotated in terms of agreement intensity levels using the Likert scale (strong disagreement, disagreement, neutral, agreement and strong agreement). Dynamic modelling of the agreement levels is accomplished by means of a Conditional Ordinal Random Field model. Specifically, we propose a novel Neural Conditional Ordinal Random Field model that performs non-linear feature extraction from face images using the notion of Neural Networks, while also modelling temporal and ordinal relationships between the agreement levels. We show in our experiments that the proposed approach outperforms existing methods for modelling sequential data. The preliminary results obtained on five subjects demonstrate that the intensity of agreement can successfully be estimated from facial images (39% F1 score) using the proposed method.
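To give a flavour of combining neural feature extraction with ordinal outputs, the sketch below implements a cumulative-logit (CORAL-style) output head on top of a small feed-forward extractor. The input dimensionality is an assumption, and the temporal (CRF) component of the paper is deliberately omitted.

```python
# Neural ordinal head: a shared latent score compared against ordered cutpoints.
import torch
import torch.nn as nn

NUM_LEVELS = 5   # strong disagreement .. strong agreement (Likert)

class NeuralOrdinalHead(nn.Module):
    def __init__(self, in_dim: int = 128, hidden: int = 32):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.score = nn.Linear(hidden, 1)                        # single latent score
        self.cutpoints = nn.Parameter(torch.arange(NUM_LEVELS - 1).float())

    def forward(self, x):
        s = self.score(self.features(x))                         # (batch, 1)
        # P(level > k) for each ordered threshold k.
        return torch.sigmoid(s - self.cutpoints)                 # (batch, NUM_LEVELS - 1)

probs = NeuralOrdinalHead()(torch.randn(4, 128))
print(probs.shape)   # torch.Size([4, 4])
```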