Gaussian process domain experts for model adaptation in facial behavior analysis
We present a novel approach to supervised domain adaptation that is based upon the probabilistic framework of Gaussian processes (GPs). Specifically, we introduce domain-specific GPs as local experts for facial expression classification from face images. The adaptation of the classifier is facilitated in a probabilistic fashion by conditioning the target expert on multiple source experts. Furthermore, in contrast to existing adaptation approaches, we also learn a target expert solely from the available target data. A single, confident classifier is then obtained by combining the predictions from multiple experts based on their confidence. Learning of the model is efficient and requires no retraining/reweighting of the source classifiers. We evaluate the proposed approach on two publicly available datasets for multi-class (MultiPIE) and multi-label (DISFA) facial expression classification. To this end, we perform adaptation of two contextual factors: where (view) and who (subject). We show in our experiments that the proposed approach consistently outperforms both source and target classifiers, while using as few as 30 target examples. It also outperforms the state-of-the-art approaches for supervised domain adaptation.
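The confidence-weighted combination of experts can be illustrated with a short sketch. The code below is not the paper's conditioning scheme; it assumes synthetic source/target data and fuses two independently trained GP regressors by inverse predictive variance (a product-of-experts style rule), roughly in the spirit of combining predictions according to each expert's confidence.

```python
# Hedged sketch: confidence-weighted fusion of GP "experts" on synthetic data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_src, y_src = rng.normal(size=(200, 5)), rng.normal(size=200)   # source domain (synthetic)
X_tgt, y_tgt = rng.normal(size=(30, 5)),  rng.normal(size=30)    # only 30 target examples

experts = []
for X, y in [(X_src, y_src), (X_tgt, y_tgt)]:
    gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-2).fit(X, y)
    experts.append(gp)

def combine(x):
    # Product-of-experts style fusion: weight each expert by its confidence
    # (inverse predictive variance), so uncertain experts contribute less.
    mus, stds = zip(*[gp.predict(x, return_std=True) for gp in experts])
    w = np.array([1.0 / (s ** 2 + 1e-9) for s in stds])
    return (w * np.array(mus)).sum(axis=0) / w.sum(axis=0)

print(combine(rng.normal(size=(3, 5))))
```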
Context-sensitive dynamic ordinal regression for intensity estimation of facial action units
Modeling the intensity of facial action units from spontaneously displayed facial expressions is challenging, mainly because of high variability in subject-specific facial expressiveness, head movements, illumination changes, etc. These factors make the target problem highly context-sensitive. However, existing methods usually ignore this context-sensitivity of the target problem. We propose a novel Conditional Ordinal Random Field (CORF) model for context-sensitive modeling of facial action unit intensity, where the W5+ (who, when, what, where, why and how) definition of context is used. While the proposed model is general enough to handle all six context questions, in this paper we focus on the context questions who (the observed subject), how (the changes in facial expressions), and when (the timing of facial expressions and their intensity). The context questions who and how are modeled by means of the newly introduced context-dependent covariate effects, and the context question when is modeled in terms of temporal correlation between the ordinal outputs, i.e., intensity levels of action units. We also introduce a weighted softmax-margin learning of CRFs from data with a skewed distribution of the intensity levels, which is commonly encountered in spontaneous facial data. The proposed model is evaluated on intensity estimation of pain and facial action units using two recently published datasets (UNBC Shoulder Pain and DISFA) of spontaneously displayed facial expressions. Our experiments show that the proposed model performs significantly better on the target tasks compared to the state-of-the-art approaches. Furthermore, compared to traditional learning of CRFs, we show that the proposed weighted learning results in more robust parameter estimation from the imbalanced intensity data.
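One ingredient highlighted above, learning from a skewed distribution of intensity levels, can be sketched in isolation. The snippet below is not the CORF model; it simply shows inverse-frequency sample weighting for imbalanced intensity labels with a plain multinomial classifier on synthetic features.

```python
# Minimal sketch (not the paper's CORF): inverse-frequency weighting for
# skewed AU-intensity labels, one idea the abstract highlights.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                       # synthetic frame features
y = rng.choice([0, 1, 2, 3, 4, 5], size=500,
               p=[0.6, 0.2, 0.1, 0.05, 0.03, 0.02])  # skewed intensity levels

# Weight each frame by the inverse frequency of its intensity level,
# so rare high intensities are not swamped during training.
counts = np.maximum(np.bincount(y, minlength=6).astype(float), 1.0)
sample_weight = 1.0 / counts[y]

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y, sample_weight=sample_weight)
print(clf.predict(X[:5]))
```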
Copula Ordinal Regression for Joint Estimation of Facial Action Unit Intensity
Joint modeling of the intensity of facial action units (AUs) from face images is challenging due to the large number of AUs (30+) and their intensity levels (6). This is in part due to the lack of suitable models that can efficiently handle such a large number of outputs/classes simultaneously, but also due to the lack of labelled target data. For this reason, the majority of methods proposed so far resort to independent classifiers for AU intensity. This is suboptimal for at least two reasons: the facial appearance of some AUs changes depending on the intensity of other AUs, and some AUs co-occur more often than others. Encoding this is expected to improve the estimation of target AU intensities, especially in the case of noisy image features, head-pose variations and imbalanced training data. To this end, we introduce a novel modeling framework, Copula Ordinal Regression (COR), that leverages the power of copula functions and CRFs to disentangle the probabilistic modeling of AU dependencies from the marginal modeling of AU intensity. Consequently, the COR model achieves joint learning and inference of the intensities of multiple AUs while remaining computationally tractable. We show on two challenging datasets of naturalistic facial expressions that the proposed approach consistently outperforms (i) independent modeling of AU intensities, and (ii) the state-of-the-art approach for the target task.
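The core idea of separating AU dependencies (the copula) from the per-AU intensity marginals can be illustrated with a toy Gaussian copula. The sketch below uses an assumed correlation and assumed marginal distributions for two AUs; it is an illustration of the copula idea only, not the COR model itself.

```python
# Toy Gaussian copula coupling two AU intensity marginals: the correlation
# matrix models dependence, the per-AU CDFs model the marginals separately.
import numpy as np
from scipy.stats import multivariate_normal, norm

corr = np.array([[1.0, 0.7], [0.7, 1.0]])            # assumed AU co-occurrence strength
z = multivariate_normal(mean=[0, 0], cov=corr).rvs(size=5000, random_state=0)
u = norm.cdf(z)                                       # uniform marginals, dependence preserved

# Map each uniform margin to a 6-level intensity via its own (assumed) marginal CDF.
marginal_cdf = np.array([0.5, 0.7, 0.85, 0.93, 0.98, 1.0])
intensities = np.searchsorted(marginal_cdf, u)        # shape (5000, 2), levels 0..5

# Empirical joint distribution of the two AUs' intensity levels.
joint = np.zeros((6, 6))
for a, b in intensities:
    joint[a, b] += 1
print(joint / joint.sum())
```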
Dyadic Speech-based Affect Recognition using DAMI-P2C Parent-child Multimodal Interaction Dataset
Automatic speech-based affect recognition of individuals in dyadic conversation is a challenging task, in part because of its heavy reliance on manual pre-processing. Traditional approaches frequently require hand-crafted speech features and segmentation of speaker turns. In this work, we design end-to-end deep learning methods to recognize each person's affective expression in an audio stream with two speakers, automatically discovering features and time regions relevant to the target speaker's affect. We integrate a local attention mechanism into the end-to-end architecture and compare the performance of three attention implementations: one mean pooling and two weighted pooling methods. Our results show that the proposed weighted-pooling attention solutions are able to learn to focus on the regions containing the target speaker's affective information and successfully extract the individual's valence and arousal intensity. Here we introduce and use a "dyadic affect in multimodal interaction - parent to child" (DAMI-P2C) dataset collected in a study of 34 families, where a parent and a child (3-7 years old) engage in reading storybooks together. In contrast to existing public datasets for affect recognition, each instance for both speakers in the DAMI-P2C dataset is annotated for perceived affect by three labelers. To encourage more research on the challenging task of multi-speaker affect sensing, we make the annotated DAMI-P2C dataset publicly available, including acoustic features of the dyads' raw audios, affect annotations, and a diverse set of developmental, social, and demographic profiles of each dyad. Accepted by the 2020 International Conference on Multimodal Interaction (ICMI'20).
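A minimal sketch of the weighted-pooling attention idea appears below. It is not the authors' architecture; the feature dimensionality, sequence length, and the two-dimensional valence/arousal head are all assumptions chosen only to make the mechanism concrete.

```python
# Attention-weighted pooling over a sequence of frame-level acoustic features.
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)       # one scalar relevance score per frame

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feat_dim) frame-level acoustic features
        weights = torch.softmax(self.score(frames), dim=1)    # emphasise informative regions
        return (weights * frames).sum(dim=1)                  # (batch, feat_dim) utterance embedding

pooled = AttentionPooling()(torch.randn(2, 100, 64))
head = nn.Linear(64, 2)                           # predict valence and arousal intensity
print(head(pooled).shape)                         # torch.Size([2, 2])
```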
Dynamic Facial Landmarking Selection for Emotion Recognition using Gaussian Processes
Facial features are the basis of the emotion recognition process and are widely used in affective computing systems. This emotional process is produced by dynamic changes in physiological signals and in the visual responses related to facial expressions. An important factor in this process is the shape information of a facial expression, represented as dynamically changing facial landmarks. In this paper we present a framework for dynamic facial landmarking selection based on facial expression analysis using Gaussian processes. We perform facial feature tracking based on Active Appearance Models for facial landmark detection, and then use Gaussian process ranking over the dynamic emotional sequences with the aim of establishing which landmarks are most relevant for emotional multivariate time-series recognition. The experimental results show that Gaussian processes can effectively fit an emotional time-series and that the log-likelihood-based ranking identifies the landmarks (mouth and eyebrow regions) that best represent a given facial expression sequence. Finally, we use the best-ranked landmarks in emotion recognition tasks, obtaining accurate performance on both acted and spontaneous emotional datasets.
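The ranking step can be sketched as fitting a GP to each landmark's trajectory and ordering the landmarks by the fitted log marginal likelihood. The code below uses synthetic trajectories and a standard GP regressor, not the paper's AAM tracking pipeline; the landmark names are placeholders.

```python
# Hedged sketch: rank landmark trajectories by GP log marginal likelihood.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)[:, None]              # frame times of one expression sequence
landmarks = {                                    # assumed per-landmark y-coordinates over time
    "mouth_corner": np.sin(4 * np.pi * t).ravel() + 0.05 * rng.normal(size=100),
    "eyebrow":      np.cos(2 * np.pi * t).ravel() + 0.05 * rng.normal(size=100),
    "nose_tip":     0.3 * rng.normal(size=100),  # mostly noise: should rank low
}

scores = {}
for name, traj in landmarks.items():
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(t, traj)
    scores[name] = gp.log_marginal_likelihood_value_

for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.1f}")
```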
Joint Facial Action Unit Detection and Feature Fusion: A Multi-conditional Learning Approach
Automated analysis of facial expressions can benefit many domains, from marketing to clinical diagnosis of neurodevelopmental disorders. Facial expressions are typically encoded as a combination of facial muscle activations, i.e., action units. Depending on context, these action units co-occur in specific patterns, and rarely in isolation. Yet, most existing methods for automatic action unit detection fail to exploit dependencies among them, and the corresponding facial features. To address this, we propose a novel multi-conditional latent variable model for simultaneous fusion of facial features and joint action unit detection. Specifically, the proposed model performs feature fusion in a generative fashion via a low-dimensional shared subspace, while simultaneously performing action unit detection using a discriminative classification approach. We show that by combining the merits of both approaches, the proposed methodology outperforms existing purely discriminative/generative methods for the target task. To reduce the number of parameters, and avoid overfitting, a novel Bayesian learning approach based on Monte Carlo sampling is proposed, to integrate out the shared subspace. We validate the proposed method on posed and spontaneous data from three publicly available datasets (CK+, DISFA and Shoulder-pain), and show that both feature fusion and joint learning of action units lead to improved performance compared to the state-of-the-art methods for the task.
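A rough sketch of "fuse two feature views through a shared low-dimensional subspace, then detect AUs discriminatively" is given below. It substitutes CCA for the paper's generative shared subspace and plain multi-label logistic regression for its multi-conditional model, with synthetic feature sets and AU labels.

```python
# Shared-subspace fusion (CCA stand-in) followed by multi-label AU detection.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
geom = rng.normal(size=(300, 20))                 # e.g. landmark-based features (assumed)
appear = rng.normal(size=(300, 50))               # e.g. appearance features (assumed)
aus = rng.integers(0, 2, size=(300, 5))           # binary labels for 5 action units

# Project both views into a shared low-dimensional subspace and concatenate.
cca = CCA(n_components=8).fit(geom, appear)
z = np.hstack(cca.transform(geom, appear))        # (300, 16) fused representation

clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(z, aus)
print(clf.predict(z[:3]))
```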
Neural Conditional Ordinal Random Fields for Agreement Level Estimation
We present a novel approach to automated estimation of agreement intensity levels from facial images. To this end, we employ the MAHNOB Mimicry database of subjects recorded during dyadic interactions, where the facial images are annotated in terms of agreement intensity levels using the Likert scale (strong disagreement, disagreement, neutral, agreement and strong agreement). Dynamic modelling of the agreement levels is accomplished by means of a Conditional Ordinal Random Field model. Specifically, we propose a novel Neural Conditional Ordinal Random Field model that performs non-linear feature extraction from face images using the notion of Neural Networks, while also modelling temporal and ordinal relationships between the agreement levels. We show in our experiments that the proposed approach outperforms existing methods for modelling sequential data. The preliminary results obtained on five subjects demonstrate that the intensity of agreement can successfully be estimated from facial images (39% F1 score) using the proposed method.
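To give a flavour of combining neural feature extraction with ordinal outputs, the sketch below implements a cumulative-logit (CORAL-style) output head on top of a small feed-forward extractor. The input dimensionality is an assumption, and the temporal (CRF) component of the paper is deliberately omitted.

```python
# Neural ordinal head: a shared latent score compared against ordered cutpoints.
import torch
import torch.nn as nn

NUM_LEVELS = 5   # strong disagreement .. strong agreement (Likert)

class NeuralOrdinalHead(nn.Module):
    def __init__(self, in_dim: int = 128, hidden: int = 32):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.score = nn.Linear(hidden, 1)                        # single latent score
        self.cutpoints = nn.Parameter(torch.arange(NUM_LEVELS - 1).float())

    def forward(self, x):
        s = self.score(self.features(x))                         # (batch, 1)
        # P(level > k) for each ordered threshold k.
        return torch.sigmoid(s - self.cutpoints)                 # (batch, NUM_LEVELS - 1)

probs = NeuralOrdinalHead()(torch.randn(4, 128))
print(probs.shape)   # torch.Size([4, 4])
```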