The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress
The Multimodal Sentiment Analysis Challenge (MuSe) 2022 is dedicated to
multimodal sentiment and emotion recognition. For this year's challenge, we
feature three datasets: (i) the Passau Spontaneous Football Coach Humor
(Passau-SFCH) dataset that contains audio-visual recordings of German football
coaches, labelled for the presence of humour; (ii) the Hume-Reaction dataset in
which reactions of individuals to emotional stimuli have been annotated with
respect to seven emotional expression intensities; and (iii) the Ulm-Trier
Social Stress Test (Ulm-TSST) dataset comprising audio-visual data labelled
with continuous emotion values (arousal and valence) of people in stressful
dispositions. Using the introduced datasets, MuSe 2022 addresses three
contemporary affective computing problems: in the Humor Detection Sub-Challenge
(MuSe-Humor), spontaneous humour has to be recognised; in the Emotional
Reactions Sub-Challenge (MuSe-Reaction), seven fine-grained `in-the-wild'
emotions have to be predicted; and in the Emotional Stress Sub-Challenge
(MuSe-Stress), a continuous prediction of stressed emotion values is featured.
The challenge is designed to attract different research communities,
encouraging a fusion of their disciplines. In particular, MuSe 2022 targets the
communities of audio-visual emotion recognition, health informatics, and
symbolic sentiment analysis. This baseline paper describes the datasets as well
as the feature sets extracted from them. A recurrent neural network with LSTM
cells is used to set competitive baseline results on the test partitions for
each sub-challenge. We report an Area Under the Curve (AUC) of .8480 for
MuSe-Humor; a mean Pearson's Correlation Coefficient of .2801 (averaged over
the 7 classes) for MuSe-Reaction; and a Concordance Correlation Coefficient
(CCC) of .4931 and .4761 for valence and arousal, respectively, in MuSe-Stress.
Comment: Preliminary baseline paper for the 3rd Multimodal Sentiment Analysis
Challenge (MuSe) 2022, a full-day workshop at ACM Multimedia 202
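The three evaluation metrics quoted in this abstract can be sketched in plain Python. This is a minimal illustration, not the official MuSe evaluation code: the function names are hypothetical, CCC follows Lin's standard definition with population (divide-by-n) moments, which is an assumption, and AUC is computed through the equivalent pairwise-ranking (Mann-Whitney) formulation.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ccc(x, y):
    """Lin's Concordance Correlation Coefficient (population moments assumed)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The (mx - my)^2 term penalises a constant bias that Pearson's r ignores.
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def mean_pearson(preds, golds):
    """Pearson's r per emotion class (column), averaged over classes."""
    n_classes = len(golds[0])
    return fmean(
        pearson([p[k] for p in preds], [g[k] for g in golds]) for k in range(n_classes)
    )


def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Unlike Pearson's r, CCC also penalises a constant offset: for labels [1, 2, 3] and predictions [2, 3, 4], r is 1 while CCC is 4/7 (about 0.571), which is why continuous valence/arousal tasks are often scored with CCC.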
Automatic Segmentation of Spontaneous Data using Dimensional Labels from Multiple Coders
This paper focuses on automatic segmentation of spontaneous data using continuous dimensional labels from multiple coders. It introduces efficient algorithms with the aim of (i) producing ground truth by maximizing inter-coder agreement, (ii) eliciting the frames or samples that capture the transition to and from an emotional state, and (iii) automatically segmenting spontaneous audio-visual data for use by machine learning techniques that cannot handle unsegmented sequences. As a proof of concept, the algorithms are tested on data annotated in arousal and valence space. However, they can be straightforwardly applied to data annotated in other continuous emotional spaces, such as power and expectation.
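The abstract does not spell out the algorithms, so the following is only a hypothetical sketch of the three aims under simple assumptions: agreement is measured as mean pairwise Pearson correlation between coders, coders are greedily dropped while that raises agreement, the ground truth is the frame-wise average of the retained traces, and a frame counts as "emotional" when its absolute value exceeds an arbitrary threshold of 0.2. Segment boundaries then mark the transitions to and from an emotional state. All function names and the threshold are illustrative.

```python
import math
from itertools import combinations
from statistics import fmean


def pearson(x, y):
    """Pearson's r; returns 0.0 for a constant trace (assumption)."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mean_pairwise_agreement(traces):
    """Assumed agreement measure: mean Pearson's r over all coder pairs."""
    return fmean(pearson(a, b) for a, b in combinations(traces, 2))


def select_coders(traces):
    """Greedily drop the coder whose removal most raises agreement (keeps >= 2)."""
    kept = list(range(len(traces)))
    while len(kept) > 2:
        current = mean_pairwise_agreement([traces[i] for i in kept])
        best_score, drop = max(
            (mean_pairwise_agreement([traces[j] for j in kept if j != i]), i)
            for i in kept
        )
        if best_score <= current:
            break
        kept.remove(drop)
    return kept


def ground_truth(traces, kept):
    """Frame-wise mean of the retained coders' traces."""
    return [fmean(traces[i][t] for i in kept) for t in range(len(traces[0]))]


def segment(signal, threshold=0.2):
    """Split into (start, end, state) runs; |value| > threshold is 'emotional' (assumed rule)."""
    states = ["emotional" if abs(v) > threshold else "neutral" for v in signal]
    segments, start = [], 0
    for t in range(1, len(signal) + 1):
        if t == len(signal) or states[t] != states[start]:
            segments.append((start, t - 1, states[start]))
            start = t
    return segments
```

For example, with two coders tracking a rise and fall in valence and a third coder producing the inverse trace, the third coder is dropped and the averaged trace splits into neutral, emotional, and neutral segments.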
Automatic Measurement of Affect in Dimensional and Continuous Spaces: Why, What, and How?
This paper aims to give a brief overview of the current state of the art in automatic measurement of affect signals in dimensional and continuous spaces (a continuous scale from -1 to +1) by seeking answers to the following questions: (i) why has the field shifted towards dimensional and continuous interpretations of affective displays recorded in real-world settings? (ii) what are the affect dimensions used, and the affect signals measured? and (iii) how has the current automatic measurement technology been developed, and how can we advance the field?
Multimodal Emotion Analysis with Focused Attention
Emotion analysis, a subset of sentiment analysis, involves the study of a wide array of emotional indicators. Whereas sentiment analysis restricts its focus to positive and negative sentiment, emotion analysis extends to a diverse spectrum of emotional cues. Contemporary approaches to emotion analysis lean toward multimodal methods that leverage audio-visual and text modalities. However, multimodal strategies bring their own challenges: greater model complexity and more parameters, and therefore a need for more training data. This thesis responds to this challenge by proposing a robust model for emotion recognition that specifically leverages audio and text data. Our approach uses audio spectrogram transformers (AST) and the BERT language model to extract distinctive features from the auditory and textual modalities, respectively, followed by feature fusion. Despite omitting the visual component employed by state-of-the-art (SOTA) methods, our model achieves comparable performance, with an F1 score of 0.67 on the IEMOCAP dataset [1]. IEMOCAP consists of 12 hours of audio recordings broken down into 5255 scripted and 4784 spontaneous turns, each labeled with an emotion such as anger, neutral, frustration, happy, or sad. In essence, we propose a fully attention-focused multimodal approach to emotion analysis on relatively small datasets that relies on lightweight data sources such as audio and text, highlighting the efficacy of the proposed model. For reproducibility, the code is available at 2AI Lab’s GitHub repository: https://github.com/2ai-lab/multimodal-emotion
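The abstract names the feature extractors (AST for audio, BERT for text) but not the fusion mechanism. As a hypothetical sketch only, the code below assumes each extractor yields a sequence of token embeddings, pools each sequence with scaled dot-product attention against a query vector, concatenates the pooled vectors, and applies a linear softmax classifier over the IEMOCAP labels listed above. The query vectors and classifier weights are placeholders; in a trained model they would be learned, and the thesis may fuse features differently.

```python
import math

# Label set taken from the examples in the abstract; a real setup may differ.
EMOTIONS = ["anger", "neutral", "frustration", "happy", "sad"]


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tokens, query):
    """Scaled dot-product attention of one query over a token sequence -> one vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]


def fuse(audio_tokens, text_tokens, audio_query, text_query):
    """Pool each modality separately, then concatenate (assumed feature-level fusion)."""
    return attention_pool(audio_tokens, audio_query) + attention_pool(text_tokens, text_query)


def classify(fused, weights, biases):
    """Linear layer plus softmax; returns the arg-max label and class probabilities."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, biases)]
    probs = softmax(logits)
    return EMOTIONS[probs.index(max(probs))], probs
```

With a zero query the attention weights are uniform, so pooling reduces to mean pooling; a learned query instead lets the model emphasise the frames or word pieces most indicative of emotion.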
Robust Modeling of Epistemic Mental States
This work identifies and advances some research challenges in the analysis of
facial features and their temporal dynamics in relation to epistemic mental
states in dyadic conversations. The epistemic states considered are: Agreement, Concentration,
Thoughtful, Certain, and Interest. In this paper, we perform a number of
statistical analyses and simulations to identify the relationship between
facial features and epistemic states. Non-linear relations are found to be more
prevalent, while temporal features derived from original facial features have
demonstrated a strong correlation with intensity changes. Then, we propose a
novel prediction framework that takes facial features and their nonlinear
relation scores as input and predicts different epistemic states in videos. The
prediction of epistemic states is boosted when the classification of emotion-changing
regions (rising, falling, or steady-state) is incorporated with
the temporal features. The proposed predictive models can predict the epistemic
states with significantly improved accuracy: correlation coefficient (CoERR)
for Agreement is 0.827, for Concentration 0.901, for Thoughtful 0.794, for
Certain 0.854, and for Interest 0.913.
Comment: Accepted for Publication in Multimedia Tools and Applications, Special
Issue: Socio-Affective Technologies
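The rising/falling/steady labelling described in this abstract could be approximated as follows. This is a hypothetical sketch, not the authors' framework: it takes first-order differences of a per-frame facial feature as the temporal feature, labels each step using an assumed dead-band threshold `eps`, and stacks value, difference, and region code into a per-frame feature vector that a downstream regressor could consume. The names `deltas`, `change_regions`, `frame_features`, and `REGION_CODES` are illustrative.

```python
# Hypothetical numeric encoding of the three change regions.
REGION_CODES = {"rising": 1, "steady": 0, "falling": -1}


def deltas(values):
    """First-order temporal differences of a per-frame feature (e.g. a facial intensity)."""
    return [b - a for a, b in zip(values, values[1:])]


def change_regions(values, eps=0.05):
    """Label each frame-to-frame step as rising, falling, or steady (|delta| <= eps)."""
    labels = []
    for d in deltas(values):
        if d > eps:
            labels.append("rising")
        elif d < -eps:
            labels.append("falling")
        else:
            labels.append("steady")
    return labels


def frame_features(values, eps=0.05):
    """Per-frame [value, delta, region code]; the first frame has no predecessor and is skipped."""
    regions = change_regions(values, eps)
    return [
        [v, d, REGION_CODES[r]]
        for v, d, r in zip(values[1:], deltas(values), regions)
    ]
```

The dead band keeps small annotation jitter from being labelled as a change; its width would need tuning to the scale of the facial features actually used.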