UR-FUNNY: A Multimodal Language Dataset for Understanding Humor
Humor is a unique and creative communicative behavior displayed during social
interactions. It is produced in a multimodal manner, through the use of words
(text), gestures (vision), and prosodic cues (acoustic). Understanding humor
from these three modalities falls within the boundaries of multimodal language,
a recent research trend in natural language processing that models natural
language as it occurs in face-to-face communication. Although humor detection
is an established research area in NLP, it remains understudied in a multimodal
context. This paper presents a diverse multimodal dataset, called UR-FUNNY,
to open the door to understanding the multimodal language used in expressing
humor. The dataset and accompanying studies present a framework for multimodal
humor detection for the natural language processing community. UR-FUNNY is
publicly available for research.
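As the abstract notes, each instance couples text, vision, and acoustics. Below is a minimal Python sketch of how one such multimodal instance might be represented; the class and field names are illustrative assumptions, not UR-FUNNY's actual schema.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class MultimodalInstance:
    """One multimodal humor instance. Field names are illustrative
    assumptions; they do not mirror UR-FUNNY's actual file format."""
    words: List[str]      # text modality: transcript tokens
    visual: np.ndarray    # vision modality: per-frame features, shape (T_v, D_v)
    acoustic: np.ndarray  # acoustic modality: prosodic features, shape (T_a, D_a)
    is_humorous: bool     # binary humor label


def pool(features: np.ndarray) -> np.ndarray:
    """Mean-pool a variable-length feature sequence into a fixed-size vector."""
    return features.mean(axis=0)
```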
Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results
Humour is a substantial element of human affect and cognition. Its automatic
understanding can facilitate a more naturalistic human-device interaction and
the humanisation of artificial intelligence. Current methods of humour
detection are based solely on staged data, making them inadequate for
'real-world' applications. We address this deficiency by introducing the novel
Passau Spontaneous Football Coach Humour (Passau-SFCH) dataset, comprising
about 11 hours of recordings. The Passau-SFCH dataset is annotated for the
presence of humour and its dimensions (sentiment and direction) as proposed in
Martin's Humor Style Questionnaire. We conduct a series of experiments
employing pretrained Transformers, convolutional neural networks, and
expert-designed features. The performance of each modality (text, audio, video)
for spontaneous humour recognition is analysed, and their complementarity is
investigated. Our findings suggest that for the automatic analysis of humour
and its sentiment, facial expressions are most promising, while humour
direction is best modelled via text-based features. The results reveal
considerable differences among subjects, highlighting the individuality of
humour usage and style. Further, we observe that decision-level fusion yields
the best recognition result. Finally, we make our code publicly available at
https://www.github.com/EIHW/passau-sfch. The Passau-SFCH dataset is available
upon request.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible. (Major Revision)
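The abstract reports that decision-level fusion works best. A minimal sketch of that idea follows, assuming each unimodal model outputs a humour probability per sample; the uniform weights and the 0.5 threshold are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def late_fusion(p_text: np.ndarray, p_audio: np.ndarray, p_video: np.ndarray,
                weights=(1 / 3, 1 / 3, 1 / 3), threshold: float = 0.5) -> np.ndarray:
    """Decision-level fusion: weighted average of per-modality humour
    probabilities, thresholded into binary predictions."""
    w_t, w_a, w_v = weights
    fused = w_t * p_text + w_a * p_audio + w_v * p_video
    return (fused >= threshold).astype(int)


# Two samples scored by three hypothetical unimodal classifiers.
print(late_fusion(np.array([0.9, 0.2]),
                  np.array([0.6, 0.4]),
                  np.array([0.7, 0.1])))  # -> [1 0]
```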
The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress
The Multimodal Sentiment Analysis Challenge (MuSe) 2022 is dedicated to
multimodal sentiment and emotion recognition. For this year's challenge, we
feature three datasets: (i) the Passau Spontaneous Football Coach Humor
(Passau-SFCH) dataset that contains audio-visual recordings of German football
coaches, labelled for the presence of humour; (ii) the Hume-Reaction dataset in
which reactions of individuals to emotional stimuli have been annotated with
respect to seven emotional expression intensities; and (iii) the Ulm-Trier
Social Stress Test (Ulm-TSST) dataset, comprising audio-visual data labelled
with continuous emotion values (arousal and valence) of people in stressful
dispositions. Using the introduced datasets, MuSe 2022 addresses three
contemporary affective computing problems: in the Humor Detection Sub-Challenge
(MuSe-Humor), spontaneous humour has to be recognised; in the Emotional
Reactions Sub-Challenge (MuSe-Reaction), seven fine-grained 'in-the-wild'
emotions have to be predicted; and in the Emotional Stress Sub-Challenge
(MuSe-Stress), a continuous prediction of stressed emotion values is featured.
The challenge is designed to attract different research communities,
encouraging a fusion of their disciplines. Mainly, MuSe 2022 targets the
communities of audio-visual emotion recognition, health informatics, and
symbolic sentiment analysis. This baseline paper describes the datasets as well
as the feature sets extracted from them. A recurrent neural network with LSTM
cells is used to set competitive baseline results on the test partitions for
each sub-challenge. We report an Area Under the Curve (AUC) of .8480 for
MuSe-Humor; a mean (across the seven classes) Pearson's Correlation Coefficient
of .2801 for MuSe-Reaction; and Concordance Correlation Coefficients (CCC) of
.4931 and .4761 for valence and arousal, respectively, in MuSe-Stress.
Comment: Preliminary baseline paper for the 3rd Multimodal Sentiment Analysis
Challenge (MuSe) 2022, a full-day workshop at ACM Multimedia 2022.
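The sub-challenges are scored with AUC, mean Pearson's correlation, and the Concordance Correlation Coefficient (CCC). A minimal NumPy sketch of the CCC, the least standard of the three, is given below; the function name is ours, but the formula is the standard definition.

```python
import numpy as np


def concordance_cc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Concordance Correlation Coefficient:
    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return float(2 * cov / (y_true.var() + y_pred.var() + (mu_t - mu_p) ** 2))


# Perfect agreement yields 1.0; a constant offset lowers the score.
t = np.linspace(-1, 1, 100)
print(concordance_cc(t, t))        # 1.0
print(concordance_cc(t, t + 0.5))  # < 1.0
```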
A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews
Despite recent advances in opinion mining for written reviews, few works have
tackled the problem for other types of reviews. In light of this issue,
we propose a multi-modal approach for mining fine-grained opinions from video
reviews that is able to determine the aspects of the item under review that are
being discussed and the sentiment orientation towards them. Our approach works
at the sentence level without the need for time annotations and uses features
derived from the audio, video and language transcriptions of its contents. We
evaluate our approach on two datasets and show that leveraging the video and
audio modalities consistently provides increased performance over text-only
baselines, providing evidence that these extra modalities are key to better
understanding video reviews.
Comment: Second Grand Challenge and Workshop on Multimodal Language, ACL 2020.
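To make the sentence-level fusion concrete, here is a minimal sketch of a feature-concatenation baseline; it illustrates the general technique rather than the authors' actual model, and the feature dimensions, synthetic data, and classifier choice are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for per-sentence unimodal feature matrices;
# all dimensions are illustrative assumptions, not the paper's.
rng = np.random.default_rng(0)
n = 200
X_text = rng.normal(size=(n, 300))   # pooled transcript embeddings
X_audio = rng.normal(size=(n, 40))   # pooled acoustic features
X_video = rng.normal(size=(n, 35))   # pooled visual features
y = rng.integers(0, 2, size=n)       # sentiment orientation per sentence

# Early fusion: concatenate modality features per sentence, then train
# a simple linear classifier on the fused representation.
X_fused = np.hstack([X_text, X_audio, X_video])
clf = LogisticRegression(max_iter=1000).fit(X_fused, y)
print(clf.score(X_fused, y))
```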