MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning
Over the past few decades, multimodal emotion recognition has made remarkable
progress with the development of deep learning. However, existing technologies
still struggle to meet the demands of practical applications. To improve
robustness, we launch the Multimodal Emotion Recognition Challenge (MER 2023),
motivating researchers worldwide to build innovative technologies that further
accelerate research in this area. For this year's challenge, we present three
distinct sub-challenges: (1) MER-MULTI, in which participants recognize both
discrete and dimensional emotions; (2) MER-NOISE, in which noise is added to
test videos to evaluate modality robustness; and (3) MER-SEMI, which provides
a large number of unlabeled samples for semi-supervised learning. In this paper,
we test a variety of multimodal features and provide a competitive baseline for
each sub-challenge. Our system achieves an F1 score of 77.57% and a mean
squared error (MSE) of 0.82 on MER-MULTI, an F1 score of 69.82% and an MSE of
1.12 on MER-NOISE, and an F1 score of 86.75% on MER-SEMI.
code is available at https://github.com/zeroQiaoba/MER2023-Baseline
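As a rough illustration of how such scores could be computed (a minimal sketch, not the official MER 2023 evaluation code from the repository above; all labels and values below are made up):

```python
# Sketch: weighted F1 for discrete emotions plus MSE for dimensional
# (valence) predictions, in the spirit of the MER-MULTI metrics.
import numpy as np
from sklearn.metrics import f1_score, mean_squared_error

# Hypothetical ground truth and predictions for four test clips.
true_labels = np.array(["happy", "sad", "angry", "neutral"])
pred_labels = np.array(["happy", "sad", "neutral", "neutral"])
true_valence = np.array([0.8, -0.6, -0.9, 0.1])
pred_valence = np.array([0.7, -0.4, -0.5, 0.0])

f1 = f1_score(true_labels, pred_labels, average="weighted")
mse = mean_squared_error(true_valence, pred_valence)
print(f"F1 = {f1:.2%}, MSE = {mse:.2f}")
```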
Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema
In this paper, a psychologically-inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions are distinguished from one another. Extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter, and shimmer. Selected features are fed as input to a k-nearest neighbor classifier and to support vector machines. Two kernels are tested for the latter: linear and Gaussian radial basis function. The recently proposed speaker-independent experimental protocol is tested on the Berlin emotional speech database for each gender separately. The best emotion recognition accuracy, achieved by support vector machines with the linear kernel, equals 87.7%, outperforming state-of-the-art approaches. Statistical analysis is carried out first with respect to the classifiers' error rates and then to evaluate the information expressed by the classifiers' confusion matrices.
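A minimal sketch of this kind of pipeline, with librosa and scikit-learn standing in for the paper's actual feature extractors and classifiers (the contour statistics, synthetic "clips", and emotion labels below are all hypothetical):

```python
# Sketch: prosodic/cepstral feature statistics fed to k-NN and to SVMs with
# linear and RBF kernels. Synthetic sine tones stand in for speech clips.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(y, sr):
    f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)       # pitch contour
    energy = librosa.feature.rms(y=y)[0]                # energy contour
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # cepstral features
    stats = lambda x: [np.mean(x), np.std(x), np.min(x), np.max(x)]
    return np.hstack([stats(f0), stats(energy), mfcc.mean(axis=1)])

sr = 16000
t = np.arange(sr) / sr
clips = [np.sin(2 * np.pi * f * t) for f in (120, 180, 240, 300)]
X = np.vstack([extract_features(y, sr) for y in clips])
labels = ["anger", "joy", "anger", "joy"]  # hypothetical emotion labels

for clf in (KNeighborsClassifier(n_neighbors=1),
            SVC(kernel="linear"), SVC(kernel="rbf")):
    model = make_pipeline(StandardScaler(), clf).fit(X, labels)
    print(type(clf).__name__, model.predict(X))
```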
Multi-score Learning for Affect Recognition: the Case of Body Postures
An important challenge in building automatic affective state
recognition systems is establishing the ground truth. When the ground truth
is not available, observers are often used to label training and testing
sets. Unfortunately, inter-rater reliability between observers tends to
vary from fair to moderate when dealing with naturalistic expressions.
Nevertheless, the most common approach is to label each expression
with the most frequent label the observers assigned to it.
In this paper, we propose a general pattern recognition framework
that takes into account the variability between observers for automatic
affect recognition. This leads to what we term a multi-score learning
problem in which a single expression is associated with multiple values
representing the scores of each available emotion label. We also propose
several performance measurements and pattern recognition methods for
this framework, and report the experimental results obtained when testing
and comparing these methods on two affective posture datasets.
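To make the multi-score idea concrete, here is a minimal sketch (not the authors' method; the emotion labels, observer judgments, and features below are all hypothetical) in which each expression keeps one score per emotion label, namely the fraction of observers who chose it, and a regressor predicts the whole score vector instead of a single majority label:

```python
# Sketch: keep the per-label observer score distribution as the target and
# regress the full score vector, rather than collapsing to a majority label.
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

EMOTIONS = ["frustrated", "concentrating", "delighted"]

# Raw observer labels for three expressions (three observers each).
observer_labels = [
    ["frustrated", "frustrated", "concentrating"],
    ["delighted", "delighted", "delighted"],
    ["concentrating", "frustrated", "concentrating"],
]

def to_scores(all_labels):
    """Fraction of observers choosing each emotion -> multi-score target."""
    return np.array([[lab.count(e) / len(lab) for e in EMOTIONS]
                     for lab in all_labels])

Y = to_scores(observer_labels)                     # shape (n_samples, n_emotions)
X = np.random.default_rng(0).normal(size=(3, 8))   # stand-in posture features

model = MultiOutputRegressor(SVR()).fit(X, Y)
print(model.predict(X).round(2))                   # predicted score per label
```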
Current Challenges and Visions in Music Recommender Systems Research
Music recommender systems (MRS) have experienced a boom in recent years,
thanks to the emergence and success of online streaming services, which
nowadays make available almost all music in the world at the user's fingertips.
While today's MRS considerably help users to find interesting music in these
huge catalogs, MRS research still faces substantial challenges. In
particular, when it comes to building, incorporating, and evaluating
recommendation strategies that integrate information beyond simple user-item
interactions or content-based descriptors and instead dig deep into the very
essence of listener needs, preferences, and intentions, MRS research becomes a
major endeavor, and related publications remain quite sparse.
The purpose of this trends and survey article is twofold. We first identify
and shed light on what we believe are the most pressing challenges MRS research
is facing, from both academic and industry perspectives. We review the state of
the art towards solving these challenges and discuss its limitations. Second,
we detail possible future directions and visions we contemplate for the further
evolution of the field. The article should therefore serve two purposes: giving
the interested reader an overview of current challenges in MRS research and
providing guidance for young researchers by identifying interesting, yet
under-researched, directions in the field.
Multimodal Speech Emotion Recognition
This work focuses on the Emotion Recognition task, which falls within the class of Natural Language Processing problems. The goal of this work was to create machine learning models that recognize emotions from text and from audio. The work introduces the reader to the problem, possible emotion representations, available datasets, and existing solutions. It then describes our proposed solutions for the Text Emotion Recognition (TER), Speech Emotion Recognition (SER), and Multimodal Speech Emotion Recognition tasks. Further, we describe the experiments we conducted, present their results, and show our two practical demo applications. Two of our proposed models outperform a previous state-of-the-art solution from 2018. All experiments and models were programmed in the Python programming language.
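As a toy illustration of the multimodal setting (a minimal late-fusion sketch under our own assumptions, not the thesis's models; the probability vectors and the 0.5/0.5 weights are made up):

```python
# Sketch: late fusion of a text emotion model and a speech emotion model by
# averaging their class probabilities for the same utterance.
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]

text_probs = np.array([0.10, 0.60, 0.20, 0.10])   # hypothetical TER output
audio_probs = np.array([0.05, 0.45, 0.15, 0.35])  # hypothetical SER output

# Equal-weight fusion; the weights are an assumption, not the thesis's choice.
fused = 0.5 * text_probs + 0.5 * audio_probs
print(EMOTIONS[int(np.argmax(fused))])  # -> "happy"
```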
Leveraging Label Information for Multimodal Emotion Recognition
Multimodal emotion recognition (MER) aims to detect the emotional status of a
given expression by combining speech and text information. Intuitively,
label information should be capable of helping the model locate the salient
tokens/frames relevant to the specific emotion, which finally facilitates the
MER task. Inspired by this, we propose a novel approach for MER by leveraging
label information. Specifically, we first obtain the representative label
embeddings for both text and speech modalities, then learn the label-enhanced
text/speech representations for each utterance via label-token and label-frame
interactions. Finally, we devise a novel label-guided attentive fusion module
to fuse the label-aware text and speech representations for emotion
classification. Extensive experiments were conducted on the public IEMOCAP
dataset, and the results demonstrate that our proposed approach outperforms
existing baselines and achieves new state-of-the-art performance. (Accepted by
Interspeech 2023.)
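A hedged PyTorch sketch of this idea (not the authors' code; all dimensions, module names, and the mean-pooling fusion are assumptions): label embeddings attend over token and frame features to produce label-aware text and speech representations, which are then fused for classification.

```python
# Sketch: label embeddings query token/frame features via cross-attention
# (label-token and label-frame interactions), then the label-aware text and
# speech representations are concatenated for emotion classification.
import torch
import torch.nn as nn

class LabelGuidedFusion(nn.Module):
    def __init__(self, dim=256, n_labels=4, n_heads=4):
        super().__init__()
        self.label_emb = nn.Embedding(n_labels, dim)  # one embedding per emotion
        self.text_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.speech_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, n_labels)

    def forward(self, text_feats, speech_feats):
        # text_feats: (B, n_tokens, dim); speech_feats: (B, n_frames, dim)
        B = text_feats.size(0)
        labels = self.label_emb.weight.unsqueeze(0).expand(B, -1, -1)
        # Labels act as queries; tokens/frames act as keys and values.
        text_lab, _ = self.text_attn(labels, text_feats, text_feats)
        speech_lab, _ = self.speech_attn(labels, speech_feats, speech_feats)
        # Pool over labels and fuse the two modalities.
        fused = torch.cat([text_lab.mean(1), speech_lab.mean(1)], dim=-1)
        return self.classifier(fused)

model = LabelGuidedFusion()
logits = model(torch.randn(2, 20, 256), torch.randn(2, 50, 256))
print(logits.shape)  # torch.Size([2, 4])
```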