MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning
Over the past few decades, multimodal emotion recognition has made remarkable
progress with the development of deep learning. However, existing technologies
still struggle to meet the demands of practical applications. To improve
robustness, we launch the Multimodal Emotion Recognition Challenge (MER 2023),
motivating researchers worldwide to build innovative technologies that further
accelerate research in this area. For this year's challenge, we present three
distinct sub-challenges: (1) MER-MULTI, in which participants recognize both
discrete and dimensional emotions; (2) MER-NOISE, in which noise is added to
test videos to evaluate modality robustness; and (3) MER-SEMI, which provides
a large number of unlabeled samples for semi-supervised learning. In this paper,
we test a variety of multimodal features and provide a competitive baseline for
each sub-challenge. Our system achieves an F1 score of 77.57% and a mean
squared error (MSE) of 0.82 on MER-MULTI, an F1 score of 69.82% and an MSE of
1.12 on MER-NOISE, and an F1 score of 86.75% on MER-SEMI.
code is available at https://github.com/zeroQiaoba/MER2023-Baseline
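As a rough illustration of how such scores could be computed (a minimal sketch, not the official MER 2023 evaluation code from the repository above; all labels and values below are made up):

```python
# Sketch: weighted F1 for discrete emotions plus MSE for dimensional
# (valence) predictions, in the spirit of the MER-MULTI metrics.
import numpy as np
from sklearn.metrics import f1_score, mean_squared_error

# Hypothetical ground truth and predictions for four test clips.
true_labels = np.array(["happy", "sad", "angry", "neutral"])
pred_labels = np.array(["happy", "sad", "neutral", "neutral"])
true_valence = np.array([0.8, -0.6, -0.9, 0.1])
pred_valence = np.array([0.7, -0.4, -0.5, 0.0])

f1 = f1_score(true_labels, pred_labels, average="weighted")
mse = mean_squared_error(true_valence, pred_valence)
print(f"F1 = {f1:.2%}, MSE = {mse:.2f}")
```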
Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema
In this paper, a psychologically-inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions are distinguished from one another. Extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter, and shimmer. Selected features are fed as input to a k-nearest neighbor classifier and to support vector machines. Two kernels are tested for the latter: linear and Gaussian radial basis function. The recently proposed speaker-independent experimental protocol is tested on the Berlin emotional speech database for each gender separately. The best emotion recognition accuracy, achieved by support vector machines with the linear kernel, equals 87.7%, outperforming state-of-the-art approaches. Statistical analysis is carried out first with respect to the classifiers' error rates and then to evaluate the information expressed by the classifiers' confusion matrices.
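A minimal sketch of this kind of pipeline, with librosa and scikit-learn standing in for the paper's actual feature extractors and classifiers (the contour statistics, synthetic "clips", and emotion labels below are all hypothetical):

```python
# Sketch: prosodic/cepstral feature statistics fed to k-NN and to SVMs with
# linear and RBF kernels. Synthetic sine tones stand in for speech clips.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(y, sr):
    f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)       # pitch contour
    energy = librosa.feature.rms(y=y)[0]                # energy contour
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # cepstral features
    stats = lambda x: [np.mean(x), np.std(x), np.min(x), np.max(x)]
    return np.hstack([stats(f0), stats(energy), mfcc.mean(axis=1)])

sr = 16000
t = np.arange(sr) / sr
clips = [np.sin(2 * np.pi * f * t) for f in (120, 180, 240, 300)]
X = np.vstack([extract_features(y, sr) for y in clips])
labels = ["anger", "joy", "anger", "joy"]  # hypothetical emotion labels

for clf in (KNeighborsClassifier(n_neighbors=1),
            SVC(kernel="linear"), SVC(kernel="rbf")):
    model = make_pipeline(StandardScaler(), clf).fit(X, labels)
    print(type(clf).__name__, model.predict(X))
```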
Multi-score Learning for Affect Recognition: the Case of Body Postures
An important challenge in building automatic affective state
recognition systems is establishing the ground truth. When the ground truth
is not available, observers are often used to label training and testing
sets. Unfortunately, inter-rater reliability between observers tends to
vary from fair to moderate when dealing with naturalistic expressions.
Nevertheless, the most common approach is to label each expression
with the most frequent label the observers assigned to it.
In this paper, we propose a general pattern recognition framework
that takes into account the variability between observers for automatic
affect recognition. This leads to what we term a multi-score learning
problem in which a single expression is associated with multiple values
representing the scores of each available emotion label. We also propose
several performance measurements and pattern recognition methods for
this framework, and report the experimental results obtained when testing
and comparing these methods on two affective posture datasets.
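To make the multi-score idea concrete, here is a minimal sketch (not the authors' method; the emotion labels, observer judgments, and features below are all hypothetical) in which each expression keeps one score per emotion label, namely the fraction of observers who chose it, and a regressor predicts the whole score vector instead of a single majority label:

```python
# Sketch: keep the per-label observer score distribution as the target and
# regress the full score vector, rather than collapsing to a majority label.
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

EMOTIONS = ["frustrated", "concentrating", "delighted"]

# Raw observer labels for three expressions (three observers each).
observer_labels = [
    ["frustrated", "frustrated", "concentrating"],
    ["delighted", "delighted", "delighted"],
    ["concentrating", "frustrated", "concentrating"],
]

def to_scores(all_labels):
    """Fraction of observers choosing each emotion -> multi-score target."""
    return np.array([[lab.count(e) / len(lab) for e in EMOTIONS]
                     for lab in all_labels])

Y = to_scores(observer_labels)                     # shape (n_samples, n_emotions)
X = np.random.default_rng(0).normal(size=(3, 8))   # stand-in posture features

model = MultiOutputRegressor(SVR()).fit(X, Y)
print(model.predict(X).round(2))                   # predicted score per label
```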
Current Challenges and Visions in Music Recommender Systems Research
Music recommender systems (MRS) have experienced a boom in recent years,
thanks to the emergence and success of online streaming services, which
nowadays make available almost all music in the world at the user's fingertips.
While today's MRS considerably help users to find interesting music in these
huge catalogs, MRS research still faces substantial challenges. In
particular, when it comes to building, incorporating, and evaluating
recommendation strategies that integrate information beyond simple user-item
interactions or content-based descriptors and instead dig deep into the very
essence of listener needs, preferences, and intentions, MRS research becomes a
major endeavor, and related publications remain quite sparse.
The purpose of this trends and survey article is twofold. We first identify
and shed light on what we believe are the most pressing challenges MRS research
is facing, from both academic and industry perspectives. We review the state of
the art towards solving these challenges and discuss its limitations. Second,
we detail possible future directions and visions we contemplate for the further
evolution of the field. The article should therefore serve two purposes: giving
the interested reader an overview of current challenges in MRS research and
providing guidance for young researchers by identifying interesting, yet
under-researched, directions in the field.
Multimodal Speech Emotion Recognition
This work focuses on the Emotion Recognition task, which falls within the class of Natural Language Processing problems. The goal of this work was to create machine learning models that recognize emotions from text and from audio. The work introduces the reader to the problem, possible emotion representations, available datasets, and existing solutions. It then describes our proposed solutions for the Text Emotion Recognition (TER), Speech Emotion Recognition (SER), and Multimodal Speech Emotion Recognition tasks. Further, we describe the experiments we conducted, present their results, and show our two practical demo applications. Two of our proposed models outperform a previous state-of-the-art solution from 2018. All experiments and models were programmed in the Python programming language.
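As a toy illustration of the multimodal setting (a minimal late-fusion sketch under our own assumptions, not the thesis's models; the probability vectors and the 0.5/0.5 weights are made up):

```python
# Sketch: late fusion of a text emotion model and a speech emotion model by
# averaging their class probabilities for the same utterance.
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]

text_probs = np.array([0.10, 0.60, 0.20, 0.10])   # hypothetical TER output
audio_probs = np.array([0.05, 0.45, 0.15, 0.35])  # hypothetical SER output

# Equal-weight fusion; the weights are an assumption, not the thesis's choice.
fused = 0.5 * text_probs + 0.5 * audio_probs
print(EMOTIONS[int(np.argmax(fused))])  # -> "happy"
```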
Leveraging Label Information for Multimodal Emotion Recognition
Multimodal emotion recognition (MER) aims to detect the emotional status of a
given expression by combining speech and text information. Intuitively,
label information should be capable of helping the model locate the salient
tokens/frames relevant to the specific emotion, which finally facilitates the
MER task. Inspired by this, we propose a novel approach for MER by leveraging
label information. Specifically, we first obtain the representative label
embeddings for both text and speech modalities, then learn the label-enhanced
text/speech representations for each utterance via label-token and label-frame
interactions. Finally, we devise a novel label-guided attentive fusion module
to fuse the label-aware text and speech representations for emotion
classification. Extensive experiments were conducted on the public IEMOCAP
dataset, and the results demonstrate that our proposed approach outperforms
existing baselines and achieves new state-of-the-art performance. (Accepted by
Interspeech 2023.)
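A hedged PyTorch sketch of this idea (not the authors' code; all dimensions, module names, and the mean-pooling fusion are assumptions): label embeddings attend over token and frame features to produce label-aware text and speech representations, which are then fused for classification.

```python
# Sketch: label embeddings query token/frame features via cross-attention
# (label-token and label-frame interactions), then the label-aware text and
# speech representations are concatenated for emotion classification.
import torch
import torch.nn as nn

class LabelGuidedFusion(nn.Module):
    def __init__(self, dim=256, n_labels=4, n_heads=4):
        super().__init__()
        self.label_emb = nn.Embedding(n_labels, dim)  # one embedding per emotion
        self.text_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.speech_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, n_labels)

    def forward(self, text_feats, speech_feats):
        # text_feats: (B, n_tokens, dim); speech_feats: (B, n_frames, dim)
        B = text_feats.size(0)
        labels = self.label_emb.weight.unsqueeze(0).expand(B, -1, -1)
        # Labels act as queries; tokens/frames act as keys and values.
        text_lab, _ = self.text_attn(labels, text_feats, text_feats)
        speech_lab, _ = self.speech_attn(labels, speech_feats, speech_feats)
        # Pool over labels and fuse the two modalities.
        fused = torch.cat([text_lab.mean(1), speech_lab.mean(1)], dim=-1)
        return self.classifier(fused)

model = LabelGuidedFusion()
logits = model(torch.randn(2, 20, 256), torch.randn(2, 50, 256))
print(logits.shape)  # torch.Size([2, 4])
```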