Research on automatic emotion recognition from speech has recently focused on the prediction of time-continuous dimensions (e.g., arousal and valence) of spontaneous and realistic expressions of emotion, as found in real-life interactions. However, the automatic prediction of such emotions poses several challenges, such as the subjectivity found in the definition of a gold standard from a pool of raters and the issue of data scarcity in training models. In this work, we introduce a novel emotion recognition system based on an ensemble of single-speaker-regression-models (SSRMs). The estimation of emotion is provided by combining a subset of the initial pool of SSRMs, selecting those that are most concordant with one another. The proposed approach allows the addition or removal of speakers from the ensemble without the need to rebuild the entire machine learning system. The simplicity of this aggregation strategy, coupled with the flexibility assured by the modular architecture and the promising results obtained on the RECOLA database, highlights the potential of the proposed method in real-life scenarios, and in particular in web-based applications.
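The aggregation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each single-speaker model has already produced a frame-level prediction for the same utterance, that agreement between two models is measured with Lin's concordance correlation coefficient (an assumption; the abstract only says "most concordant"), and that the selected predictions are combined by a plain average. The names `ccc`, `cooperative_estimate`, and the parameter `k` are hypothetical.

```python
import numpy as np


def ccc(x, y):
    """Lin's concordance correlation coefficient between two 1-D series.

    Assumption: this is the agreement measure; the abstract does not name one.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    covariance = ((x - mean_x) * (y - mean_y)).mean()
    denominator = x.var() + y.var() + (mean_x - mean_y) ** 2
    return 2.0 * covariance / denominator if denominator > 0 else 0.0


def cooperative_estimate(predictions, k):
    """Average the k predictions that agree most with the rest of the pool.

    predictions: array of shape (n_models, n_frames), one row per
                 single-speaker model (hypothetical input layout).
    k:           number of models to keep (hypothetical parameter).
    Returns the combined estimate and the sorted indices of the kept models.
    """
    predictions = np.asarray(predictions, dtype=float)
    n_models = predictions.shape[0]
    if n_models < 2 or not 1 <= k <= n_models:
        raise ValueError("need at least two models and 1 <= k <= n_models")

    # Pairwise agreement matrix; the diagonal stays at zero and is ignored.
    agreement = np.zeros((n_models, n_models))
    for i in range(n_models):
        for j in range(i + 1, n_models):
            agreement[i, j] = agreement[j, i] = ccc(predictions[i], predictions[j])

    # Mean agreement of each model with all the other models.
    mean_agreement = agreement.sum(axis=1) / (n_models - 1)

    # Keep the k models with the highest mean agreement and average them.
    selected = np.sort(np.argsort(mean_agreement)[-k:])
    return predictions[selected].mean(axis=0), selected


# Toy usage: three models that roughly agree and one that is inverted.
t = np.linspace(0.0, 1.0, 50)
base = np.sin(2 * np.pi * t)
pool = np.stack([base, base + 0.01, base - 0.01, -base])
estimate, kept = cooperative_estimate(pool, k=3)
print("kept models:", kept)
```

Because selection works only on the rows of the prediction array, adding or removing a speaker in this sketch amounts to adding or removing a row, which mirrors the modularity the abstract describes.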

Martinelli, E

Mencattini, A

Di Natale, C

Ringeval, F

Schuller, B

IEEE Transactions on Affective Computing

English

Continuous Estimation of Emotions in Speech by Dynamic Cooperative Speaker Models

OPUS Augsburg

Spiral - Imperial College Digital Repository

Arianna Mencattini

Eugenio Martinelli

Fabien Ringeval

Björn Schuller

Corrado Di Natale

Crossref

Online-Publikationserver Augsburg

Research on automatic emotion recognition from speech has recently focused on the prediction of time-continuous dimensions (e.g., arousal and valence) of spontaneous and realistic expressions of emotion, as found in real-life interactions. However, the automatic prediction of such emotions poses several challenges, such as the subjectivity found in the definition of a gold standard from a pool of raters and the issue of data scarcity in training models. In this work, we introduce a novel emotion recognition system, based on ensembles of single-speaker-regression-models. The estimation of emotion is provided by combining a subset of the initial pool of single-speaker-regression-models, selecting those that are most concordant among them. The proposed approach allows the addition or removal of speakers from the ensemble without the necessity to rebuild the entire recognition system. The simplicity of this aggregation strategy, coupled with the flexibility assured by the modular architecture, and the promising results observed on the RECOLA database highlight the potential implications of the proposed method in a real-life scenario and in particular in web-based applications.

Hal - Université Grenoble Alpes

HAL: Hyper Article en Ligne

https://opus.bibliothek.uni-augsburg.de/opus4/files/72029/72029.pdf
