Towards Speech Emotion Recognition "in the wild" using Aggregated Corpora and Deep Multi-Task Learning

Englebienne, Gwenn; Evers, Vanessa; Kim, Jaebok; Truong, Khiet P.

Publication date: 1 January 2017

Abstract

One of the challenges in Speech Emotion Recognition (SER) "in the wild" is the large mismatch between training and test data (e.g. in speakers and tasks). To improve the generalisation capabilities of the emotion models, we propose to use Multi-Task Learning (MTL) in deep neural networks, with gender and naturalness as auxiliary tasks. This method was evaluated in within-corpus and various cross-corpus classification experiments that simulate conditions "in the wild". Compared with state-of-the-art methods based on Single-Task Learning (STL), our proposed MTL method significantly improved performance. In particular, models using both gender and naturalness achieved larger gains than those using either gender or naturalness alone. This benefit was also found in the high-level feature representations obtained from our proposed method, in which discriminative emotional clusters could be observed.

Comment: Published in the proceedings of INTERSPEECH, Stockholm, September, 201
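The abstract describes the MTL approach only at a high level. As a rough illustration, and not the authors' implementation, the sketch below shows hard parameter sharing: one shared hidden layer feeds three softmax heads (emotion as the main task, gender and naturalness as auxiliary tasks), and the training objective is a weighted sum of the per-task cross-entropy losses. The layer sizes, the four emotion classes, the ReLU activation, the uniform initialisation, the auxiliary weights of 0.3 and all names (`MultiTaskSER`, `mtl_loss`) are hypothetical choices made for illustration. The sketch computes only the forward pass and the loss; actual training would need gradients from an autodiff framework.

```python
import math
import random

# Minimal pure-Python sketch of hard-parameter-sharing MTL for SER.
# All sizes, weights, class counts and names are hypothetical illustrations.

def linear(weights, bias, v):
    """Affine map: weights is a list of rows (out_dim x in_dim)."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    return -math.log(max(probs[label], 1e-12))

def init_layer(out_dim, in_dim, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim

class MultiTaskSER:
    """One shared hidden layer feeding a separate softmax head per task."""

    def __init__(self, in_dim, hidden_dim, task_sizes, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(hidden_dim, in_dim, rng)
        self.heads = {task: init_layer(n, hidden_dim, rng)
                      for task, n in task_sizes.items()}

    def forward(self, x):
        # Shared representation used by every task head
        h = relu(linear(*self.shared, x))
        return {task: softmax(linear(w, b, h)) for task, (w, b) in self.heads.items()}

def mtl_loss(outputs, labels, task_weights):
    """Weighted sum of per-task cross-entropy losses."""
    return sum(task_weights[t] * cross_entropy(outputs[t], labels[t]) for t in labels)

# Usage with made-up dimensions and a made-up 4-dimensional feature vector
model = MultiTaskSER(in_dim=4, hidden_dim=8,
                     task_sizes={"emotion": 4, "gender": 2, "naturalness": 2})
outputs = model.forward([0.2, -0.5, 1.0, 0.3])
labels = {"emotion": 1, "gender": 0, "naturalness": 1}
mtl = mtl_loss(outputs, labels, {"emotion": 1.0, "gender": 0.3, "naturalness": 0.3})
# Zero auxiliary weights reduce the objective to a single-task (emotion-only) loss
stl = mtl_loss(outputs, labels, {"emotion": 1.0, "gender": 0.0, "naturalness": 0.0})
```

Setting both auxiliary weights to zero leaves only the emotion loss, i.e. a single-task objective; the auxiliary heads then act purely as extra supervision on the shared layer during training. How the auxiliary losses were actually weighted in the paper is not stated in this record.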

Available Versions

Crossref (info:doi/10.21437/interspeec...), last updated on 01/04/2019

NARCIS, last updated on 20/03/2018

University of Twente Research Information (oai:ris.utwente.nl:publication...), last updated on 12/07/2023