Transferable Positive/Negative Speech Emotion Recognition via Class-wise Adversarial Domain Adaptation
Speech emotion recognition plays an important role in building more
intelligent and human-like agents. Due to the difficulty of collecting speech
emotional data, an increasingly popular solution is leveraging a related and
rich source corpus to help address the target corpus. However, domain shift
between the corpora poses a serious challenge, making domain adaptation
difficult even for the recognition of positive/negative emotions. In
this work, we propose class-wise adversarial domain adaptation to address this
challenge by reducing the shift for all classes between different corpora.
Experiments on the well-known corpora EMODB and Aibo demonstrate that our
method is effective even when only a very limited number of target labeled
examples are provided. Comment: 5 pages, 3 figures, accepted to ICASSP 201
Speech-based recognition of self-reported and observed emotion in a dimensional space
The differences between self-reported and observed emotion have only marginally been investigated in the context of speech-based automatic emotion recognition. We address this issue by comparing self-reported emotion ratings to observed emotion ratings and examining how differences between these two types of ratings affect the development and performance of automatic emotion recognizers trained on them. A dimensional approach to emotion modeling is adopted: the ratings are based on continuous arousal and valence scales. We describe the TNO-Gaming Corpus, which contains spontaneous vocal and facial expressions elicited via a multiplayer videogame and includes emotion annotations obtained via self-report and via observation by outside observers. Comparisons show that there are discrepancies between self-reported and observed emotion ratings, which are also reflected in the performance of the emotion recognizers developed. Using Support Vector Regression in combination with acoustic and textual features, recognizers of arousal and valence are developed that can predict points in a 2-dimensional arousal-valence space. The results of these recognizers show that self-reported emotion is much harder to recognize than observed emotion, and that averaging ratings from multiple observers improves performance.
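The averaging of ratings from multiple observers mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `average_observer_ratings`, the example ratings, and the [-1, 1] rating scale are all assumptions made for illustration.

```python
from statistics import fmean


def average_observer_ratings(ratings):
    """Average per-observer (arousal, valence) ratings for one speech segment.

    `ratings` is a list of (arousal, valence) tuples, one per observer.
    The result is a single (arousal, valence) point that could serve as
    a regression target in a 2-dimensional arousal-valence space.
    """
    if not ratings:
        raise ValueError("at least one observer rating is required")
    arousal = fmean(r[0] for r in ratings)
    valence = fmean(r[1] for r in ratings)
    return arousal, valence


# Hypothetical ratings from three observers, assumed to lie in [-1, 1].
segment_ratings = [(0.5, -0.25), (0.75, 0.0), (0.25, -0.5)]
target = average_observer_ratings(segment_ratings)
print(target)
```

Averaging smooths out the disagreement between individual observers, which is one plausible reason the averaged labels are easier to predict than a single rater's labels.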
Emotion Recognition from Acted and Spontaneous Speech
This doctoral thesis deals with emotion recognition from speech signals. The thesis is divided into two main parts; the first part describes proposed approaches for emotion recognition using two different multilingual databases of acted emotional speech. The main contributions of this part are a detailed analysis of a large set of acoustic features, new classification schemes for vocal emotion recognition such as “emotion coupling”, and a new method for mapping discrete emotions into a two-dimensional space. The second part of the thesis is devoted to emotion recognition using multilingual databases of spontaneous emotional speech, which are based on telephone recordings obtained from real call centers. The knowledge gained from experiments with emotion recognition from acted speech was exploited to design a new approach for classifying seven emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker’s emotional state on gender recognition performance and proposes a system for the automatic identification of successful phone calls in call centers by means of dialogue features.
Towards Speech Emotion Recognition "in the wild" using Aggregated Corpora and Deep Multi-Task Learning
One of the challenges in Speech Emotion Recognition (SER) "in the wild" is
the large mismatch between training and test data (e.g. speakers and tasks). In
order to improve the generalisation capabilities of the emotion models, we
propose to use Multi-Task Learning (MTL) and use gender and naturalness as
auxiliary tasks in deep neural networks. This method was evaluated in
within-corpus and various cross-corpus classification experiments that simulate
conditions "in the wild". In comparison to Single-Task Learning (STL) based
state-of-the-art methods, we found that our proposed MTL method improved
performance significantly. Particularly, models using both gender and
naturalness achieved more gains than those using either gender or naturalness
separately. This benefit was also found in the high-level representations of
the feature space obtained from our proposed method, where discriminative
emotional clusters could be observed. Comment: Published in the proceedings of INTERSPEECH, Stockholm, September,
201
Exploring Language-Independent Emotional Acoustic Features via Feature Selection
We propose a novel feature selection strategy to discover
language-independent acoustic features that tend to be responsible for emotions
regardless of languages, linguistics and other factors. Experimental results
suggest that the discovered language-independent feature subset yields
performance comparable to the full feature set on various emotional speech
corpora. Comment: 15 pages, 2 figures, 6 tables
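The abstract does not describe the selection strategy itself, so the following is only a rough sketch of the general idea of a feature subset shared across corpora. It keeps the features that rank among the top k in every corpus. The function `shared_top_features`, the corpus names, the feature names, and the relevance scores are all hypothetical, and this is not the authors' method.

```python
def shared_top_features(scores_by_corpus, k):
    """Return the features that rank in the top k in every corpus.

    `scores_by_corpus` maps a corpus name to a dict of
    feature name -> relevance score (higher means more relevant).
    """
    shared = None
    for scores in scores_by_corpus.values():
        # Features with the k highest scores in this corpus.
        top_k = set(sorted(scores, key=scores.get, reverse=True)[:k])
        shared = top_k if shared is None else shared & top_k
    return sorted(shared or [])


# Hypothetical relevance scores for two corpora in different languages.
scores_by_corpus = {
    "corpus_a": {"pitch_mean": 0.9, "energy_std": 0.7, "jitter": 0.2, "mfcc_1": 0.6},
    "corpus_b": {"pitch_mean": 0.8, "energy_std": 0.3, "jitter": 0.1, "mfcc_1": 0.7},
}
selected = shared_top_features(scores_by_corpus, k=2)
print(selected)
```

Intersecting per-corpus rankings is deliberately simple; a real strategy would also need to check that the shared subset performs comparably to the full feature set, as the abstract reports.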