Spoken affect classification : algorithms and experimental implementation : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University, Palmerston North, New Zealand

Author: Morrison Donn Alexander
Publication venue: 'Massey University'
Publication date: 01/01/2005

Machine-based emotional intelligence is a requirement for natural interaction between humans and computer interfaces, and a basic level of accurate emotion perception is needed for computer systems to respond adequately to human emotion. Humans convey emotional information both intentionally and unintentionally via speech patterns. These vocal patterns are perceived and understood by listeners during conversation. This research aims to improve the automatic perception of vocal emotion in two ways. First, we compare two emotional speech data sources: natural, spontaneous emotional speech and acted or portrayed emotional speech. This comparison demonstrates the advantages and disadvantages of both acquisition methods and how these methods affect the end application of vocal emotion recognition. Second, we look at two classification methods which have gone unexplored in this field: stacked generalisation and unweighted vote. We show how these techniques can yield an improvement over traditional classification methods.
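
As a rough illustration of the unweighted vote named in the abstract above, the sketch below combines the labels predicted by several base classifiers by simple majority. The function name, the label strings, and the tie-breaking rule (the earliest tied label in classifier order wins) are assumptions made for illustration; they are not details taken from the thesis.

```python
from collections import Counter


def unweighted_vote(predictions):
    """Return the majority label among base-classifier predictions.

    predictions: list of labels, one per base classifier (hypothetical input).
    Every classifier counts equally. Ties go to the tied label that
    appears first in the list (an assumed rule).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk the predictions in classifier order so ties resolve deterministically.
    for label in predictions:
        if counts[label] == best:
            return label


if __name__ == "__main__":
    print(unweighted_vote(["angry", "happy", "angry"]))
```

Stacked generalisation differs in that a second-level learner is trained on the base classifiers' outputs instead of counting them equally.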

Massey Research Online

Emotions in context: examining pervasive affective sensing systems, applications, and analyses

Author: A Gaggioli
A Ståhl
Alan Chamberlain
C Küblbeck
C Peter
D Christin
D Matsumoto
E André
Eiman Kanjo
G Gartner
G Gay
IB Mauss
J Yamamoto
Luluah Al-Husain
MA Killingsworth
ME Morris
Mike Thelwall
MM Bradley
MN Burns
P Desmet
P Ekman
PC Ellsworth
PJ Lang
RA Calvo
RW Picard
S Matei
SD Pressman
T Bänziger
T-R Chang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Pervasive sensing has opened up new opportunities for measuring our feelings and understanding our behavior by monitoring our affective states while mobile. This review paper surveys pervasive affect sensing by examining three major elements of affective pervasive systems, namely “sensing”, “analysis”, and “application”. Sensing investigates the different sensing modalities that are used in existing real-time affective applications, Analysis explores different approaches to emotion recognition and visualization based on different types of collected data, and Application investigates the leading areas of affective applications. For each of the three aspects, the paper includes an extensive survey of the literature, and it finally outlines some of the challenges and future research opportunities of affective sensing in the context of pervasive computing.

Crossref

Springer - Publisher Connector

Nottingham Trent Institutional Repository (IRep)

Multimodal Speech Emotion Recognition Using Audio and Text

Author: Byun Seokhyun
Jung Kyomin
Yoon Seunghyun
Publication date: 10/10/2018

Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features in building well-performing classifiers. In this paper, we propose a novel deep dual recurrent encoder model that utilizes text data and audio signals simultaneously to obtain a better understanding of speech data. As emotional dialogue is composed of sound and spoken content, our model encodes the information from audio and text sequences using dual recurrent neural networks (RNNs) and then combines the information from these sources to predict the emotion class. This architecture analyzes speech data from the signal level to the language level, and it thus utilizes the information within the data more comprehensively than models that focus on audio features. Extensive experiments are conducted to investigate the efficacy and properties of the proposed model. Our proposed model outperforms previous state-of-the-art methods in assigning data to one of four emotion categories (i.e., angry, happy, sad and neutral) when the model is applied to the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.

Comment: 7 pages, accepted as a conference paper at IEEE SLT 201

arXiv.org e-Print Archive

Crossref

SNU Open Repository and Archive

Emotion Recognition from Acted and Spontaneous Speech

Author: Atassi Hicham
Publication venue: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Publication date: 01/01/2014

This doctoral thesis deals with emotion recognition from speech signals. The thesis is divided into two main parts. The first part describes the proposed approaches for emotion recognition using two different multilingual databases of acted emotional speech. The main contributions of this part are a detailed analysis of a large set of acoustic features, new classification schemes for vocal emotion recognition such as “emotion coupling”, and a new method for mapping discrete emotions into a two-dimensional space. The second part of the thesis is devoted to emotion recognition using multilingual databases of spontaneous emotional speech, which are based on telephone records obtained from real call centers. The knowledge gained from experiments with emotion recognition from acted speech was exploited to design a new approach for classifying seven emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker’s emotional state on gender recognition performance and proposes a system for the automatic identification of successful phone calls in call centers by means of dialogue features.

Digital library of Brno University of Technology

National Repository of Grey Literature

Speech Emotion Recognition Using Multi-hop Attention Mechanism

Author: Byun Seokhyun
Dey Subhadeep
Jung Kyomin
Yoon Seunghyun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/05/2019

In this paper, we are interested in exploiting textual and acoustic data of an utterance for the speech emotion classification task. The baseline approach models the information from audio and text independently using two deep neural networks (DNNs). The outputs from both DNNs are then fused for classification. As opposed to using knowledge from the two modalities separately, we propose a framework to exploit acoustic information in tandem with lexical data. The proposed framework uses two bidirectional long short-term memory (BLSTM) networks to obtain hidden representations of the utterance. Furthermore, we propose an attention mechanism, referred to as multi-hop attention, which is trained to automatically infer the correlation between the modalities. The multi-hop attention first computes the relevant segments of the textual data corresponding to the audio signal. The relevant textual data is then applied to attend to parts of the audio signal. To evaluate the performance of the proposed system, experiments are performed on the IEMOCAP dataset. Experimental results show that the proposed technique outperforms the state-of-the-art system by 6.5% relative improvement in terms of weighted accuracy.

Comment: 5 pages, accepted as a conference paper at ICASSP 2019 (oral presentation)
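
The two attention hops described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain dot-product attention, takes the last audio hidden state as the initial query, and concatenates the two context vectors at the end. The function names `attend` and `multi_hop_attention`, and the toy vectors in the usage example, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Dot-product attention: weight each state by its similarity to query."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, states))
        for d in range(dim)
    ]
    return context, weights


def multi_hop_attention(audio_states, text_states):
    """Two hops: audio queries the text, then the text context queries the audio.

    audio_states, text_states: lists of equal-width hidden vectors, standing in
    for BLSTM outputs (assumed shapes).
    """
    audio_query = audio_states[-1]  # assumed summary of the audio signal
    # Hop 1: find the text segments relevant to the audio signal.
    text_context, text_weights = attend(audio_query, text_states)
    # Hop 2: use the relevant text to attend to parts of the audio signal.
    audio_context, audio_weights = attend(text_context, audio_states)
    fused = text_context + audio_context  # list concatenation
    return fused, text_weights, audio_weights


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]
    text = [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]]
    fused, tw, aw = multi_hop_attention(audio, text)
    print(len(fused), tw, aw)
```

In a trained model the fused vector would feed a classifier over the emotion classes; here it is only returned.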

arXiv.org e-Print Archive

Crossref

Towards Indonesian Speech-Emotion Automatic Recognition (I-SpEAR)

Author: Soelistio Yustinus Eko
Wunarso Novita Belinda
Publication date: 01/01/2017

Even though speech-emotion recognition (SER) has received much attention as a research topic, there is still some dispute about which vocal features identify a given emotion. Emotional expression is also known to differ across cultural backgrounds, which makes it important to study SER specific to the culture to which a language belongs. Furthermore, only a few studies address SER in Indonesian, which is what this study attempts to explore. In this study, we extract simple features from 3420 voice recordings gathered from 38 participants. The features are compared by means of a linear mixed-effects model, which shows that people in emotional and non-emotional states can be differentiated by their speech duration. Using an SVM with speech duration as the input feature, we achieve 76.84% average accuracy in classifying emotional and non-emotional speech.

Comment: 4 pages, 3 tables, published in the 4th International Conference on New Media (Conmedia), 8-10 Nov. 2017 (http://conmedia.umn.ac.id/) [in print as of Sept. 17, 2017]

arXiv.org e-Print Archive

UMN Knowledge Center