Emotion Recognition from Acted and Spontaneous Speech
This doctoral thesis deals with the recognition of speakers' emotional states from the speech signal. The thesis is divided into two main parts. The first part describes proposed approaches for emotion recognition using two different multilingual databases of acted emotional speech. The main contributions of this part are a detailed analysis of a large set of acoustic features extracted from the speech signal, new classification schemes for vocal emotion recognition such as "emotion coupling", and a new method for mapping discrete emotions into a two-dimensional space. The second part is devoted to emotion recognition using a database of spontaneous emotional speech, based on telephone recordings obtained from real call centers. The knowledge gained from the experiments with acted speech was exploited to design a new system for classifying seven spontaneous emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker's emotional state on gender recognition performance and proposes a system for automatic identification of successful phone calls in call centers by means of features of the dialogue between call participants.
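The abstract does not specify the thesis's mapping method, so the following is only a minimal sketch of the general idea of placing discrete emotion labels in a two-dimensional valence/arousal space; the coordinates and label set are illustrative assumptions, not values from the thesis.

```python
# Sketch: discrete emotion labels as points in a 2-D valence/arousal space.
# Coordinates below are illustrative assumptions, not the thesis's mapping.
from dataclasses import dataclass

@dataclass(frozen=True)
class EmotionPoint:
    valence: float  # negative .. positive, in [-1, 1]
    arousal: float  # calm .. excited, in [-1, 1]

EMOTION_SPACE = {
    "anger":   EmotionPoint(valence=-0.8, arousal=0.8),
    "joy":     EmotionPoint(valence=0.9,  arousal=0.6),
    "sadness": EmotionPoint(valence=-0.7, arousal=-0.5),
    "fear":    EmotionPoint(valence=-0.6, arousal=0.7),
    "neutral": EmotionPoint(valence=0.0,  arousal=0.0),
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Map a continuous (valence, arousal) estimate back to the
    nearest discrete emotion label (Euclidean distance)."""
    return min(
        EMOTION_SPACE,
        key=lambda e: (EMOTION_SPACE[e].valence - valence) ** 2
                    + (EMOTION_SPACE[e].arousal - arousal) ** 2,
    )
```

A mapping of this shape also makes the inverse direction trivial: a classifier that outputs continuous valence/arousal estimates can be snapped back to the closest discrete label.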
Ensemble of Hankel Matrices for Face Emotion Recognition
In this paper, a face emotion is considered the result of the composition of multiple concurrent signals, each corresponding to the movements of a specific facial muscle. These concurrent signals are represented by a set of multi-scale appearance features, each of which may be correlated with one or more of them. Extracting these appearance features from a sequence of face images yields a set of time series. This paper proposes to use the dynamics governing each appearance-feature time series to discriminate among different face emotions. To this purpose, an ensemble of Hankel matrices corresponding to the extracted time series is used for emotion classification within a framework that combines a nearest-neighbor classifier and a majority-vote scheme. Experimental results on a publicly available dataset show that the adopted representation is promising and yields state-of-the-art accuracy in emotion classification.
Comment: Paper to appear in Proc. of ICIAP 2015. arXiv admin note: text overlap with arXiv:1506.0500
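A minimal sketch of the representation described above: each feature time series is folded into a Hankel matrix, whose normalized Gram matrix summarizes the series' dynamics; classification is 1-NN per feature followed by a majority vote. The Gram-matrix dissimilarity used here is an assumption for illustration, not necessarily the paper's exact metric.

```python
import numpy as np

def hankel_matrix(series, rows):
    """Overlapping windows of a 1-D series as columns (constant anti-diagonals)."""
    s = np.asarray(series, dtype=float)
    return np.column_stack([s[i:i + rows] for i in range(len(s) - rows + 1)])

def gram_signature(series, rows=4):
    """Normalized Gram matrix H @ H.T: a fixed-size summary of the dynamics,
    comparable across series of different lengths."""
    h = hankel_matrix(series, rows)
    g = h @ h.T
    return g / (np.linalg.norm(g) + 1e-12)

def classify(query_series, train_series, train_labels, rows=4):
    """1-NN per appearance-feature time series, then a majority vote.
    query_series: list of 1-D series (one per appearance feature);
    train_series: list of such lists, one per training example."""
    votes = []
    for f, q in enumerate(query_series):
        gq = gram_signature(q, rows)
        dists = [np.linalg.norm(gq - gram_signature(t[f], rows))
                 for t in train_series]
        votes.append(train_labels[int(np.argmin(dists))])
    return max(set(votes), key=votes.count)
```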
Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema
In this paper, a psychologically inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions become distinguishable from one another. The extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter, and shimmer. Selected features are fed as input to a K-nearest-neighbor classifier and to support vector machines. Two kernels are tested for the latter: linear and Gaussian radial basis function. The recently proposed speaker-independent experimental protocol is tested on the Berlin emotional speech database for each gender separately. The best emotion recognition accuracy, achieved by support vector machines with a linear kernel, equals 87.7%, outperforming state-of-the-art approaches. Statistical analysis is carried out first with respect to the classifiers' error rates and then to evaluate the information expressed by the classifiers' confusion matrices. © Springer Science+Business Media, LLC 2011
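The abstract does not give the cascade's split order, so the sketch below only illustrates the general structure: each node makes one binary decision between two groups of emotions, so commonly confused pairs get a dedicated classifier. The particular grouping in the example is hypothetical.

```python
# Sketch of a binary cascade of SVMs. The split order and emotion groups
# below are a hypothetical illustration, not the paper's schema.
import numpy as np
from sklearn.svm import SVC

class CascadeNode:
    def __init__(self, left, right):
        self.left, self.right = left, right  # each is a subtree or a label
        self.clf = SVC(kernel="linear")

    def _labels(self, node):
        return [node] if isinstance(node, str) else node.emotions()

    def emotions(self):
        return self._labels(self.left) + self._labels(self.right)

    def fit(self, X, y):
        y = np.asarray(y)
        mask_l = np.isin(y, self._labels(self.left))
        self.clf.fit(X, mask_l.astype(int))  # 1 = send to left branch
        for child, mask in ((self.left, mask_l), (self.right, ~mask_l)):
            if isinstance(child, CascadeNode):
                child.fit(X[mask], y[mask])
        return self

    def predict_one(self, x):
        go_left = self.clf.predict(x[None, :])[0] == 1
        child = self.left if go_left else self.right
        return child if isinstance(child, str) else child.predict_one(x)

# e.g. separate one confusable pair per leaf node:
tree = CascadeNode(CascadeNode("anger", "joy"),
                   CascadeNode("sadness", "boredom"))
```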
Detecting User Engagement in Everyday Conversations
This paper presents a novel application of speech emotion recognition: estimation of the level of conversational engagement between users of a voice communication system. We begin by using machine learning techniques, such as the support vector machine (SVM), to classify users' emotions as expressed in individual utterances. However, this alone fails to model the temporal and interactive aspects of conversational engagement. We therefore propose the use of a multilevel structure based on coupled hidden Markov models (HMM) to estimate engagement levels in continuous natural speech. The first level consists of SVM-based classifiers that recognize emotional states, such as discrete emotion types or arousal/valence levels. A high-level HMM then uses these emotional states as input, estimating users' engagement in conversation by decoding the internal states of the HMM. We report experimental results obtained by applying our algorithms to the LDC Emotional Prosody and CallFriend speech corpora.
Comment: 4 pages (A4), 1 figure (EPS)
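A minimal sketch of the two-level idea: an utterance-level classifier emits a sequence of discrete emotion labels, and a small HMM over hidden engagement levels is decoded with the Viterbi algorithm. All probabilities below are illustrative assumptions, not values from the paper.

```python
import numpy as np

EMOTIONS = ["neutral", "interest", "boredom"]  # observation alphabet
ENGAGEMENT = ["low", "high"]                   # hidden states

start = np.array([0.6, 0.4])
trans = np.array([[0.8, 0.2],                  # P(state_t | state_{t-1})
                  [0.3, 0.7]])
emit = np.array([[0.5, 0.1, 0.4],              # P(emotion | engagement)
                 [0.3, 0.6, 0.1]])

def viterbi(obs_labels):
    """Decode the most likely engagement-level sequence from emotions."""
    obs = [EMOTIONS.index(o) for o in obs_labels]
    T, n = len(obs), len(ENGAGEMENT)
    logp = np.log(start) + np.log(emit[:, obs[0]])
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        scores = logp[:, None] + np.log(trans)  # prev-state x next-state
        back[t] = scores.argmax(axis=0)
        logp = scores.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(logp.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [ENGAGEMENT[s] for s in reversed(path)]

# In practice the labels would come from per-utterance SVM predictions.
print(viterbi(["neutral", "interest", "interest", "boredom"]))
```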
Exploring Language-Independent Emotional Acoustic Features via Feature Selection
We propose a novel feature selection strategy to discover language-independent acoustic features that tend to be responsible for emotions regardless of language, linguistic content, and other factors. Experimental results suggest that the discovered language-independent feature subset yields performance comparable to the full feature set on various emotional speech corpora.
Comment: 15 pages, 2 figures, 6 tables
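One plausible realization of this cross-corpus idea, sketched below under assumptions (the paper's exact strategy is not given in the abstract): score each acoustic feature independently on every corpus and keep only the features that rank highly on all of them.

```python
import numpy as np
from sklearn.feature_selection import f_classif

def language_independent_features(corpora, k=20):
    """corpora: list of (X, y) pairs, one per language/database,
    with a shared feature order across corpora. Returns the indices
    of features that are top-k discriminative on every corpus."""
    top_sets = []
    for X, y in corpora:
        scores, _ = f_classif(X, y)            # per-feature ANOVA F-score
        top_sets.append(set(np.argsort(scores)[::-1][:k]))
    return sorted(set.intersection(*top_sets))
```

The intersection acts as a stability filter: a feature that discriminates emotions only in one language is dropped, while features that survive every corpus are candidates for being language-independent.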
BigEAR: Inferring the Ambient and Emotional Correlates from Smartphone-based Acoustic Big Data
This paper presents a novel BigEAR big data framework that employs a psychological audio processing chain (PAPC) to process smartphone-based acoustic big data collected while the user holds social conversations in naturalistic scenarios. The overarching goal of BigEAR is to identify the moods of the wearer from vocal activities such as laughing, singing, crying, arguing, and sighing. These annotations are based on ground truth relevant to psychologists who intend to monitor or infer the social context of individuals coping with breast cancer. We pursued a case study on couples coping with breast cancer to learn how their conversations affect emotional and social well-being. With state-of-the-art methods, psychologists and their teams have to listen to the audio recordings and make these inferences through subjective evaluations, which are not only time-consuming and costly but also demand manual data coding for thousands of audio files. The BigEAR framework automates this audio analysis. We computed the accuracy of BigEAR with respect to the ground truth obtained from a human rater. Our approach yielded an overall average accuracy of 88.76% on real-world data from couples coping with breast cancer.
Comment: 6 pages, 10 equations, 1 table, 5 figures, IEEE International Workshop on Big Data Analytics for Smart and Connected Health 2016, June 27, 2016, Washington DC, US
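The PAPC's internal feature chain is not described in the abstract; the following is only a generic sketch of an audio-to-mood pipeline in the same spirit, with the feature set and classifier chosen as assumptions.

```python
# Generic sketch: summarize each clip as acoustic features, then classify
# into mood labels. Features and classifier are assumptions, not the PAPC.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

MOODS = ["laughing", "singing", "crying", "arguing", "sighing"]

def clip_features(path: str) -> np.ndarray:
    """Summarize one audio clip as a fixed-length feature vector."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral envelope
    zcr = librosa.feature.zero_crossing_rate(y)         # noisiness proxy
    rms = librosa.feature.rms(y=y)                      # loudness proxy
    frames = np.vstack([mfcc, zcr, rms])                # per-frame features
    return np.hstack([frames.mean(axis=1), frames.std(axis=1)])

# Training uses clips labeled by human raters (the ground truth), e.g.:
#   X = np.stack([clip_features(p) for p in paths]); clf.fit(X, labels)
clf = RandomForestClassifier(n_estimators=200)
```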