4 research outputs found

Anchor model fusion for emotion recognition in speech

Authors: González-Rodríguez Joaquín, López Moreno Ignacio, Ortego Resa Carlos, Ramos Daniel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Proceedings of Joint COST 2101 and 2102 International Conference, BioID_MultiComm 2009, Madrid (Spain). The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-04391-8_7

In this work, a novel method for system fusion in emotion recognition for speech is presented. The proposed approach, namely Anchor Model Fusion (AMF), exploits the characteristic behaviour of the scores of a speech utterance among different emotion models by mapping them to a back-end anchor-model feature space, followed by an SVM classifier. Experiments are presented on three different databases: Ahumada III, with speech obtained from real forensic cases; SUSAS Actual; and SUSAS Simulated. Results comparing AMF with a simple sum-fusion scheme after normalization show a significant performance improvement of the proposed technique for two of the three experimental set-ups, without degrading performance in the third one.

This work has been financed under project TEC2006-13170-C02-01.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo
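
The two fusion schemes contrasted in the Anchor Model Fusion abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the example scores, and the choice of a z-score computed over each system's scores for a single utterance are all assumptions made for illustration. The abstract only says "normalization" and does not specify how the anchor-model vector is built beyond a mapping of scores against the emotion models. The SVM back-end is left out so that the sketch needs only the standard library.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Scale one system's scores to zero mean and unit variance (assumed normalization)."""
    mu = mean(scores)
    sigma = pstdev(scores) or 1.0  # constant scores: avoid division by zero
    return [(s - mu) / sigma for s in scores]


def sum_fusion(system_scores):
    """Baseline: normalize each system's per-emotion scores, then add them per emotion."""
    normalized = [z_normalize(scores) for scores in system_scores]
    return [sum(column) for column in zip(*normalized)]


def anchor_vector(system_scores):
    """Anchor-model style mapping: stack the utterance's scores against every
    emotion model of every system into one feature vector for a back-end classifier."""
    return [s for scores in system_scores for s in scores]


# Hypothetical scores of one utterance: two front-end systems, three emotion models each.
system_scores = [[2.0, 0.5, -1.0], [0.1, 0.9, 0.2]]

fused = sum_fusion(system_scores)        # one fused score per emotion
features = anchor_vector(system_scores)  # one vector per utterance
```

In the sum-fusion baseline the decision would be the emotion with the highest fused score. In the anchor-model variant, vectors such as `features` would be collected for many labelled utterances and passed to a classifier, an SVM according to the abstract.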

Detección de emociones en voz espontánea

Author: Ortego Resa Carlos
Publication date: 01/01/2009

Biblos-e Archivo

Speech Based Machine Learning Models for Emotional State Recognition and PTSD Detection

Author: Banerjee Debrup
Publication venue: ODU Digital Commons
Publication date: 01/07/2017

Recognition of emotional state and diagnosis of trauma-related illnesses such as posttraumatic stress disorder (PTSD) using speech signals have been active research topics over the past decade. A typical emotion recognition system consists of three components: speech segmentation, feature extraction and emotion identification. Various speech features have been developed for emotional state recognition, which can be divided into three categories, namely, excitation, vocal tract and prosodic. However, the capabilities of different feature categories and advanced machine learning techniques have not been fully explored for emotion recognition and PTSD diagnosis. For PTSD assessment, clinical diagnosis through structured interviews is a widely accepted means of diagnosis, but patients are often embarrassed to get diagnosed at clinics. The speech-signal-based system is a recently developed alternative. Unfortunately, PTSD speech corpora are limited in size, which presents difficulties in training complex diagnostic models. This dissertation proposed sparse coding methods and deep belief network models for emotional state identification and PTSD diagnosis. It also includes an additional transfer learning strategy for PTSD diagnosis. Deep belief networks are complex models that cannot work with small data like the PTSD speech database. Thus, a transfer learning strategy was adopted to mitigate the small data problem. Transfer learning aims to extract knowledge from one or more source tasks and apply the knowledge to a target task with the intention of improving the learning. It has proved to be useful when the target task has limited high-quality training data. We evaluated the proposed methods on the speech under simulated and actual stress database (SUSAS) for emotional state recognition and on two PTSD speech databases for PTSD diagnosis. Experimental results and statistical tests showed that the proposed models outperformed most state-of-the-art methods in the literature and are potentially efficient models for emotional state recognition and PTSD diagnosis.

Old Dominion University

Dynamic Estimation of Rater Reliability using Multi-Armed Bandits

Author: Tarasov Alexey
Publication venue: Dublin Institute of Technology
Publication date: 01/05/2014

One of the critical success factors for supervised machine learning is the quality of target values, or predictions, associated with training instances. Predictions can be discrete labels (such as a binary variable specifying whether a blog post is positive or negative) or continuous ratings (for instance, how boring a video is on a 10-point scale). In some areas, predictions are readily available, while in others, the effort of human workers has to be involved. For instance, in the task of emotion recognition from speech, a large corpus of speech recordings is usually available, and humans denote which emotions are present in which recordings.

Arrow@TUDublin