
    Emotion Recognition from Speech using GMM and VQ

    In this paper, we study the effectiveness of anchor models applied to the multiclass problem of emotion recognition from speech. In the anchor-models system, an emotion class is characterized by its measure of similarity relative to other emotion classes. Generative models such as Gaussian Mixture Models (GMMs) are typically used as front-end systems to generate feature vectors used to train complex back-end systems, such as support vector machines (SVMs) or a multilayer perceptron (MLP), to improve the classification performance. We show that, in the context of highly unbalanced data classes, these back-end systems can improve the performance achieved by GMMs provided that an appropriate sampling or importance-weighting technique is applied. Experiments conducted on speech audio samples show that anchor models improve the performance of GMMs considerably, by 6.2% relative. We employ a hybrid approach for recognizing emotion from speech that combines Vector Quantization (VQ) and Gaussian Mixture Models (GMM). A brief review of work done in the area of recognition using the VQ-GMM hybrid approach is presented here. DOI: 10.17762/ijritcc2321-8169.15082
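    The anchor-model idea described above can be sketched in a few lines: train one generative model per emotion, score an utterance against every model, and use the resulting vector of similarities as the feature for a back-end classifier. The sketch below is a minimal numpy illustration under simplifying assumptions, not the paper's implementation: each per-emotion GMM is reduced to a single diagonal Gaussian, and the back-end classifier is omitted.

```python
import numpy as np

def fit_gaussian(frames):
    # Fit a single diagonal Gaussian (a 1-component "GMM") to the
    # feature frames of one emotion's training utterances.
    mu = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6   # floor to avoid division by zero
    return mu, var

def avg_log_likelihood(frames, mu, var):
    # Mean per-frame log-likelihood of an utterance under one model.
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mu) ** 2 / var)
    return ll.sum(axis=1).mean()

def anchor_vector(frames, models):
    # Anchor-model representation: score the utterance against every
    # emotion model; the similarity vector is the back-end feature.
    return np.array([avg_log_likelihood(frames, mu, var)
                     for mu, var in models])
```

    In practice, anchor vectors from many labelled utterances would train an SVM or MLP back-end, with resampling or importance weighting applied to counter the class imbalance the abstract mentions.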

    Anchor model fusion for emotion recognition in speech

    Proceedings of the Joint COST 2101 and 2102 International Conference, BioID_MultiComm 2009, Madrid (Spain). The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-04391-8_7. In this work, a novel method for system fusion in emotion recognition in speech is presented. The proposed approach, namely Anchor Model Fusion (AMF), exploits the characteristic behaviour of the scores of a speech utterance across different emotion models, by a mapping to a back-end anchor-model feature space followed by an SVM classifier. Experiments are presented on three different databases: Ahumada III, with speech obtained from real forensic cases; SUSAS Actual; and SUSAS Simulated. Results comparing AMF with a simple sum-fusion scheme after normalization show a significant performance improvement of the proposed technique for two of the three experimental set-ups, without degrading performance in the third one. This work has been financed under project TEC2006-13170-C02-01.
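    As a rough illustration of the two fusion schemes being compared, the sketch below (plain numpy, with a hypothetical layout of one row of per-emotion-model scores per subsystem) contrasts a normalised sum-fusion baseline with building an AMF-style anchor-space feature vector for a back-end SVM; the SVM itself is omitted.

```python
import numpy as np

def znorm(scores):
    # Zero-mean, unit-variance normalisation of one subsystem's
    # per-model score vector.
    return (scores - scores.mean()) / (scores.std() + 1e-9)

def sum_fusion(score_matrix):
    # Baseline: normalise each subsystem's scores, then sum them,
    # giving one fused score per emotion model.
    return np.sum([znorm(s) for s in score_matrix], axis=0)

def amf_features(score_matrix):
    # AMF-style back-end features: the concatenated (normalised)
    # per-model scores of an utterance form a single point in the
    # anchor-model space, to be classified by an SVM.
    return np.concatenate([znorm(s) for s in score_matrix])
```

    The design difference is that sum fusion collapses the subsystems into one score per class, while the anchor-space mapping keeps every subsystem's score profile as coordinates for a learned classifier.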

    Learnable PINs: Cross-Modal Embeddings for Person Identity

    We propose and investigate an identity-sensitive joint embedding of face and voice. Such an embedding enables cross-modal retrieval from voice to face and from face to voice. We make the following four contributions: first, we show that the embedding can be learnt from videos of talking faces, without requiring any identity labels, using a form of cross-modal self-supervision; second, we develop a curriculum learning schedule for hard negative mining targeted to this task, which is essential for learning to proceed successfully; third, we demonstrate and evaluate cross-modal retrieval for identities unseen and unheard during training over a number of scenarios and establish a benchmark for this novel task; finally, we show an application of using the joint embedding for automatically retrieving and labelling characters in TV dramas. Comment: To appear in ECCV 201
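    The second contribution, hard-negative mining for a cross-modal contrastive objective, can be sketched as follows. This is a minimal numpy illustration under assumed conventions (cosine similarity and a hypothetical hinge margin of 0.6), not the paper's training code.

```python
import numpy as np

def l2_normalize(x, eps=1e-9):
    # Project embeddings onto the unit sphere so dot products are cosines.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def contrastive_pair_loss(face, voice, neg_voice, margin=0.6):
    # Pull a matching face/voice pair together; push a non-matching
    # voice at least `margin` away in cosine distance (hinge form).
    f, v, n = (l2_normalize(e) for e in (face, voice, neg_voice))
    pos = 1.0 - f @ v    # cosine distance to the positive voice
    neg = 1.0 - f @ n    # cosine distance to the negative voice
    return pos + max(0.0, margin - neg)

def hardest_negative(face, voices, true_idx):
    # Hard-negative mining: among the non-matching voices, pick the
    # one most similar to the face embedding; a curriculum would
    # gradually move from random to hardest negatives.
    sims = l2_normalize(voices) @ l2_normalize(face)
    sims[true_idx] = -np.inf   # exclude the true match
    return int(np.argmax(sims))
```

    A curriculum schedule would start training with random negatives and only later switch to the output of `hardest_negative`, which matches the abstract's point that the schedule is essential for learning to proceed.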

    Prerequisites for Affective Signal Processing (ASP) - Part V: A response to comments and suggestions

    In four papers, a set of eleven prerequisites for affective signal processing (ASP) was identified (van den Broek et al., 2010): validation, triangulation, a physiology-driven approach, contributions of the signal-processing community, identification of users, theoretical specification, integration of biosignals, physical characteristics, historical perspective, temporal construction, and real-world baselines. Additionally, a review (in two parts) of affective computing was provided. Prompted by the reactions to these four papers, we now present: i) an extension of the review, ii) a post-hoc analysis based on the eleven prerequisites of Picard et al. (2001), and iii) a more detailed discussion and illustrations of temporal aspects in ASP.