
    Emotion Recognition from Speech using GMM and VQ

    In this paper, we study the effectiveness of anchor models applied to the multiclass problem of emotion recognition from speech. In the anchor-models system, an emotion class is characterized by its measure of similarity relative to other emotion classes. Generative models such as Gaussian Mixture Models (GMMs) are typically used as front-end systems to generate feature vectors used to train complex back-end systems, such as support vector machines (SVMs) or a multilayer perceptron (MLP), to improve the classification performance. We show that, in the context of highly unbalanced data classes, these back-end systems can improve the performance achieved by GMMs provided that an appropriate sampling or importance-weighting technique is applied. Experiments conducted on speech audio samples show that anchor models improve the performance of GMMs considerably, by 6.2% relative. We employ a hybrid approach for recognizing emotion from speech that combines Vector Quantization (VQ) and Gaussian Mixture Models (GMM). A brief review of work done in the area of recognition using the VQ-GMM hybrid approach is presented here. DOI: 10.17762/ijritcc2321-8169.15082
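    The anchor-model idea described above can be sketched in a few lines: train one generative model per emotion, score an utterance against every model, and use the resulting vector of similarities as the feature for a back-end classifier. The sketch below is a minimal numpy illustration under simplifying assumptions, not the paper's implementation: each per-emotion GMM is reduced to a single diagonal Gaussian, and the back-end classifier is omitted.

```python
import numpy as np

def fit_gaussian(frames):
    # Fit a single diagonal Gaussian (a 1-component "GMM") to the
    # feature frames of one emotion's training utterances.
    mu = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6   # floor to avoid division by zero
    return mu, var

def avg_log_likelihood(frames, mu, var):
    # Mean per-frame log-likelihood of an utterance under one model.
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mu) ** 2 / var)
    return ll.sum(axis=1).mean()

def anchor_vector(frames, models):
    # Anchor-model representation: score the utterance against every
    # emotion model; the similarity vector is the back-end feature.
    return np.array([avg_log_likelihood(frames, mu, var)
                     for mu, var in models])
```

    In practice, anchor vectors from many labelled utterances would train an SVM or MLP back-end, with resampling or importance weighting applied to counter the class imbalance the abstract mentions.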

    Anchor model fusion for emotion recognition in speech

    Proceedings of the Joint COST 2101 and 2102 International Conference, BioID_MultiComm 2009, Madrid (Spain). The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-04391-8_7. In this work, a novel method for system fusion in emotion recognition in speech is presented. The proposed approach, namely Anchor Model Fusion (AMF), exploits the characteristic behaviour of the scores of a speech utterance across different emotion models, by a mapping to a back-end anchor-model feature space followed by an SVM classifier. Experiments are presented on three different databases: Ahumada III, with speech obtained from real forensic cases; SUSAS Actual; and SUSAS Simulated. Results comparing AMF with a simple sum-fusion scheme after normalization show a significant performance improvement of the proposed technique for two of the three experimental set-ups, without degrading performance in the third one. This work has been financed under project TEC2006-13170-C02-01.
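    As a rough illustration of the two fusion schemes being compared, the sketch below (plain numpy, with a hypothetical layout of one row of per-emotion-model scores per subsystem) contrasts a normalised sum-fusion baseline with building an AMF-style anchor-space feature vector for a back-end SVM; the SVM itself is omitted.

```python
import numpy as np

def znorm(scores):
    # Zero-mean, unit-variance normalisation of one subsystem's
    # per-model score vector.
    return (scores - scores.mean()) / (scores.std() + 1e-9)

def sum_fusion(score_matrix):
    # Baseline: normalise each subsystem's scores, then sum them,
    # giving one fused score per emotion model.
    return np.sum([znorm(s) for s in score_matrix], axis=0)

def amf_features(score_matrix):
    # AMF-style back-end features: the concatenated (normalised)
    # per-model scores of an utterance form a single point in the
    # anchor-model space, to be classified by an SVM.
    return np.concatenate([znorm(s) for s in score_matrix])
```

    The design difference is that sum fusion collapses the subsystems into one score per class, while the anchor-space mapping keeps every subsystem's score profile as coordinates for a learned classifier.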

    Learnable PINs: Cross-Modal Embeddings for Person Identity

    We propose and investigate an identity-sensitive joint embedding of face and voice. Such an embedding enables cross-modal retrieval from voice to face and from face to voice. We make the following four contributions: first, we show that the embedding can be learnt from videos of talking faces, without requiring any identity labels, using a form of cross-modal self-supervision; second, we develop a curriculum learning schedule for hard negative mining targeted to this task, which is essential for learning to proceed successfully; third, we demonstrate and evaluate cross-modal retrieval for identities unseen and unheard during training over a number of scenarios and establish a benchmark for this novel task; finally, we show an application of using the joint embedding for automatically retrieving and labelling characters in TV dramas. Comment: To appear in ECCV 201
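    The second contribution, hard-negative mining for a cross-modal contrastive objective, can be sketched as follows. This is a minimal numpy illustration under assumed conventions (cosine similarity and a hypothetical hinge margin of 0.6), not the paper's training code.

```python
import numpy as np

def l2_normalize(x, eps=1e-9):
    # Project embeddings onto the unit sphere so dot products are cosines.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def contrastive_pair_loss(face, voice, neg_voice, margin=0.6):
    # Pull a matching face/voice pair together; push a non-matching
    # voice at least `margin` away in cosine distance (hinge form).
    f, v, n = (l2_normalize(e) for e in (face, voice, neg_voice))
    pos = 1.0 - f @ v    # cosine distance to the positive voice
    neg = 1.0 - f @ n    # cosine distance to the negative voice
    return pos + max(0.0, margin - neg)

def hardest_negative(face, voices, true_idx):
    # Hard-negative mining: among the non-matching voices, pick the
    # one most similar to the face embedding; a curriculum would
    # gradually move from random to hardest negatives.
    sims = l2_normalize(voices) @ l2_normalize(face)
    sims[true_idx] = -np.inf   # exclude the true match
    return int(np.argmax(sims))
```

    A curriculum schedule would start training with random negatives and only later switch to the output of `hardest_negative`, which matches the abstract's point that the schedule is essential for learning to proceed.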

    Prerequisites for Affective Signal Processing (ASP) - Part V: A response to comments and suggestions

    In four papers, a set of eleven prerequisites for affective signal processing (ASP) was identified (van den Broek et al., 2010): validation, triangulation, a physiology-driven approach, contributions of the signal-processing community, identification of users, theoretical specification, integration of biosignals, physical characteristics, historical perspective, temporal construction, and real-world baselines. Additionally, a review (in two parts) of affective computing was provided. Prompted by the reactions to these four papers, we now present: i) an extension of the review, ii) a post-hoc analysis based on the eleven prerequisites of Picard et al. (2001), and iii) a more detailed discussion and illustrations of temporal aspects in ASP.