
    Emotion recognition from speech: tools and challenges

    Human emotion recognition from speech is studied frequently because of its importance in many applications, e.g. human-computer interaction. There is wide diversity and little agreement about the basic emotions or emotion-related states on the one hand, and about where the emotion-related information lies in the speech signal on the other. These diversities motivate our investigation into extracting meta-features using PCA, or using a non-adaptive random projection (RP), to significantly reduce the large-dimensional speech feature vectors that may contain a wide range of emotion-related information. Subsets of meta-features are fused to increase the performance of the recognition model, which adopts a score-based LDC classifier. We demonstrate that our scheme outperforms state-of-the-art results when tested on non-prompted as well as acted databases (i.e. databases in which subjects act specific emotions while uttering a sentence). However, the huge gap between the accuracy rates achieved on the different types of speech datasets raises questions about the way emotions modulate speech. In particular, we argue that emotion recognition from speech should not be treated as a classification problem. We demonstrate the presence of a spectrum of different emotions in the same speech portion, especially in the non-prompted datasets, which tend to be more “natural” than the acted datasets, where the subjects attempt to suppress all but one emotion. © (2015) Society of Photo-Optical Instrumentation Engineers (SPIE)
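    A minimal sketch (not the authors' code) of the two reduction routes described in this abstract: PCA-derived meta-features versus a non-adaptive random projection, each followed by a linear discriminant classifier. The feature matrix, labels, and dimensions below are placeholder assumptions.

    # Compare PCA meta-features and a random projection, both feeding an LDC.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.random_projection import GaussianRandomProjection
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 1500))      # placeholder high-dimensional speech feature vectors
    y = rng.integers(0, 6, size=500)      # placeholder labels for 6 emotion classes

    for name, reducer in [("PCA meta-features", PCA(n_components=60)),
                          ("random projection", GaussianRandomProjection(n_components=60, random_state=0))]:
        Z = reducer.fit_transform(X)      # reduce 1500 dims to 60 meta-features
        ldc = LinearDiscriminantAnalysis()  # score-based linear discriminant classifier
        acc = cross_val_score(ldc, Z, y, cv=5).mean()
        print(f"{name}: mean CV accuracy = {acc:.3f}")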

    Emotion Recognition from Speech using GMM and VQ

    In this paper, we study the effectiveness of anchor models applied to the multiclass problem of emotion recognition from speech. In the anchor-models system, an emotion category is characterized by its measure of similarity relative to the other emotion categories. Generative models such as Gaussian Mixture Models (GMMs) are typically used as front-end systems to produce feature vectors used to train complex back-end systems such as support vector machines (SVMs) or a multilayer perceptron (MLP) to improve classification performance. We show that, in the context of highly unbalanced data categories, these back-end systems can improve the performance achieved by GMMs as long as an appropriate sampling or importance-weighting technique is applied. Experiments conducted on speech audio samples show that anchor models improve the performance of GMMs considerably, by 6.2% relative. We employ a hybrid approach for recognizing emotion from speech that combines Vector Quantization (VQ) and Gaussian Mixture Models (GMM). A brief review of work done in the area of recognition using the VQ-GMM hybrid approach is also presented. DOI: 10.17762/ijritcc2321-8169.15082
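    The following is an illustrative sketch, under our own assumptions rather than the paper's implementation, of a VQ-GMM hybrid of the kind mentioned above: per-emotion K-means codebooks supply a vector-quantization distortion, per-emotion GMMs supply a log-likelihood, and the two scores are fused. Data, class names, and the fusion weight are placeholders.

    # Per-emotion VQ codebooks plus per-emotion GMMs, fused into one score.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)
    emotions = ["anger", "happiness", "sadness"]
    # placeholder 13-dim MFCC frames per emotion class
    train = {e: rng.normal(loc=i, size=(200, 13)) for i, e in enumerate(emotions)}

    codebooks = {e: KMeans(n_clusters=16, n_init=5, random_state=0).fit(f) for e, f in train.items()}
    gmms = {e: GaussianMixture(n_components=8, random_state=0).fit(f) for e, f in train.items()}

    def score(frames, emotion, alpha=0.1):
        """Combine GMM log-likelihood with a VQ distortion penalty (one possible fusion)."""
        vq_distortion = -codebooks[emotion].score(frames) / len(frames)  # mean squared distance to nearest codeword
        loglik = gmms[emotion].score(frames)                             # mean per-frame log-likelihood
        return loglik - alpha * vq_distortion

    test_frames = rng.normal(loc=2, size=(50, 13))   # frames from an unseen utterance (placeholder)
    print(max(emotions, key=lambda e: score(test_frames, e)))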

    Cooperative Learning and its Application to Emotion Recognition from Speech

    In this paper, we propose a novel method for highly efficient exploitation of unlabeled data: Cooperative Learning. Our approach combines Active Learning and Semi-Supervised Learning techniques, with the aim of reducing the costly effects of human annotation. The core idea of Cooperative Learning is to share the labeling work between human and machine efficiently, in such a way that instances predicted with insufficient confidence are subject to human labeling, while those with high confidence values are machine-labeled. We conducted various test runs on two emotion recognition tasks with a variable number of initial supervised training instances and two different feature sets. The results show that Cooperative Learning consistently outperforms individual Active and Semi-Supervised Learning techniques in all test cases. In particular, we show that our method based on the combination of Active Learning and Co-Training reaches the same performance as a model trained on the whole training set, but using 75% fewer labeled instances. Our method therefore efficiently and robustly reduces the need for human annotation.
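    A hedged sketch of the confidence-based split at the heart of the idea described above: high-confidence predictions are machine-labeled, low-confidence instances go to a human annotator, and the model is retrained on the enlarged set. The classifier, threshold, and data are illustrative assumptions, not the paper's setup.

    # One round of the human/machine labeling split on an unlabeled pool.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    X_lab, y_lab = rng.normal(size=(50, 20)), rng.integers(0, 2, size=50)   # small seed set (placeholder)
    X_unlab = rng.normal(size=(500, 20))                                    # unlabeled pool (placeholder)

    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    conf = clf.predict_proba(X_unlab).max(axis=1)   # prediction confidence per instance
    threshold = 0.9

    machine_mask = conf >= threshold
    X_machine = X_unlab[machine_mask]
    y_machine = clf.predict(X_machine)              # machine-labeled (Semi-Supervised step)
    X_human = X_unlab[~machine_mask]                # routed to human annotation (Active Learning step)
    y_human = rng.integers(0, 2, size=len(X_human)) # stand-in for the human-provided labels

    X_new = np.vstack([X_lab, X_machine, X_human])
    y_new = np.concatenate([y_lab, y_machine, y_human])
    clf = LogisticRegression(max_iter=1000).fit(X_new, y_new)   # retrain on the enlarged set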

    Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals

    Robustness against temporal variations is important for emotion recognition from speech audio, since emotion is expressed through complex spectral patterns that can exhibit significant local dilation and compression on the time axis depending on speaker and context. To address this and potentially other tasks, we introduce the multi-time-scale (MTS) method to create flexibility towards temporal variations when analyzing time-frequency representations of audio data. MTS extends convolutional neural networks with convolution kernels that are scaled and re-sampled along the time axis, increasing temporal flexibility without increasing the number of trainable parameters compared to standard convolutional layers. We evaluate MTS and standard convolutional layers in different architectures for emotion recognition from speech audio, using four datasets of different sizes. The results show that the use of MTS layers consistently improves the generalization of networks of different capacity and depth compared to standard convolution, especially on smaller datasets.
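    One possible PyTorch reading (an interpretation, not the authors' implementation) of such a multi-time-scale layer: a single trainable kernel is re-sampled along the time axis at several scales, so temporal flexibility grows without adding parameters; fusing the scales by a max is just one choice.

    # A single kernel applied at several time scales; parameters are shared across scales.
    import torch
    import torch.nn.functional as F

    class MTSConv2d(torch.nn.Module):
        def __init__(self, in_ch, out_ch, freq_k=3, time_k=9, scales=(0.5, 1.0, 2.0)):
            super().__init__()
            self.weight = torch.nn.Parameter(torch.randn(out_ch, in_ch, freq_k, time_k) * 0.01)
            self.scales = scales

        def forward(self, x):                                  # x: (batch, ch, freq, time)
            freq_k, time_k = self.weight.shape[-2:]
            outs = []
            for s in self.scales:
                new_t = max(3, int(round(time_k * s)) | 1)     # odd length keeps padding simple
                w = F.interpolate(self.weight, size=(freq_k, new_t),
                                  mode="bilinear", align_corners=False)
                outs.append(F.conv2d(x, w, padding=(freq_k // 2, new_t // 2)))
            return torch.stack(outs, 0).max(0).values          # fuse the scales (max is one option)

    x = torch.randn(2, 1, 40, 100)                             # e.g. 40 mel bands x 100 frames
    print(MTSConv2d(1, 8)(x).shape)                            # -> torch.Size([2, 8, 40, 100])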

    Emotion recognition from speech using representation learning in extreme learning machines

    We propose the use of an Extreme Learning Machine initialised as an auto-encoder for emotion recognition from speech. The method is evaluated on three different speech corpora, namely EMO-DB, eNTERFACE and SmartKom. We compare our approach against state-of-the-art recognition rates achieved by Support Vector Machines (SVMs) and a deep learning approach based on Generalised Discriminant Analysis (GerDA). We improve the recognition rate compared to SVMs by 3%-14% on all three corpora, and compared to GerDA by 8%-13% on two of the three corpora.
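    A minimal NumPy sketch of a generic ELM auto-encoder pipeline consistent with the description above, under stated assumptions: a random hidden layer first learns to reconstruct the input in closed form, and the resulting weights initialise the classification ELM. Dimensions, regularisation, and data are placeholders.

    # ELM auto-encoder initialisation followed by a closed-form output layer.
    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(300, 120))                 # placeholder acoustic feature vectors
    y = rng.integers(0, 5, size=300)                # placeholder labels for 5 emotion classes
    T = np.eye(5)[y]                                # one-hot targets
    n_hidden, reg = 200, 1e-2

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # 1) Auto-encoder step: random projection, then solve beta so that H_ae @ beta ~= X
    W_rand = rng.normal(size=(X.shape[1], n_hidden))
    H_ae = sigmoid(X @ W_rand)
    beta = np.linalg.solve(H_ae.T @ H_ae + reg * np.eye(n_hidden), H_ae.T @ X)

    # 2) Use beta.T as the data-adapted input weights of the classification ELM
    H = sigmoid(X @ beta.T)
    W_out = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)

    pred = (H @ W_out).argmax(axis=1)
    print("training accuracy:", (pred == y).mean())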

    Emotion recognition from speech: An implementation in MATLAB

    Capstone Project submitted to the Department of Engineering, Ashesi University, in partial fulfillment of the requirements for the award of a Bachelor of Science degree in Electrical and Electronic Engineering, April 2019. Human-Computer Interaction now focuses more on being able to relate to human emotions. Recognizing human emotions from speech is an area of active research with the rise of robots and virtual reality. In this paper, emotion recognition from speech is implemented in MATLAB. Feature extraction is based on the pitch and 13 MFCCs of the audio files. Two classification methods are used and compared to determine the one with the highest accuracy for the data set.
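    A hedged, librosa-based sketch (Python here, not the MATLAB capstone code) of the feature-extraction step described above: a pitch track plus 13 MFCCs per file, summarised into one vector per utterance for a classifier. The function name and the wav_paths variable are hypothetical.

    # Extract pitch and 13 MFCCs per file, then summarise frames with mean and std.
    import numpy as np
    import librosa

    def extract_features(path):
        y, sr = librosa.load(path, sr=None)
        f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)        # frame-level pitch estimate
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 MFCCs per frame
        return np.concatenate([[np.nanmean(f0), np.nanstd(f0)],
                               mfcc.mean(axis=1), mfcc.std(axis=1)])

    # features = np.vstack([extract_features(p) for p in wav_paths])  # wav_paths is hypothetical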