3,414 research outputs found

    Feature extraction based on bio-inspired model for robust emotion recognition

    Emotional state identification is an important issue in achieving more natural speech interactive systems. Ideally, these systems should also work in real environments, where some kind of noise is generally present. Several bio-inspired representations have been applied to artificial systems for speech processing under noise conditions. In this work, an auditory signal representation is used to obtain a novel bio-inspired set of features for emotional speech signals. These features, together with other spectral and prosodic features, are used for emotion recognition under noise conditions. Neural models were trained as classifiers, and the results were compared with those of the well-known mel-frequency cepstral coefficients. The results show that the proposed representations make it possible to significantly improve the robustness of an emotion recognition system. The results were also validated in a speaker-independent scheme and on two emotional speech corpora.
    Fil: Albornoz, Enrique Marcelo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
    Fil: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
    Fil: Rufiner, Hugo Leonardo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
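    The abstract does not detail the auditory representation itself. As a rough, hedged illustration of the kind of spectral front end such systems build on (a log mel filterbank applied to a noisy signal), here is a minimal numpy sketch; all parameter values (sample rate, FFT size, filter count) are illustrative assumptions, not taken from the paper:

    ```python
    import numpy as np

    def hz_to_mel(f):
        # Standard mel-scale conversion used by MFCC-style front ends.
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def mel_filterbank(n_filters, n_fft, sr):
        # Triangular filters spaced evenly on the mel scale.
        mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
        bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
        fb = np.zeros((n_filters, n_fft // 2 + 1))
        for i in range(1, n_filters + 1):
            l, c, r = bins[i - 1], bins[i], bins[i + 1]
            for k in range(l, c):
                fb[i - 1, k] = (k - l) / max(c - l, 1)
            for k in range(c, r):
                fb[i - 1, k] = (r - k) / max(r - c, 1)
        return fb

    def log_mel_features(signal, sr=16000, n_fft=512, hop=160, n_filters=24):
        # Frame the signal, take the power spectrum, apply the filterbank.
        frames = [signal[s:s + n_fft] * np.hanning(n_fft)
                  for s in range(0, len(signal) - n_fft, hop)]
        power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
        fb = mel_filterbank(n_filters, n_fft, sr)
        return np.log(power @ fb.T + 1e-10)  # shape: (n_frames, n_filters)

    # Noisy synthetic example: a 440 Hz tone plus additive white noise,
    # standing in for emotional speech recorded in a real environment.
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 440 * t)
    noisy = clean + 0.1 * rng.standard_normal(len(t))
    feats = log_mel_features(noisy)
    print(feats.shape)
    ```

    Frame-level features like these are what the abstract's spectral and prosodic descriptors would be computed over before being passed to neural classifiers.
    
    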

    Multimodal Speech Emotion Recognition Using Audio and Text

    Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features to build well-performing classifiers. In this paper, we propose a novel deep dual recurrent encoder model that utilizes text data and audio signals simultaneously to obtain a better understanding of speech data. As emotional dialogue is composed of sound and spoken content, our model encodes the information from audio and text sequences using dual recurrent neural networks (RNNs) and then combines the information from these sources to predict the emotion class. This architecture analyzes speech data from the signal level to the language level, and thus utilizes the information within the data more comprehensively than models that focus on audio features alone. Extensive experiments are conducted to investigate the efficacy and properties of the proposed model. Our proposed model outperforms previous state-of-the-art methods in assigning data to one of four emotion categories (i.e., angry, happy, sad, and neutral) when applied to the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.
    Comment: 7 pages, accepted as a conference paper at IEEE SLT 201
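    The dual-encoder idea described above — one recurrent encoder per modality, fused before classification — can be sketched in a few lines of numpy. This is a minimal, untrained illustration only: the dimensions, the plain Elman RNN cells, and fusion by concatenating the two final hidden states are assumptions based on the abstract, not the paper's actual architecture.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def rnn_encode(x, W_in, W_h):
        # Simple Elman RNN; return the final hidden state as the sequence encoding.
        h = np.zeros(W_h.shape[0])
        for x_t in x:
            h = np.tanh(W_in @ x_t + W_h @ h)
        return h

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # Toy dimensions (illustrative assumptions, not from the paper).
    audio_dim, text_dim, hidden = 40, 64, 32
    n_classes = 4  # angry, happy, sad, neutral

    # Random, untrained weights, for shape-checking the data flow only.
    Wa_in = rng.standard_normal((hidden, audio_dim)) * 0.1
    Wa_h  = rng.standard_normal((hidden, hidden)) * 0.1
    Wt_in = rng.standard_normal((hidden, text_dim)) * 0.1
    Wt_h  = rng.standard_normal((hidden, hidden)) * 0.1
    W_out = rng.standard_normal((n_classes, 2 * hidden)) * 0.1

    # One utterance: 100 audio frames and 12 word embeddings.
    audio_seq = rng.standard_normal((100, audio_dim))
    text_seq  = rng.standard_normal((12, text_dim))

    h_audio = rnn_encode(audio_seq, Wa_in, Wa_h)
    h_text  = rnn_encode(text_seq, Wt_in, Wt_h)
    fused   = np.concatenate([h_audio, h_text])  # late fusion by concatenation
    probs   = softmax(W_out @ fused)             # distribution over 4 emotions
    print(probs.shape)
    ```

    Each modality is reduced to a fixed-length vector regardless of sequence length, which is what makes the concatenation step well-defined for utterances of different durations and word counts.
    
    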

    Emotion Recognition via Continuous Mandarin Speech


    Automatic Emotion Recognition from Mandarin Speech


    EEMD-Based Speaker Automatic Emotional Recognition in Chinese Mandarin


    Gender dependent word-level emotion detection using global spectral speech features

    In this study, global spectral features extracted at the word and sentence levels are studied for speech emotion recognition. MFCCs (mel-frequency cepstral coefficients) were used as the spectral information for recognition. Global spectral features representing gross statistics, such as the mean of the MFCCs, are used. The study also examines words at different positions in a sentence (initial, middle, and end) separately: word-level feature extraction is used to analyze the emotion recognition performance of words at different positions, with word boundaries identified manually. Gender-dependent and gender-independent models are also studied to analyze the impact of gender on emotion recognition performance. Berlin's Emo-DB (Emotional Database) was used as the emotional speech dataset. The performance of different classifiers was also studied: NN (neural network), KNN (k-nearest neighbor), and LDA (linear discriminant analysis). The anger and neutral emotions were examined. Results showed that using all 13 MFCC coefficients provides better classification results than other combinations of MFCC coefficients for these emotions. Words at the initial and final positions carry more emotion-specific information than words at the middle position. Gender-dependent models are more efficient than gender-independent models; moreover, the female model is more efficient than the male model, and females express emotions more distinctly than males. In general, NN performs the worst compared with KNN and LDA in classifying anger and neutral. LDA performs better than KNN by almost 15% for the gender-independent model and almost 25% for the gender-dependent model.
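    The pipeline this abstract describes — reduce an utterance's frame-level MFCCs to gross statistics such as the per-coefficient mean, then classify with KNN — is straightforward to sketch. The following numpy sketch uses synthetic 13-dimensional frames in place of real MFCCs; the class offsets, frame counts, and two-class setup (neutral vs. anger) are illustrative assumptions, not the study's data.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def global_features(frame_feats):
        # Gross statistics over all frames of an utterance: per-coefficient
        # mean (the study mentions the mean of the MFCCs; other statistics
        # such as the standard deviation could be appended similarly).
        return frame_feats.mean(axis=0)

    def knn_predict(train_X, train_y, x, k=3):
        # Plain k-nearest-neighbour majority vote in Euclidean distance.
        d = np.linalg.norm(train_X - x, axis=1)
        votes = train_y[np.argsort(d)[:k]]
        return np.bincount(votes).argmax()

    # Synthetic stand-in for 13-dim MFCC frames of two emotion classes
    # (0 = neutral, 1 = anger); the mean offset per class is arbitrary.
    def make_utterance(label):
        return rng.standard_normal((80, 13)) + (1.0 if label else 0.0)

    train_y = np.array([0, 0, 0, 1, 1, 1])
    train_X = np.stack([global_features(make_utterance(y)) for y in train_y])

    test_utt = make_utterance(1)
    pred = knn_predict(train_X, train_y, global_features(test_utt))
    print(pred)  # → 1 (the "anger" class)
    ```

    Averaging over frames turns variable-length utterances into fixed-length vectors, which is what lets a distance-based classifier like KNN (or LDA, as in the study) operate at the word or sentence level.
    
    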