Speech Emotion Recognition Using Multi-hop Attention Mechanism
In this paper, we are interested in exploiting textual and acoustic data of
an utterance for the speech emotion classification task. The baseline approach
models the information from audio and text independently using two deep neural
networks (DNNs). The outputs from both the DNNs are then fused for
classification. As opposed to using knowledge from both the modalities
separately, we propose a framework to exploit acoustic information in tandem
with lexical data. The proposed framework uses two bi-directional long
short-term memory (BLSTM) for obtaining hidden representations of the
utterance. Furthermore, we propose an attention mechanism, referred to as the
multi-hop, which is trained to automatically infer the correlation between the
modalities. The multi-hop attention first computes the relevant segments of the
textual data corresponding to the audio signal. The relevant textual data is
then applied to attend to parts of the audio signal. To evaluate the performance
of the proposed system, experiments are performed on the IEMOCAP dataset.
Experimental results show that the proposed technique outperforms the
state-of-the-art system, yielding a 6.5% relative improvement in terms of
weighted accuracy.
Comment: 5 pages; accepted as a conference paper at ICASSP 2019 (oral
presentation).
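The two attention hops described above can be sketched minimally in numpy. All dimensions, the mean-pooled audio query, and the dot-product scoring are illustrative assumptions for this sketch, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's BLSTM dimensions are not given here.
T_text, T_audio, d = 5, 8, 4
H_text = rng.standard_normal((T_text, d))    # text BLSTM hidden states
H_audio = rng.standard_normal((T_audio, d))  # audio BLSTM hidden states

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hop 1: an audio summary attends over the textual hidden states,
# selecting the text segments relevant to the audio signal.
audio_query = H_audio.mean(axis=0)
a_text = softmax(H_text @ audio_query)       # weights over text segments
text_context = a_text @ H_text

# Hop 2: the relevant textual context attends over the audio states.
a_audio = softmax(H_audio @ text_context)
audio_context = a_audio @ H_audio

# Fused representation for the emotion classifier.
fused = np.concatenate([text_context, audio_context])
print(fused.shape)  # (8,)
```

Each hop produces a proper attention distribution (weights summing to one), and the fused vector would feed the final classification layer.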
Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks
Automatically assessing emotional valence in human speech has historically
been a difficult task for machine learning algorithms. The subtle changes in
the voice of the speaker that are indicative of positive or negative emotional
states are often "overshadowed" by voice characteristics relating to emotional
intensity or emotional activation. In this work we explore a representation
learning approach that automatically derives discriminative representations of
emotional speech. In particular, we investigate two machine learning strategies
to improve classifier performance: (1) utilization of unlabeled data using a
deep convolutional generative adversarial network (DCGAN), and (2) multitask
learning. Within our extensive experiments we leverage a multitask annotated
emotional corpus as well as a large unlabeled meeting corpus (around 100
hours). Our speaker-independent classification experiments show that the use
of unlabeled data in particular improves classifier performance, and that both
fully supervised baseline approaches are outperformed considerably. We improve
the classification of emotional valence on a discrete 5-point scale to 43.88%
and on a 3-point scale to 49.80%, which is competitive with state-of-the-art
performance.
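The multitask strategy above amounts to one shared representation feeding two classification heads, one per valence scale. A minimal sketch, in which the dimensions, weights, and head structure are all illustrative assumptions rather than the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical multitask setup: a shared learned representation feeds
# two heads, valence on a 5-point scale and on a 3-point scale.
d, n5, n3 = 16, 5, 3
shared = rng.standard_normal(d)       # shared representation of one utterance
W5 = rng.standard_normal((n5, d))     # head for 5-point valence
W3 = rng.standard_normal((n3, d))     # head for 3-point valence

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Both tasks are predicted from the same shared features, so gradients
# from either task would shape the shared representation during training.
p5 = softmax(W5 @ shared)
p3 = softmax(W3 @ shared)
print(p5.shape, p3.shape)
```

The point of the shared trunk is that supervision from the coarser 3-point task can regularize the representation used by the harder 5-point task.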
Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema
In this paper, a psychologically-inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions become distinguishable from one another. Extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter, and shimmer. Selected features are fed as input to a K-nearest-neighbor classifier and to support vector machines. Two kernels are tested for the latter: linear and Gaussian radial basis function. The recently proposed speaker-independent experimental protocol is tested on the Berlin emotional speech database for each gender separately. The best emotion recognition accuracy, achieved by support vector machines with a linear kernel, equals 87.7%, outperforming state-of-the-art approaches. Statistical analysis is first carried out with respect to the classifiers' error rates and then to evaluate the information expressed by the classifiers' confusion matrices. © Springer Science+Business Media, LLC 2011
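The cascade idea above is a sequence of binary decisions: first separate broad emotion groups, then distinguish within the group. A toy sketch with a nearest-centroid stand-in classifier; the emotions, 2-D features, and arousal-based first split are all hypothetical illustrations, not the paper's actual schema or features:

```python
import numpy as np

# Toy feature vectors for four emotions; real features (pitch, formants,
# energy, MPEG-7 descriptors, etc.) are replaced by 2-D points.
centroids = {"anger": [2, 2], "happiness": [2, -2],
             "sadness": [-2, 2], "boredom": [-2, -2]}

def nearest(x, options):
    # Minimal stand-in classifier: nearest centroid among the options.
    return min(options, key=lambda e: np.linalg.norm(x - np.array(centroids[e])))

def cascade(x):
    # Stage 1: a binary split between commonly confused groups
    # (a hypothetical high- vs low-arousal division on the first feature).
    group = ["anger", "happiness"] if x[0] > 0 else ["sadness", "boredom"]
    # Stage 2: a binary decision within the selected group.
    return nearest(x, group)

print(cascade(np.array([2.1, 1.9])))  # anger
```

Because each stage only has to separate two alternatives, confusable pairs such as anger/happiness are resolved by a dedicated binary decision rather than a single multi-class boundary.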
Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition
In recent years, speech emotion recognition technology has been of great
significance in industrial applications such as call centers, social robots,
and health care. The combination of speech recognition and speech emotion
recognition can improve feedback efficiency and the quality of service. Thus,
speech emotion recognition has attracted much attention in both industry and
academia. Since the emotions present in an entire utterance may have varied
probabilities, speech emotion is likely to be ambiguous, which poses great
challenges to recognition tasks. However, previous studies commonly assigned a
single label or multiple labels to each utterance with certainty, so their
algorithms suffer from low accuracy because of the inappropriate
representation. Inspired by the optimally interacting theory, we address
ambiguous speech emotions by proposing a novel multi-classifier interactive
learning (MCIL) method. In MCIL, multiple different classifiers first mimic
several individuals who have inconsistent cognitions of ambiguous emotions,
and construct new ambiguous labels (the emotion probability distribution).
Then, they are retrained with the new labels to interact with their cognitions.
This procedure enables each classifier to learn better representations of
ambiguous data from the others, and further improves recognition ability.
Experiments on three benchmark corpora (MAS, IEMOCAP, and FAU-AIBO)
demonstrate that MCIL not only improves each classifier's performance, but
also raises their recognition consistency from moderate to substantial.
Comment: 10 pages, 4 figures.
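The label-construction step of MCIL can be sketched as follows: several classifiers vote, and the distribution of their votes becomes the new ambiguous (soft) label used for retraining. The emotion set, the number of classifiers, and the averaging rule here are illustrative assumptions for this sketch:

```python
import numpy as np

# Hypothetical emotion inventory for the sketch.
emotions = ["angry", "happy", "neutral", "sad"]

# One-hot predictions from four hypothetical classifiers, which mimic
# individuals with inconsistent cognitions of one ambiguous utterance.
votes = np.array([
    [1, 0, 0, 0],   # classifier A: angry
    [1, 0, 0, 0],   # classifier B: angry
    [0, 0, 1, 0],   # classifier C: neutral
    [1, 0, 0, 0],   # classifier D: angry
])

# The new ambiguous label is the distribution of the classifiers' opinions;
# each classifier would then be retrained against this soft target.
soft_label = votes.mean(axis=0)
print(dict(zip(emotions, soft_label)))  # angry: 0.75, neutral: 0.25
```

Retraining against the shared soft label is what lets each classifier absorb the others' view of the ambiguous utterance instead of a hard single-label target.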
Post-training discriminative pruning for RBMs
One of the major challenges in the area of artificial neural networks is the identification of a suitable architecture for a specific problem. Choosing an unsuitable topology can exponentially increase the training cost, and even hinder network convergence. On the other hand, recent research indicates that larger or deeper nets can map the problem features into a more appropriate space, and thereby improve the classification process, thus leading to an apparent dichotomy. In this regard, it is interesting to inquire whether independent measures, such as mutual information, could provide a clue to finding the most discriminative neurons in a network. In the present work we explore this question in the context of Restricted Boltzmann Machines, by employing different measures to realize post-training pruning. The neurons which are determined by each measure to be the most discriminative are combined, and a classifier is applied to the ensuing network to determine its usefulness. We find that two measures in particular seem to be good indicators of the most discriminative neurons, producing savings of generally more than 50% of the neurons, while maintaining an acceptable error rate. Further, it is borne out that starting with a larger network architecture and then pruning is more advantageous than using a smaller network to begin with. Finally, a quantitative index is introduced which can provide information on choosing a suitable pruned network.
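One of the measures the abstract names, mutual information, can be used to rank hidden units for pruning roughly as follows. This is a minimal sketch with synthetic binary activations; the data, the 50% keep ratio, and the ranking rule are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic binarized hidden activations of a trained RBM, plus labels.
n_samples, n_hidden = 200, 10
hidden = rng.integers(0, 2, size=(n_samples, n_hidden))
labels = rng.integers(0, 2, size=n_samples)
hidden[:, :3] = labels[:, None]  # make units 0-2 informative about the label

def mutual_information(h, y):
    # MI between two binary variables, in nats.
    mi = 0.0
    for hv in (0, 1):
        for yv in (0, 1):
            p_joint = np.mean((h == hv) & (y == yv))
            p_h, p_y = np.mean(h == hv), np.mean(y == yv)
            if p_joint > 0:
                mi += p_joint * np.log(p_joint / (p_h * p_y))
    return mi

# Rank hidden units by how informative they are about the class label,
# then keep the most discriminative half (pruning ~50% of the neurons).
scores = np.array([mutual_information(hidden[:, j], labels)
                   for j in range(n_hidden)])
keep = np.argsort(scores)[::-1][:n_hidden // 2]
print(sorted(keep.tolist()))
```

In this toy setup the three units copied from the labels carry roughly ln 2 nats of information each and dominate the ranking, while the random units score near zero, so the pruned network retains the discriminative neurons.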