Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition
Automatic emotion recognition from speech, which is an important and challenging task in the field of affective computing, heavily relies on the effectiveness of the speech features for classification. Previous approaches to emotion recognition have mostly focused on the extraction of carefully hand-crafted features. How to model spatio-temporal dynamics effectively for speech emotion recognition is still under active investigation. In this paper, we propose a method to tackle the problem of emotion-relevant feature extraction from speech by leveraging Attention-based Bidirectional Long Short-Term Memory Recurrent Neural Networks with fully convolutional networks in order to automatically learn the best spatio-temporal representations of speech signals. The learned high-level features are then fed into a deep neural network (DNN) to predict the final emotion. The experimental results on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpora show that our method provides more accurate predictions compared with other existing emotion recognition algorithms.
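The attention mechanism described above scores each time step of the recurrent outputs and collapses the sequence into a single utterance-level feature vector via a softmax-weighted sum. A minimal pure-Python sketch of that pooling step, assuming a single learned scoring vector `w` and toy 2-dimensional hidden states (both illustrative assumptions, not the authors' implementation):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(hidden_states, w):
    """Collapse per-frame hidden states into one feature vector:
    score each frame with the vector w, softmax the scores, and
    return the attention-weighted sum of the states."""
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(a * h[d] for a, h in zip(alphas, hidden_states))
            for d in range(dim)]

# toy example: 3 time steps of 2-dimensional BiLSTM outputs
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [0.5, 0.5]
print(attention_pool(states, w))
```

In the full model this pooled vector would be concatenated with the FCN features before the DNN classifier; here only the pooling itself is shown.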
Evaluating raw waveforms with deep learning frameworks for speech emotion recognition
Speech emotion recognition is a challenging task in the speech processing
field. For this reason, the feature extraction process is of crucial
importance for representing and processing speech signals. In this work, we
present a model which feeds raw audio files directly into deep neural
networks, without any feature extraction stage, for the recognition of
emotions, utilizing six different data sets: EMO-DB, RAVDESS, TESS, CREMA,
SAVEE, and TESS+RAVDESS. To demonstrate the contribution of the proposed
model, the performance of traditional feature extraction techniques, namely
mel-scale spectrograms and mel-frequency cepstral coefficients, is assessed
in combination with machine learning algorithms, ensemble learning methods,
and deep and hybrid deep learning techniques. Support vector machine,
decision tree, naive Bayes, and random forest models are evaluated as
machine learning algorithms, while majority voting and stacking methods are
assessed as ensemble learning techniques. Moreover, convolutional neural
networks, long short-term memory networks, and a hybrid CNN-LSTM model are
evaluated as deep learning techniques and compared with the machine learning
and ensemble learning methods. To demonstrate the effectiveness of the
proposed model, a comparison with state-of-the-art studies is carried out.
Based on the experimental results, the CNN model surpasses existing
approaches with 95.86% accuracy on the TESS+RAVDESS data set using raw audio
files, thereby establishing a new state of the art. The proposed approach
achieves 90.34% accuracy on EMO-DB with the CNN model, 90.42% on RAVDESS
with the CNN model, 99.48% on TESS with the LSTM model, 69.72% on CREMA with
the CNN model, and 85.76% on SAVEE with the CNN model in
speaker-independent audio categorization problems.
Comment: 14 pages, 6 figures, 8 tables
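The core idea above is that raw audio samples enter the network directly, with no spectrogram or MFCC stage in between. A minimal pure-Python sketch of that first layer, assuming a single hand-picked kernel and a toy waveform (both illustrative assumptions, not the paper's trained model):

```python
def conv1d(signal, kernel, stride=1):
    """Valid 1-D convolution (cross-correlation) of raw samples with one kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

def relu(xs):
    # standard rectified-linear activation applied element-wise
    return [max(0.0, x) for x in xs]

def global_max_pool(xs):
    # collapse a feature map to its strongest response
    return max(xs)

# toy 'waveform': raw samples go straight into the conv layer,
# with no hand-crafted feature extraction stage in between
wave = [0.0, 0.2, 0.9, -0.4, 0.1, 0.7, -0.8, 0.3]
edge_kernel = [1.0, -1.0]  # responds to sample-to-sample change
feature_map = relu(conv1d(wave, edge_kernel))
print(global_max_pool(feature_map))
```

In a real end-to-end model, many such kernels would be learned by backpropagation and stacked with further conv, LSTM, or dense layers; only the raw-input convolution is sketched here.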
Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data
Traditional convolutional layers extract features from patches of data by
applying a non-linearity on an affine function of the input. We propose a model
that enhances this feature extraction process for the case of sequential data,
by feeding patches of the data into a recurrent neural network and using the
outputs or hidden states of the recurrent units to compute the extracted
features. By doing so, we exploit the fact that a window containing a few
frames of the sequential data is a sequence itself and this additional
structure might encapsulate valuable information. In addition, we allow for
more steps of computation in the feature extraction process, which is
potentially beneficial, as an affine function followed by a non-linearity
can result in features that are too simple. Using our convolutional
recurrent layers we
obtain an improvement in performance in two audio classification tasks,
compared to traditional convolutional layers. TensorFlow code for the
convolutional recurrent layers is publicly available at
https://github.com/cruvadom/Convolutional-RNN
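The layer described above treats each patch of the input as a short sequence and replaces the usual affine map with an RNN run over the frames of that patch. A minimal pure-Python sketch of the idea, assuming a single-unit tanh RNN and scalar weights (illustrative assumptions, not the repository's TensorFlow implementation):

```python
import math

def rnn_feature(window, w_in, w_rec):
    """Run a minimal single-unit tanh RNN over the frames of one window
    and return its final hidden state as the extracted feature."""
    h = 0.0
    for frame in window:
        h = math.tanh(w_in * frame + w_rec * h)
    return h

def conv_rnn_layer(sequence, window_size, w_in, w_rec, stride=1):
    """Convolutional-RNN layer sketch: slide a window over the sequence,
    and feed each patch to an RNN instead of an affine function,
    exploiting the fact that each patch is itself a sequence."""
    return [rnn_feature(sequence[i:i + window_size], w_in, w_rec)
            for i in range(0, len(sequence) - window_size + 1, stride)]

seq = [0.1, 0.5, -0.3, 0.8, 0.2, -0.6]
features = conv_rnn_layer(seq, window_size=3, w_in=1.0, w_rec=0.5)
print(features)  # one feature per window position
```

The extra recurrent steps inside each window are what allow richer features than a single affine map plus non-linearity; in practice many such recurrent units would run in parallel per window, with weights learned end to end.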