Search CORE

2,864 research outputs found

Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition

Author: Adavanne Sharath
Drossos Konstantinos
Jarina Roman
Malik Miroslav
Ticha Dasa
Virtanen Tuomas
Publication venue
Publication date: 01/01/2017
Field of study

This paper studies the emotion recognition from musical tracks in the 2-dimensional valence-arousal (V-A) emotional space. We propose a method based on convolutional (CNN) and recurrent neural networks (RNN), having significantly fewer parameters compared with the state-of-the-art method for the same task. We utilize one CNN layer followed by two branches of RNNs trained separately for arousal and valence. The method was evaluated using the 'MediaEval2015 emotion in music' dataset. We achieved an RMSE of 0.202 for arousal and 0.268 for valence, which is the best result reported on this dataset.Comment: Accepted for Sound and Music Computing (SMC 2017

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University

Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

Author: Geiger Jürgen
Jin Wenyu
Mousa Amr El-Desoky
Pohjalainen Jouni
Schuller Björn
Zhang Zixing
Publication venue
Publication date: 01/01/2018
Field of study

Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that stills remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks

arXiv.org e-Print Archive

OPUS Augsburg

Crossref

Dynamic difficulty awareness training for continuous emotion prediction

Author: Coutinho E
Han J
Schuller B
Zhang Z
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/09/2018
Field of study

Time-continuous emotion prediction has become an increasingly compelling task in machine learning. Considerable efforts have been made to advance the performance of these systems. Nonetheless, the main focus has been the development of more sophisticated models and the incorporation of different expressive modalities (e.g., speech, face, and physiology). In this paper, motivated by the benefit of difficulty awareness in a human learning procedure, we propose a novel machine learning framework, namely, Dynamic Difficulty Awareness Training (DDAT), which sheds fresh light on the research - directly exploiting the difficulties in learning to boost the machine learning process. The DDAT framework consists of two stages: information retrieval and information exploitation. In the first stage, we make use of the reconstruction error of input features or the annotation uncertainty to estimate the difficulty of learning specific information. The obtained difficulty level is then used in tandem with original features to update the model input in a second learning stage with the expectation that the model can learn to focus on high difficulty regions of the learning process. We perform extensive experiments on a benchmark database (RECOLA) to evaluate the effectiveness of the proposed framework. The experimental results show that our approach outperforms related baselines as well as other well-established time-continuous emotion prediction systems, which suggests that dynamically integrating the difficulty information for neural networks can help enhance the learning process

arXiv.org e-Print Archive

University of Liverpool Repository

OPUS Augsburg

Spiral - Imperial College Digital Repository

Backwards is the way forward: feedback in the cortical hierarchy predicts the expected future

Author: Muckli L.
Petro L.S.
Smith F.W.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 10/05/2013
Field of study

Clark offers a powerful description of the brain as a prediction machine, which offers progress on two distinct levels. First, on an abstract conceptual level, it provides a unifying framework for perception, action, and cognition (including subdivisions such as attention, expectation, and imagination). Second, hierarchical prediction offers progress on a concrete descriptive level for testing and constraining conceptual elements and mechanisms of predictive coding models (estimation of predictions, prediction errors, and internal models)

Crossref

Enlighten

University of East Anglia digital repository

Sentiment analysis in non-fixed length audios using a Fully Convolutional Neural Network

Author: Alaiz-Moretón Héctor
Benavides Carmen
Benítez-Andrades José Alberto
García-Olalla Oscar
García-Ordás María Teresa
García-Rodríguez Isaías
Publication venue: 'Elsevier BV'
Publication date: 10/03/2022
Field of study

.In this work, a sentiment analysis method that is capable of accepting audio of any length, without being fixed a priori, is proposed. Mel spectrogram and Mel Frequency Cepstral Coefficients are used as audio description methods and a Fully Convolutional Neural Network architecture is proposed as a classifier. The results have been validated using three well known datasets: EMODB, RAVDESS and TESS. The results obtained were promising, outperforming the state-of–the-art methods. Also, thanks to the fact that the proposed method admits audios of any size, it allows a sentiment analysis to be made in near real time, which is very interesting for a wide range of fields such as call centers, medical consultations or financial brokers.S

Leon University (Spain)

Sentiment analysis in non-fixed length audios using a Fully Convolutional Neural Network

Author: Alaiz-Moretón Héctor
Benavides Carmen
Benítez-Andrades José Alberto
García-Olalla Oscar
García-Ordás María Teresa
García-Rodríguez Isaías
Publication venue
Publication date: 03/02/2024
Field of study

In this work, a sentiment analysis method that is capable of accepting audio of any length, without being fixed a priori, is proposed. Mel spectrogram and Mel Frequency Cepstral Coefficients are used as audio description methods and a Fully Convolutional Neural Network architecture is proposed as a classifier. The results have been validated using three well known datasets: EMODB, RAVDESS, and TESS. The results obtained were promising, outperforming the state-of-the-art methods. Also, thanks to the fact that the proposed method admits audios of any size, it allows a sentiment analysis to be made in near real time, which is very interesting for a wide range of fields such as call centers, medical consultations, or financial brokers

arXiv.org e-Print Archive