
    Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition

    This paper studies emotion recognition from musical tracks in the two-dimensional valence-arousal (V-A) emotional space. We propose a method based on convolutional (CNN) and recurrent neural networks (RNN) that has significantly fewer parameters than the state-of-the-art method for the same task. We use one CNN layer followed by two branches of RNNs trained separately for arousal and valence. The method was evaluated on the 'MediaEval 2015 Emotion in Music' dataset. We achieved an RMSE of 0.202 for arousal and 0.268 for valence, which is the best result reported on this dataset. Comment: Accepted for Sound and Music Computing (SMC 2017).
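
    A minimal sketch (in PyTorch) of the stacked architecture described in this abstract: one convolutional layer over a spectrogram, followed by two separate recurrent branches, one regressing arousal and one regressing valence per time frame. The 40-band mel input, filter counts, GRU widths and pooling scheme are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a CNN layer followed by two RNN branches for arousal and valence.
# Layer sizes and the 40-band mel input are assumptions for illustration.
import torch
import torch.nn as nn


class CnnRnnVA(nn.Module):
    def __init__(self, n_mels: int = 40, n_filters: int = 32, rnn_hidden: int = 64):
        super().__init__()
        # Single 2-D convolution over (time, mel), then pooling that collapses the mel axis.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, n_filters, kernel_size=(3, 3), padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, n_mels)),
        )
        # Two independent recurrent branches, one per emotion dimension.
        self.arousal_rnn = nn.GRU(n_filters, rnn_hidden, batch_first=True)
        self.valence_rnn = nn.GRU(n_filters, rnn_hidden, batch_first=True)
        self.arousal_out = nn.Linear(rnn_hidden, 1)
        self.valence_out = nn.Linear(rnn_hidden, 1)

    def forward(self, spec: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # spec: (batch, time, n_mels) log-mel spectrogram
        x = self.cnn(spec.unsqueeze(1))        # (batch, filters, time, 1)
        x = x.squeeze(-1).transpose(1, 2)      # (batch, time, filters)
        a, _ = self.arousal_rnn(x)
        v, _ = self.valence_rnn(x)
        # One (arousal, valence) prediction per time frame.
        return self.arousal_out(a).squeeze(-1), self.valence_out(v).squeeze(-1)


if __name__ == "__main__":
    model = CnnRnnVA()
    dummy = torch.randn(2, 100, 40)            # 2 clips, 100 frames, 40 mel bands
    arousal, valence = model(dummy)
    print(arousal.shape, valence.shape)        # torch.Size([2, 100]) for each
```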

    Shared acoustic codes underlie emotional communication in music and speech—Evidence from deep transfer learning

    Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, in that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an affective sciences point of view, determining the degree of overlap between the two domains is fundamental to understanding the shared mechanisms underlying this phenomenon. From a machine learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities for enlarging the amount of data available to develop music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (arousal and valence) in music and speech, and transfer learning between these domains. We establish a comparative framework including intra-domain (i.e., models trained and tested on the same modality, either music or speech) and cross-domain experiments (i.e., models trained on one modality and tested on the other). In the cross-domain context, we evaluated two strategies: direct transfer between domains, and the contribution of transfer learning techniques (feature-representation transfer based on denoising autoencoders) for reducing the gap between the feature-space distributions. Our results demonstrate excellent cross-domain generalisation performance, with and without feature-representation transfer, in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for valence estimation, whereas for speech, intra-domain models achieved the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain.
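
    A minimal sketch (in PyTorch) of the feature-representation-transfer idea mentioned above: a denoising autoencoder is fitted on acoustic feature vectors from one domain (e.g. music), and its encoder is then reused to re-encode features from the other domain (e.g. speech) before emotion regression. The feature dimensionality, noise level and layer sizes are placeholder assumptions, not the article's configuration.

```python
# Denoising autoencoder for feature-representation transfer between domains.
# Dimensions and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class DenoisingAutoencoder(nn.Module):
    def __init__(self, n_features: int = 260, n_hidden: int = 128, noise_std: float = 0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Tanh())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Corrupt the input with Gaussian noise, then reconstruct the clean vector.
        noisy = x + self.noise_std * torch.randn_like(x)
        return self.decoder(self.encoder(noisy))


def train_dae(dae: DenoisingAutoencoder, features: torch.Tensor, epochs: int = 50) -> None:
    """Fit the autoencoder on feature vectors from the source domain."""
    optimiser = torch.optim.Adam(dae.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimiser.zero_grad()
        loss = loss_fn(dae(features), features)
        loss.backward()
        optimiser.step()


if __name__ == "__main__":
    music_feats = torch.randn(500, 260)   # placeholder music feature vectors
    speech_feats = torch.randn(500, 260)  # placeholder speech feature vectors
    dae = DenoisingAutoencoder()
    train_dae(dae, music_feats)
    # Re-encode speech features into the music-adapted representation,
    # then train or evaluate the emotion regressor on these encodings.
    shared_speech = dae.encoder(speech_feats).detach()
    print(shared_speech.shape)            # torch.Size([500, 128])
```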

    Modelling affect for horror soundscapes

    The feeling of horror in movies or games relies on the audience's perception of a tense atmosphere, often achieved through sound accompanying the on-screen drama, which guides their emotional experience throughout the scene or game-play sequence. These progressions are often crafted through a priori knowledge of how a scene or game-play sequence will play out and the emotional patterns a game director intends to convey. The appropriate design of sound becomes even more challenging once the scenery and general context are autonomously generated by an algorithm. Towards realizing sound-based affective interaction in games, this paper explores the creation of computational models capable of ranking short audio pieces based on crowdsourced annotations of tension, arousal and valence. Affect models are trained via preference learning on over a thousand annotations using support vector machines, whose inputs are low-level features extracted from the audio assets of a comprehensive sound library. The models constructed in this work predict the tension, arousal and valence elicited by sound with accuracies of approximately 65%, 66% and 72%, respectively.
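
    A minimal sketch (in Python, using scikit-learn) of preference learning with a support vector machine via the standard pairwise reduction: each annotation of the form "clip A feels more tense than clip B" becomes a classification example on the feature difference A - B. The features and preference pairs below are synthetic placeholders, and the paper's exact feature set, kernel and toolchain may differ.

```python
# Pairwise (RankSVM-style) preference learning sketch with synthetic data.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Low-level audio features per clip (rows), e.g. spectral/MFCC statistics.
features = rng.normal(size=(200, 20))
# Crowdsourced preference pairs: (preferred_clip_index, other_clip_index).
pairs = [(rng.integers(200), rng.integers(200)) for _ in range(1000)]
pairs = [(a, b) for a, b in pairs if a != b]

# Build symmetric difference vectors so the classifier learns a ranking direction.
X = np.vstack([features[a] - features[b] for a, b in pairs] +
              [features[b] - features[a] for a, b in pairs])
y = np.array([1] * len(pairs) + [-1] * len(pairs))

rank_svm = LinearSVC(C=1.0, max_iter=10000).fit(X, y)

# The learned weight vector scores clips; a higher score means higher predicted tension.
tension_scores = features @ rank_svm.coef_.ravel()
print(tension_scores[:5])
```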