Emotion and themes recognition in music with convolutional and recurrent attention-blocks

Abstract

Emotion is an essential aspect of music, and its recognition is a prevalent research topic in the field of computer audition. Machine learning-based Music Emotion Recognition (MER) systems could make music collections more accessible by providing standardised methods for categorising music. In this paper, we (team name: AugsBurger) introduce a machine learning architecture for automatic MER, composed sequentially of a convolutional feature extractor with block attention modules and a recurrent stack with self-attention. We train five models and conduct various late fusion experiments. Utilising a Convolutional Recurrent Neural Network (CRNN) with convolutional block attention applied throughout an 18-layer ResNet and a single recurrent layer with a Gated Recurrent Unit cell, we achieve a ROC-AUC of 73.9 % on the test partition of the MediaEval 2020 Emotion & Themes in Music task. Applying late fusion to the individual model predictions and another challenge submission further increases this result to 75.3 % ROC-AUC.
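
The late fusion step mentioned above can be illustrated with a minimal sketch. The abstract does not state which fusion rule was used, so the unweighted mean of per-tag scores shown here is an assumption, as are the function names (late_fusion_mean, roc_auc) and the toy scores, which are invented for illustration and are not results from the paper.

```python
from typing import List, Sequence


def late_fusion_mean(
    model_predictions: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Fuse predictions by averaging scores over models (assumed fusion rule).

    model_predictions[m][i][t] is the score of model m for track i and tag t.
    """
    n_models = len(model_predictions)
    if n_models == 0:
        raise ValueError("need at least one model")
    n_tracks = len(model_predictions[0])
    n_tags = len(model_predictions[0][0])
    return [
        [
            sum(preds[i][t] for preds in model_predictions) / n_models
            for t in range(n_tags)
        ]
        for i in range(n_tracks)
    ]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary ROC-AUC as the fraction of (positive, negative) pairs
    that are ranked correctly; ties count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: four tracks, one tag, two models.
labels = [1, 0, 1, 0]
model_a = [[0.9], [0.6], [0.5], [0.2]]
model_b = [[0.6], [0.1], [0.7], [0.6]]

fused = late_fusion_mean([model_a, model_b])

auc_a = roc_auc(labels, [row[0] for row in model_a])
auc_b = roc_auc(labels, [row[0] for row in model_b])
auc_fused = roc_auc(labels, [row[0] for row in fused])
print(auc_a, auc_b, auc_fused)
```

In this toy case the two models make different ranking errors, so averaging their scores ranks every positive track above every negative one. With several tags, the same roc_auc function could be applied per tag and averaged; whether the task metric is computed that way is not stated in the abstract.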
