One Deep Music Representation to Rule Them All? : A comparative analysis of different representation learning strategies

Author: Hanjalic Alan
Kim Jaehun
Liem Cynthia C. S.
Urbano Julián
Publication date: 01/01/2019

Inspired by the success of deploying deep learning in the fields of Computer Vision and Natural Language Processing, this learning paradigm has also found its way into the field of Music Information Retrieval. In order to benefit from deep learning in an effective, but also efficient manner, deep transfer learning has become a common approach. In this approach, it is possible to reuse the output of a pre-trained neural network as the basis for a new learning task. The underlying hypothesis is that if the initial and new learning tasks show commonalities and are applied to the same type of input data (e.g. music audio), the generated deep representation of the data is also informative for the new task. Since, however, most of the networks used to generate deep representations are trained using a single initial learning source, their representation is unlikely to be informative for all possible future tasks. In this paper, we present the results of our investigation of what are the most important factors to generate deep representations for the data and learning tasks in the music domain. We conducted this investigation via an extensive empirical study that involves multiple learning sources, as well as multiple deep learning architectures with varying levels of information sharing between sources, in order to learn music representations. We then validate these representations considering multiple target datasets for evaluation. The results of our experiments yield several insights on how to approach the design of methods for learning widely deployable deep data representations in the music domain.

Comment: This work has been accepted to "Neural Computing and Applications: Special Issue on Deep Learning for Music and Audio"

arXiv.org e-Print Archive

TU Delft Repository

Pre-Training Strategies Using Contrastive Learning and Playlist Information for Music Classification and Similarity

Author: Alonso-Jiménez Pablo
Bogdanov Dmitry
Bourdalas Grigoris
Favory Xavier
Foroughmand Hadrien
Lidy Thomas
Serra Xavier
Publication date: 24/04/2023

In this work, we investigate an approach that relies on contrastive learning and music metadata as a weak source of supervision to train music representation models. Recent studies show that contrastive learning can be used with editorial metadata (e.g., artist or album name) to learn audio representations that are useful for different classification tasks. In this paper, we extend this idea to using playlist data as a source of music similarity information and investigate three approaches to generate anchor and positive track pairs. We evaluate these approaches by fine-tuning the pre-trained models for music multi-label classification tasks (genre, mood, and instrument tagging) and music similarity. We find that creating anchor and positive track pairs by relying on co-occurrences in playlists provides better music similarity and competitive classification results compared to choosing tracks from the same artist as in previous works. Additionally, our best pre-training approach based on playlists provides superior classification performance for most datasets.

Comment: Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP'23)
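The abstract above does not spell out its three pairing approaches, so the following is only a minimal sketch of the general idea of drawing anchor and positive tracks from playlist co-occurrences. The function names `cooccurrence_counts` and `sample_positive`, the count-weighted sampling, and the toy track IDs are all assumptions made for illustration, not the authors' method.

```python
import itertools
import random
from collections import Counter


def cooccurrence_counts(playlists):
    """Count how often each unordered pair of tracks shares a playlist."""
    counts = Counter()
    for playlist in playlists:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        tracks = sorted(set(playlist))
        for a, b in itertools.combinations(tracks, 2):
            counts[(a, b)] += 1
    return counts


def sample_positive(anchor, counts, rng):
    """Pick a positive for `anchor`, weighted by co-occurrence count.

    Returns None when the anchor never shares a playlist with another track.
    The weighting scheme is an illustrative assumption.
    """
    candidates, weights = [], []
    for (a, b), n in counts.items():
        if anchor == a:
            candidates.append(b)
            weights.append(n)
        elif anchor == b:
            candidates.append(a)
            weights.append(n)
    if not candidates:
        return None
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy playlists with hypothetical track IDs.
playlists = [["t1", "t2", "t3"], ["t2", "t3"], ["t4"]]
counts = cooccurrence_counts(playlists)
positive = sample_positive("t1", counts, random.Random(0))
```

In a contrastive setup, each (anchor, positive) pair sampled this way would be pulled together in the embedding space, while other tracks in the batch act as negatives.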

arXiv.org e-Print Archive

Efficient Supervised Training of Audio Transformers for Music Representation Learning

Author: Alonso-Jiménez Pablo
Bogdanov Dmitry
Serra Xavier
Publication date: 28/09/2023

In this work, we address music representation learning using convolution-free transformers. We build on top of existing spectrogram-based audio transformers such as AST and train our models on a supervised task using patchout training similar to PaSST. In contrast to previous works, we study how specific design decisions affect downstream music tagging tasks instead of focusing on the training task. We assess the impact of initializing the models with different pre-trained weights, using various input audio segment lengths, using learned representations from different blocks and tokens of the transformer for downstream tasks, and applying patchout at inference to speed up feature extraction. We find that 1) initializing the model from ImageNet or AudioSet weights and using longer input segments are beneficial both for the training and downstream tasks, 2) the best representations for the considered downstream tasks are located in the middle blocks of the transformer, and 3) using patchout at inference allows faster processing than our convolutional baselines while maintaining superior performance. The resulting models, MAEST, are publicly available and obtain the best performance among open models in music tagging tasks.

Comment: Accepted at the 2023 International Society for Music Information Retrieval Conference (ISMIR'23)
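As a rough illustration of the patchout idea mentioned above, the sketch below drops a random subset of spectrogram patch tokens before they would reach the transformer, which shortens the sequence the model has to process. The function name `patchout`, the `keep_ratio` parameter, and the unstructured (uniformly random) dropping are assumptions for illustration; the abstract does not specify the exact scheme used by the MAEST models.

```python
import random


def patchout(tokens, keep_ratio, rng):
    """Keep a random subset of patch tokens, preserving their original order.

    Returns the kept tokens and their original indices. The indices are
    returned so that positional information could still be matched to each
    surviving patch (an assumption about how a caller might use them).
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return [], []
    n_keep = max(1, round(len(tokens) * keep_ratio))
    kept = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in kept], kept


# Ten placeholder patch tokens; half of them are kept.
tokens = [f"patch_{i}" for i in range(10)]
kept_tokens, kept_indices = patchout(tokens, keep_ratio=0.5, rng=random.Random(0))
```

Because self-attention cost grows with sequence length, processing fewer tokens is what makes both training and feature extraction cheaper; the trade-off is that some time-frequency regions are never seen in a given pass.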

arXiv.org e-Print Archive

Feature Extraction for Music Information Retrieval

Author: Jensen Jesper Højvang
Publication venue: Multimedia Information and Signal Processing, Institute of Electronic Systems, Aalborg University
Publication date: 01/01/2009

CiteSeerX

VBN

Machine Learning Analysis of the Cultural and Cross-Cultural Aspects of Beauty in Music

Author: Q Claire Elizabeth
Publication date: 21/05/2013

Aberystwyth Research Portal

Learning feature hierarchies for musical audio signals

Author: Dieleman Sander
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2015

Ghent University Academic Bibliography

Explainability in Music Recommender Systems

Author: Afchar Darius
Epure Elena V.
Hennequin Romain
Melchiorre Alessandro B.
Moussallam Manuel
Schedl Markus
Publication venue: Wiley
Publication date: 25/01/2022

The most common way to listen to recorded music nowadays is via streaming platforms which provide access to tens of millions of tracks. To assist users in effectively browsing these large catalogs, the integration of Music Recommender Systems (MRSs) has become essential. Current real-world MRSs are often quite complex and optimized for recommendation accuracy. They combine several building blocks based on collaborative filtering and content-based recommendation. This complexity can hinder the ability to explain recommendations to end users, which is particularly important for recommendations perceived as unexpected or inappropriate. While pure recommendation performance often correlates with user satisfaction, explainability has a positive impact on other factors such as trust and forgiveness, which are ultimately essential to maintain user loyalty. In this article, we discuss how explainability can be addressed in the context of MRSs. We provide perspectives on how explainability could improve music recommendation algorithms and enhance user experience. First, we review common dimensions and goals of recommenders' explainability and in general of eXplainable Artificial Intelligence (XAI), and elaborate on the extent to which these apply -- or need to be adapted -- to the specific characteristics of music consumption and recommendation. Then, we show how explainability components can be integrated within a MRS and in what form explanations can be provided. Since the evaluation of explanation quality is decoupled from pure accuracy-based evaluation criteria, we also discuss requirements and strategies for evaluating explanations of music recommendations. Finally, we describe the current challenges for introducing explainability within a large-scale industrial music recommender system and provide research perspectives.

Comment: To appear in AI Magazine, Special Topic on Recommender Systems 202

arXiv.org e-Print Archive