Pre-Training Strategies Using Contrastive Learning and Playlist Information for Music Classification and Similarity
In this work, we investigate an approach that relies on contrastive learning
and music metadata as a weak source of supervision to train music
representation models. Recent studies show that contrastive learning can be
used with editorial metadata (e.g., artist or album name) to learn audio
representations that are useful for different classification tasks. In this
paper, we extend this idea to using playlist data as a source of music
similarity information and investigate three approaches to generate anchor and
positive track pairs. We evaluate these approaches by fine-tuning the
pre-trained models for music multi-label classification tasks (genre, mood, and
instrument tagging) and music similarity. We find that creating anchor and
positive track pairs by relying on co-occurrences in playlists provides better
music similarity and competitive classification results compared to choosing
tracks from the same artist as in previous works. Additionally, our best
pre-training approach based on playlists provides superior classification
performance for most datasets.
Comment: Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP'23)
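The core pairing idea can be sketched in a few lines. This is a toy illustration only: the playlists and the all-pairs strategy are hypothetical stand-ins, not the paper's exact sampling procedure, which compares three approaches to generating anchor/positive pairs.

```python
from itertools import combinations

# Hypothetical toy playlists: each is a list of track IDs.
playlists = [
    ["t1", "t2", "t3"],
    ["t2", "t3", "t4"],
    ["t5", "t6"],
]

def cooccurrence_pairs(playlists):
    """Generate (anchor, positive) track pairs from co-occurrence in a playlist.

    Tracks that appear in the same playlist are treated as similar, so any
    two co-occurring tracks can serve as an anchor/positive pair for a
    contrastive objective.
    """
    pairs = []
    for pl in playlists:
        for a, b in combinations(pl, 2):
            pairs.append((a, b))
    return pairs

pairs = cooccurrence_pairs(playlists)
```

In a real contrastive setup, negatives would typically come from other tracks in the training batch rather than being enumerated explicitly.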
Large-Scale User Modeling with Recurrent Neural Networks for Music Discovery on Multiple Time Scales
The amount of content on online music streaming platforms is immense, and
most users only access a tiny fraction of this content. Recommender systems are
the application of choice to open up the collection to these users.
Collaborative filtering has the disadvantage that it relies on explicit
ratings, which are often unavailable, and generally disregards the temporal
nature of music consumption. On the other hand, item co-occurrence algorithms,
such as the recently introduced word2vec-based recommenders, are typically left
without an effective user representation. In this paper, we present a new
approach to model users through recurrent neural networks by sequentially
processing consumed items, represented by any type of embeddings and other
context features. This way we obtain semantically rich user representations,
which capture a user's musical taste over time. Our experimental analysis on
large-scale user data shows that our model can be used to predict future songs
a user will likely listen to, both in the short and long term.
Comment: Author pre-print version, 20 pages, 6 figures, 4 tables
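The sequential user-modeling idea can be sketched with a minimal recurrent update over item embeddings. This is an assumption-laden sketch: the weights are randomly initialised stand-ins for trained parameters, the dimensions are arbitrary, and the paper's actual architecture (recurrent networks trained on large-scale consumption data with context features) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_item, d_hidden = 8, 4  # hypothetical embedding and state sizes

# Randomly initialised weights stand in for trained parameters.
W_x = rng.normal(scale=0.1, size=(d_hidden, d_item))
W_h = rng.normal(scale=0.1, size=(d_hidden, d_hidden))

def user_representation(item_embeddings):
    """Fold a user's listening history into a fixed-size state vector.

    Items are processed in consumption order, so the final state reflects
    the sequence, not just the set, of consumed items.
    """
    h = np.zeros(d_hidden)
    for x in item_embeddings:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

session = [rng.normal(size=d_item) for _ in range(5)]
h = user_representation(session)
```

The resulting state vector can then be scored against candidate item embeddings to rank likely next listens.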
Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
Over the past decade, large-scale supervised learning corpora have enabled
machine learning researchers to make substantial advances. However, to this
date, there are no large-scale question-answer corpora available. In this paper
we present the 30M Factoid Question-Answer Corpus, an enormous question answer
pair corpus produced by applying a novel neural network architecture on the
knowledge base Freebase to transduce facts into natural language questions. The
produced question answer pairs are evaluated both by human evaluators and using
automatic evaluation metrics, including well-established machine translation
and sentence similarity metrics. Across all evaluation criteria the
question-generation model outperforms the competing template-based baseline.
Furthermore, when presented to human evaluators, the generated questions appear
comparable in quality to real human-generated questions.
Comment: 13 pages, 1 figure, 7 tables
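The transduction of knowledge-base facts into questions is easiest to see with the template-based baseline the paper compares against. The relation names and templates below are hypothetical Freebase-style examples, not the corpus's actual templates; the paper's own model replaces this lookup with a neural network.

```python
# Hypothetical templates keyed by Freebase-style relation names.
templates = {
    "people/person/place_of_birth": "Where was {subject} born?",
    "film/film/directed_by": "Who directed {subject}?",
}

def fact_to_qa(subject, relation, obj):
    """Transduce a (subject, relation, object) fact into a QA pair.

    The relation selects a question template, the subject fills it,
    and the object becomes the answer.
    """
    question = templates[relation].format(subject=subject)
    return question, obj

q, a = fact_to_qa("Ada Lovelace", "people/person/place_of_birth", "London")
```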
Sequential Complexity as a Descriptor for Musical Similarity
We propose string compressibility as a descriptor of temporal structure in
audio, for the purpose of determining musical similarity. Our descriptors are
based on computing track-wise compression rates of quantised audio features,
using multiple temporal resolutions and quantisation granularities. To verify
that our descriptors capture musically relevant information, we incorporate our
descriptors into similarity rating prediction and song year prediction tasks.
We base our evaluation on a dataset of 15500 track excerpts of Western popular
music, for which we obtain 7800 web-sourced pairwise similarity ratings. To
assess the agreement among similarity ratings, we perform an evaluation under
controlled conditions, obtaining a rank correlation of 0.33 between intersected
sets of ratings. Combined with bag-of-features descriptors, we obtain
performance gains of 31.1% and 10.9% for similarity rating prediction and song
year prediction. For both tasks, analysis of selected descriptors reveals that
representing features at multiple time scales benefits prediction accuracy.
Comment: 13 pages, 9 figures, 8 tables. Accepted version
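The compressibility descriptor can be sketched with a general-purpose compressor: a sequence of quantised feature symbols that is temporally structured compresses well, while an unstructured one does not. This sketch uses zlib on synthetic symbol sequences as a stand-in; the paper computes track-wise rates over real quantised audio features at multiple resolutions.

```python
import random
import zlib

def compression_rate(symbols):
    """Compressed size over raw size for a sequence of byte-valued symbols.

    Lower values indicate more temporal regularity in the sequence.
    """
    data = bytes(symbols)
    return len(zlib.compress(data, 9)) / len(data)

random.seed(0)
regular = [1, 2] * 500                                # strongly structured
noisy = [random.getrandbits(8) for _ in range(1000)]  # little structure

r_regular = compression_rate(regular)
r_noisy = compression_rate(noisy)
```

Computing such rates at several temporal resolutions and quantisation granularities yields the multi-scale descriptors the abstract describes.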