Statistical Learning for Music Tagging and Recommendation
Thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal
Large-Scale Pattern Discovery in Music
This work focuses on extracting patterns in musical data from very large collections. The problem is split into two parts. First, we build such a large collection, the Million Song Dataset, to give researchers access to commercial-size datasets. Second, we use this collection to study cover song recognition, which involves finding harmonic patterns from audio features. Regarding the Million Song Dataset, we detail how we built the original collection from an online API and how we encouraged other organizations to participate in the project. The result is the largest research dataset with heterogeneous sources of data available to music technology researchers. We demonstrate some of its potential and discuss the impact it already has on the field. On cover song recognition, we must revisit the existing literature, since there are no publicly available results on a dataset of more than a few thousand entries. We present two solutions to tackle the problem, one using a hashing method and one using a higher-level feature computed from the chromagram (dubbed the 2DFTM). We further investigate the 2DFTM since it has the potential to be a relevant representation for any task involving audio harmonic content. Finally, we discuss the future of the dataset and the hope of seeing more work making use of the different sources of data that are linked in the Million Song Dataset. Regarding cover songs, we explain how this might be a first step towards defining a harmonic manifold of music, a space where harmonic similarities between songs would be more apparent.
Mining Large-Scale Music Data Sets
Large collections of music audio are now common and present an interesting research opportunity: what statistical patterns and structure can be discovered across thousands or millions of examples? Unfortunately, copyright restrictions can interfere with access to such collections, so we have developed the Million Song Dataset, including derived features but not the original audio, to support commercial-scale music analysis on a common research database. The audio features are augmented by a wide range of metadata including lyrics, tags, and listener playcounts. Now that the database is ready, we have begun analyzing its content, including tasks such as identifying cover songs -- significantly harder for such a large collection.
Large-Scale Cover Song Recognition Using the 2D Fourier Transform Magnitude
Large-scale cover song recognition involves calculating item-to-item similarities that can accommodate differences in timing and tempo, rendering simple Euclidean measures unsuitable. Expensive solutions such as dynamic time warping do not scale to millions of instances, making them inappropriate for commercial-scale applications. In this work, we transform a beat-synchronous chroma matrix with a 2D Fourier transform and show that the resulting representation has properties that fit the cover song recognition task. We can also apply PCA to efficiently scale comparisons. We report the best results to date on the largest available dataset of around 18,000 cover songs amid one million tracks, giving a mean average precision of 3.0%.
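For concreteness, here is a minimal sketch of the representation described above, in Python with NumPy. The patch length, the zero-padding, and the final Euclidean comparison are illustrative choices, not the paper's exact recipe; only the core idea, taking the magnitude of a 2D Fourier transform of a beat-synchronous chroma patch so the result is invariant to circular shifts in key and time, comes from the abstract.

```python
import numpy as np

def two_dftm(beat_chroma, n_beats=75):
    """Compute a 2D Fourier transform magnitude (2DFTM) feature from a
    beat-synchronous chroma matrix of shape (12, total_beats).

    Padding/truncating to a fixed n_beats is an illustrative choice so
    every track yields a fixed-length vector.
    """
    patch = np.zeros((12, n_beats))
    t = min(beat_chroma.shape[1], n_beats)
    patch[:, :t] = beat_chroma[:, :t]
    # The FFT magnitude discards phase, so circular shifts along the
    # pitch axis (key changes) and time axis no longer matter.
    return np.abs(np.fft.fft2(patch)).flatten()

# With 2DFTM features, plain Euclidean distance can stand in for
# expensive alignment methods such as dynamic time warping.
a = two_dftm(np.random.rand(12, 120))
b = two_dftm(np.random.rand(12, 90))
print(np.linalg.norm(a - b))
```

In a full system, PCA (e.g. sklearn.decomposition.PCA) would reduce these vectors before large-scale nearest-neighbor comparison, as the abstract suggests.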
Scalable k-Means Clustering via Lightweight Coresets
Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive data sets. While existing approaches generally only allow for multiplicative approximation errors, we propose a novel notion of lightweight coresets that allows for both multiplicative and additive errors. We provide a single algorithm to construct lightweight coresets for k-means clustering as well as soft and hard Bregman clustering. The algorithm is substantially faster than existing constructions, embarrassingly parallel, and the resulting coresets are smaller. We further show that the proposed approach naturally generalizes to statistical k-means clustering and that, compared to existing results, it can be used to compute smaller summaries for empirical risk minimization. In extensive experiments, we demonstrate that the proposed algorithm outperforms existing data summarization strategies in practice. (To appear in the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018.)
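A minimal sketch of the lightweight coreset construction for k-means, following the sampling distribution the paper describes: half uniform, half proportional to squared distance from the data mean, with standard importance weights. The scikit-learn call at the end is one illustrative way to consume the weights.

```python
import numpy as np
from sklearn.cluster import KMeans

def lightweight_coreset(X, m, rng=None):
    """Sample a lightweight coreset of m weighted points from X.

    The sampling probability mixes a uniform term (for additive error)
    with a term proportional to squared distance from the dataset mean
    (for multiplicative error); weights are the usual 1/(m*q) importance
    weights.
    """
    rng = rng or np.random.default_rng(0)
    n = X.shape[0]
    dist_sq = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    q = 0.5 / n + 0.5 * dist_sq / dist_sq.sum()
    idx = rng.choice(n, size=m, p=q)
    return X[idx], 1.0 / (m * q[idx])

# Train on the small weighted coreset instead of the full data set.
X = np.random.rand(100_000, 10)
C, w = lightweight_coreset(X, m=1_000)
km = KMeans(n_clusters=8, n_init=10).fit(C, sample_weight=w)
```

The construction is embarrassingly parallel because the only global quantities needed are the data mean and the normalizer of the distance term, both computable in one pass.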
Automatic generation of social tags for music recommendation
Social tags are user-generated keywords associated with some resource on the Web. In the case of music, social tags have become an important component of "Web 2.0" recommender systems, allowing users to generate playlists based on use-dependent terms such as chill or jogging that have been applied to particular songs. In this paper, we propose a method for predicting these social tags directly from MP3 files. Using a set of boosted classifiers, we map audio features onto social tags collected from the Web. The resulting automatic tags (or autotags) furnish information about music that is otherwise untagged or poorly tagged, allowing for insertion of previously unheard music into a social recommender. This avoids the "cold-start problem" common in such systems. Autotags can also be used to smooth the tag space from which similarities and recommendations are made by providing a set of comparable baseline tags for all tracks in a recommender system.
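The abstract describes mapping audio features onto Web-collected tags with boosted classifiers. The sketch below is a hedged stand-in: one scikit-learn GradientBoostingClassifier per tag on synthetic features, not the paper's actual booster or feature set.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# X: one feature vector per track (e.g. pooled spectral statistics);
# tag_matrix[i, j] = 1 if tag j was applied to track i on the Web.
# Both are synthetic placeholders here.
X = np.random.rand(500, 40)
tags = ["chill", "jogging", "rock"]
tag_matrix = np.random.rand(500, len(tags)) > 0.7

# One boosted binary classifier per social tag (an "autotagger").
models = {tag: GradientBoostingClassifier().fit(X, tag_matrix[:, j])
          for j, tag in enumerate(tags)}

# Autotag a new, otherwise untagged track: the predicted tag scores
# can seed a social recommender, avoiding the cold-start problem.
new_track = np.random.rand(1, 40)
autotags = {t: m.predict_proba(new_track)[0, 1] for t, m in models.items()}
print(sorted(autotags.items(), key=lambda kv: -kv[1]))
```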
Clustering beat-chroma patterns in a large music database
A musical style or genre implies a set of common conventions and patterns combined and deployed in different ways to make individual musical pieces; for instance, most would agree that contemporary pop music is assembled from a relatively small palette of harmonic and melodic patterns. The purpose of this paper is to use a database of tens of thousands of songs in combination with a compact representation of melodic-harmonic content (the beat-synchronous chromagram) and data-mining tools (clustering) to attempt to explicitly catalog this palette, at least within the limitations of the beat-chroma representation. We use online k-means clustering to summarize 3.7 million 4-beat bars in a codebook of a few hundred prototypes. By measuring how accurately such a quantized codebook can reconstruct the original data, we can quantify the degree of diversity (distortion as a function of codebook size) and temporal structure (i.e. the advantage gained by jointly quantizing multiple frames) in this music. The most popular codewords themselves reveal the common chords used in the music. Finally, the quantized representation of music can be used for music retrieval tasks such as artist and genre classification, and identifying songs that are similar in terms of their melodic-harmonic content.
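As a hedged illustration of this pipeline, the sketch below clusters flattened 4-beat beat-chroma patches with scikit-learn's MiniBatchKMeans (standing in for the paper's online k-means) and measures quantization distortion; the data here is a synthetic placeholder.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Each sample is one 4-beat bar of beat-synchronous chroma,
# flattened from shape (12, 4) into a 48-dimensional vector.
bars = np.random.rand(100_000, 12 * 4)

# Mini-batch (online) k-means summarizes millions of bars into a
# codebook of a few hundred prototypes.
codebook = MiniBatchKMeans(n_clusters=200, batch_size=1024).fit(bars)

# Distortion as a function of codebook size quantifies diversity:
# mean squared distance between each bar and its nearest codeword.
assignments = codebook.predict(bars)
distortion = np.mean(
    ((bars - codebook.cluster_centers_[assignments]) ** 2).sum(axis=1))
print(f"mean quantization distortion: {distortion:.4f}")
```

Rerunning this with growing n_clusters traces the distortion-vs-codebook-size curve the abstract uses as its diversity measure.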
The Million Song Dataset
We introduce the Million Song Dataset, a freely available collection of audio features and metadata for a million contemporary popular music tracks. We describe its creation process, its content, and its possible uses. Attractive features of the Million Song Dataset include the range of existing resources to which it is linked and the fact that it is the largest current research dataset in our field. As an illustration, we present year prediction as an example application, a task that has, until now, been difficult to study owing to the absence of a large set of suitable data. We show positive results on year prediction, and discuss more generally the future development of the dataset.
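A hedged sketch of the year-prediction application mentioned above: ridge regression on per-track timbre summary features. The 90-dimensional features and the randomly generated data are illustrative assumptions, not the dataset's actual benchmark setup.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Stand-in for per-track features (e.g. means and covariances of
# segment timbre vectors); real features would come from the dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 90))
years = rng.integers(1950, 2011, size=5_000).astype(float)

# Treat the release year as a regression target.
X_tr, X_te, y_tr, y_te = train_test_split(X, years, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("mean absolute error (years):", np.abs(pred - y_te).mean())
```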
Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems
We propose two new methods to address the weak scaling problems of kernel ridge regression (KRR): Balanced KRR (BKRR) and K-means KRR (KKRR). These methods consider alternative ways to partition the input dataset into p different parts, generating p different models, and then selecting the best model among them. Compared to a conventional implementation, KKRR2 (an optimized version of KKRR) improves the weak scaling efficiency from 0.32% to 38% and achieves a 591× speedup for reaching the same accuracy using the same data and the same hardware (1,536 processors). BKRR2 (an optimized version of BKRR) achieves higher accuracy than the current fastest method using less training time for a variety of datasets. For applications requiring only approximate solutions, BKRR2 improves the weak scaling efficiency to 92% and achieves a 3,505× speedup (theoretical speedup: 4,096×). (Accepted by the ACM International Conference on Supercomputing, ICS 2018.)
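The following is a hedged reading of the abstract's partition-and-select scheme, sketched for KKRR: partition the inputs with k-means into p parts, train one kernel ridge model per part, and keep the model that validates best. The details here (selection on a holdout set, scikit-learn's KernelRidge, synthetic data) are assumptions; BKRR would use balanced partitions instead of k-means ones.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error

def kkrr_fit(X, y, X_val, y_val, p=4):
    """Partition data into p parts via k-means, train one KRR model
    per part, and return the model with the lowest validation error
    (the 'select the best model' step from the abstract)."""
    parts = KMeans(n_clusters=p, n_init=10).fit_predict(X)
    best, best_err = None, np.inf
    for i in range(p):
        m = KernelRidge(kernel="rbf", alpha=1.0)
        m.fit(X[parts == i], y[parts == i])
        err = mean_squared_error(y_val, m.predict(X_val))
        if err < best_err:
            best, best_err = m, err
    return best, best_err

rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 5)); y = np.sin(X).sum(axis=1)
Xv = rng.normal(size=(200, 5)); yv = np.sin(Xv).sum(axis=1)
model, err = kkrr_fit(X, y, Xv, yv)
print("validation MSE:", err)
```

Each per-part model trains on roughly n/p points, which is where the weak-scaling gains come from: the cubic cost of kernel ridge regression drops sharply as the partitions shrink.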
Mining oral history collections using music information retrieval methods
Recent work at the Sussex Humanities Lab, a digital humanities research program at the University of Sussex, has sought to address an identified gap in the provision and use of audio feature analysis for spoken word collections. Traditionally, oral history methodologies and practices have placed emphasis on working with transcribed textual surrogates rather than the digital audio files created during the interview process. This provides pragmatic access to the basic semantic content but forecloses access to other potentially meaningful aural information; our work addresses the potential for methods to explore this extra-semantic information by working with the audio directly. Audio analysis tools, such as those developed within the established field of Music Information Retrieval (MIR), provide this opportunity. This paper describes the application of audio analysis techniques and methods to spoken word collections. We demonstrate an approach using freely available audio and data analysis tools, which have been explored and evaluated in two workshops. We hope to inspire new forms of content analysis that complement semantic analysis with investigation into the more nuanced properties carried in audio signals.
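As one concrete example of this kind of analysis (not necessarily the tools used in the workshops), the sketch below uses librosa, a freely available MIR library, to pull a few extra-semantic features from a spoken-word recording; the file path is a placeholder.

```python
import librosa
import numpy as np

# Placeholder path for a digitized oral history interview.
y, sr = librosa.load("interview.wav", sr=22050, mono=True)

# MIR-style features carrying extra-semantic information:
# timbre (MFCCs), delivery (energy/pauses), and intonation (pitch).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
rms = librosa.feature.rms(y=y)[0]
f0 = librosa.yin(y, fmin=65, fmax=300, sr=sr)  # rough speech pitch range

print("MFCC means per coefficient:", mfcc.mean(axis=1))
print("fraction of low-energy frames (pauses):",
      float((rms < 0.1 * rms.max()).mean()))
print("median fundamental frequency (Hz):", float(np.median(f0)))
```

Summaries like these, aggregated over an interview, can then feed the same data analysis tools the paper applies, without ever touching a transcript.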