Learning to rank music tracks using triplet loss
Most music streaming services rely on automatic recommendation algorithms to
exploit their large music catalogs. These algorithms aim to retrieve a ranked
list of music tracks based on their similarity to a target music track. In
this work, we propose a method for direct recommendation based on the audio
content, without explicitly tagging the music tracks. To this end, we propose
several strategies for mining triplets from ranked lists. We train a
Convolutional Neural Network to learn the similarity via a triplet loss. These
strategies are compared and validated in a large-scale experiment against an
auto-tagging-based approach. The results highlight the effectiveness of our
system, especially when combined with an Auto-pooling layer.
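As a rough illustration of the training signal, a triplet loss penalizes an anchor track whose embedding is not closer to a positive than to a negative by at least a margin. The NumPy sketch below is a minimal stand-in, not the paper's method: the `mine_triplets_from_ranked_list` helper is hypothetical (the paper's actual mining strategies are not detailed here), and it simply treats the top-ranked tracks of a ranked list as positives and the rest as negatives.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on squared L2 distances: the positive
    must be closer to the anchor than the negative by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def mine_triplets_from_ranked_list(query, ranked, top_k=2):
    """Hypothetical toy mining strategy: tracks ranked in the top-k act
    as positives for the query, all remaining tracks as negatives."""
    triplets = []
    for p in ranked[:top_k]:
        for n in ranked[top_k:]:
            triplets.append((query, p, n))
    return triplets
```

With a ranked list of five tracks and `top_k=2`, this mining yields 2 x 3 = 6 triplets per query; real strategies would subsample or weight these pairs.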
Self-Similarity-Based and Novelty-Based Loss for Music Structure Analysis
Music Structure Analysis (MSA) is the task of identifying the musical
segments that compose a music track and, possibly, labeling them based on
their similarity. In this paper, we propose a supervised approach for the task
of music boundary detection. In our approach, we simultaneously learn features
and convolution kernels. To do so, we jointly optimize two losses: (i) a loss
based on the Self-Similarity Matrix (SSM) obtained with the learned features,
denoted SSM-loss, and (ii) a loss based on the novelty score obtained by
applying the learned kernels to the estimated SSM, denoted novelty-loss. We
also demonstrate that relative feature learning, through self-attention, is
beneficial for the task of MSA. Finally, we compare the performance of our
approach to previously proposed approaches on the standard RWC-Pop dataset and
on various subsets of SALAMI.
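To make the SSM/novelty pipeline concrete: the paper learns both the features and the kernels, but the underlying mechanics can be sketched with fixed ingredients, i.e. cosine-similarity features and the classic Foote checkerboard kernel slid along the SSM diagonal. The sketch below is only an assumption-level illustration of how novelty peaks relate to segment boundaries, not the proposed learned losses.

```python
import numpy as np

def self_similarity_matrix(features):
    """Cosine self-similarity between all pairs of feature frames
    (rows of `features`)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.maximum(norms, 1e-8)
    return normed @ normed.T

def checkerboard_kernel(half_size):
    """Foote-style checkerboard kernel: +1 on the two within-segment
    quadrants, -1 on the cross-segment quadrants (fixed here; the
    paper learns its kernels instead)."""
    sign = np.sign(np.arange(-half_size, half_size + 1))
    return np.outer(sign, sign)

def novelty_curve(ssm, half_size=4):
    """Correlate the kernel with patches along the SSM diagonal;
    peaks of the resulting curve indicate candidate boundaries."""
    kernel = checkerboard_kernel(half_size)
    n = ssm.shape[0]
    nov = np.zeros(n)
    for i in range(half_size, n - half_size):
        patch = ssm[i - half_size:i + half_size + 1,
                    i - half_size:i + half_size + 1]
        nov[i] = np.sum(patch * kernel)
    return nov
```

On a toy track made of two homogeneous segments, the novelty curve is flat inside each segment and peaks at the transition between them.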
Notes from the ISMIR 2012 late-breaking session on evaluation in music information retrieval
During the last day of the ISMIR 2012 conference there were two events related to Music IR evaluation. A panel took place during the morning to discuss several issues concerning the various evaluation initiatives with the general audience at ISMIR. A late-breaking session during the afternoon kept the discussion alive among a group of researchers who wanted to dig deeper into these issues. This extended abstract reports the main topics covered during this short session and the general thoughts that emerged.
Towards a (better) Definition of Annotated MIR Corpora
Today, annotated MIR corpora are provided by various research labs or companies, each one using its own annotation methodology, concept definitions, and formats. This is not an issue as such. However, the lack of descriptions of the methodology used (how the corpus was actually annotated, and by whom) and of the annotated concepts (i.e., what is actually described) is a problem with respect to the sustainability, usability, and sharing of the corpora. Experience shows that it is essential to define precisely how annotations are supplied and described. We propose here a survey and consolidation report on the nature of the annotated corpora used and shared in MIR, with proposals for the axes along which corpora can be described so as to enable effective comparison, and a discussion of the inherent influence this has on the tasks performed using them.
Degradation-Invariant Music Indexing
For music indexing that is robust to sound degradations and scalable to large
music catalogs, this scientific report presents an approach based on audio
descriptors that are relevant to the music content and invariant to sound
transformations (e.g., noise addition, distortion, lossy coding, pitch/time
transformations, or filtering). To achieve this, one of the key points of the
proposed method is the definition of high-dimensional audio prints, which are
intrinsically (by design) robust to some sound degradations. The high
dimensionality of this first representation is then exploited to learn a
linear projection to a significantly smaller subspace, which further reduces
the sensitivity to sound degradations using a series of discriminant analyses.
Finally, by anchoring the analysis times on local maxima of a selected onset
function, an approximate hashing is performed to provide better tolerance to
bit corruption and, at the same time, to ease the scaling of the method.
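The report's actual print design and hashing scheme are not given here, but the general idea of hashing high-dimensional descriptors while tolerating small perturbations can be illustrated with a standard locality-sensitive stand-in: one bit per random hyperplane, taken as the sign of the projection. Similar prints then agree on most bits, so a few corrupted bits still permit retrieval. Everything below (dimensions, plane count) is an illustrative assumption.

```python
import numpy as np

def random_projection_hash(print_vec, planes):
    """LSH-style sign hash: one bit per random hyperplane. Nearby
    print vectors agree on most bits, giving tolerance to small
    perturbations of the descriptor (a stand-in for the report's
    approximate hashing, whose exact design is not specified here)."""
    return (planes @ print_vec > 0).astype(np.uint8)

# Example setup (assumed sizes): 128-dimensional prints, 32-bit hashes.
rng = np.random.default_rng(0)
planes = rng.normal(size=(32, 128))
```

Note that the hash is invariant to positive scaling of the print, since only the sign of each projection matters; this is one simple way descriptor-level robustness carries over to the hash.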
Blind estimation of audio effects using an auto-encoder approach and differentiable signal processing
Blind Estimation of Audio Effects (BE-AFX) aims at estimating the Audio
Effects (AFXs) applied to an original, unprocessed audio sample solely from
the processed audio sample. To train such a system, traditional approaches
optimize a loss between ground-truth and estimated AFX parameters, which
requires knowing the exact implementation of the AFXs used in the process. In
this work, we propose an alternative solution that eliminates the need to
know this implementation. Instead, we introduce an auto-encoder approach that
optimizes an audio quality metric. We explore, suggest, and compare various
implementations of commonly used mastering AFXs, using differentiable signal
processing or neural approximations. Our findings demonstrate that our
auto-encoder approach yields superior estimates of the audio quality produced
by a chain of AFXs, compared to the traditional parameter-based approach,
even though the latter provides a more accurate parameter estimation.
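The core idea, optimizing in the audio domain through a differentiable effect rather than comparing parameters, can be sketched with a deliberately trivial stand-in: a single gain "effect" whose parameter is recovered by gradient descent on an audio-domain MSE. This is a minimal assumption-level illustration, not the paper's mastering AFX chain or encoder network.

```python
import numpy as np

def apply_gain(x, g):
    """Toy differentiable 'effect': a simple gain (a stand-in for the
    differentiable mastering AFXs explored in the paper)."""
    return g * x

def estimate_gain(dry, wet, lr=0.1, steps=200):
    """Blindly estimate the gain by minimizing an audio-domain MSE
    between re-processed dry audio and the observed wet audio; the
    true parameter is never compared directly (the auto-encoder idea)."""
    g = 1.0
    for _ in range(steps):
        err = apply_gain(dry, g) - wet     # audio-domain residual
        grad = 2.0 * np.mean(err * dry)    # d/dg of the MSE
        g -= lr * grad
    return g
```

Because the loss is computed on audio, the same recipe works even when the effect that produced `wet` is a black box, as long as a differentiable approximation of it is available for re-processing.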