Learning content-based metrics for music similarity
In this abstract, we propose a method to learn application-specific content-based metrics for music similarity using unsupervised feature learning and neighborhood components analysis. Multiple-timescale features extracted from music audio are embedded into a Euclidean metric space, so that the distance between songs reflects their similarity. We evaluated the method on the GTZAN and MagnaTagATune datasets.
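The embedding step described above can be sketched with scikit-learn's `NeighborhoodComponentsAnalysis`; the feature matrix and labels below are synthetic stand-ins for multiple-timescale audio features, not the authors' actual pipeline:

```python
import numpy as np
from sklearn.neighbors import NeighborhoodComponentsAnalysis

# Hypothetical stand-in for multiple-timescale audio features:
# 100 clips, 20-dimensional feature vectors, 4 synthetic labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 4, size=100)

# NCA learns a linear map into a Euclidean space in which nearby
# points tend to share a label, so distances reflect similarity.
nca = NeighborhoodComponentsAnalysis(n_components=8, random_state=0)
X_embedded = nca.fit_transform(X, y)
print(X_embedded.shape)  # (100, 8)
```

Euclidean distances between rows of `X_embedded` can then serve directly as a similarity measure.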
A Systematic Comparison of Music Similarity Adaptation Approaches
In order to support individual user perspectives and different retrieval tasks, music similarity can no longer be considered a static element of Music Information Retrieval (MIR) systems. Various approaches have been proposed recently that allow dynamic adaptation of music similarity measures. This paper provides a systematic comparison of algorithms for metric learning and higher-level facet distance weighting on the MagnaTagATune dataset. A cross-validation variant taking clip availability into account is presented. Applied to user-generated similarity data, its effect on adaptation performance is analyzed. Special attention is paid to the amount of training data necessary for making similarity predictions on unknown data, the number of model parameters, and the amount of information available about the music itself.
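The clip-aware cross-validation idea (keeping all data tied to a clip inside one fold) resembles grouped cross-validation; a minimal sketch using scikit-learn's `GroupKFold`, with hypothetical clip ids:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical: 20 similarity ratings, each tied to a clip id; keeping
# every rating of a clip in the same fold prevents clips from leaking
# between training and test sets.
ratings = np.arange(20)
clip_ids = np.repeat(np.arange(5), 4)  # 5 clips, 4 ratings each

for train_idx, test_idx in GroupKFold(n_splits=5).split(ratings, groups=clip_ids):
    train_clips = set(clip_ids[train_idx])
    test_clips = set(clip_ids[test_idx])
    assert not train_clips & test_clips  # no clip appears in both
```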
Recommended from our members
Learning music similarity from relative user ratings
Computational modelling of music similarity is an increasingly important part of personalisation and optimisation in music information retrieval and research in music perception and cognition. The use of relative similarity ratings is a new and promising approach to modelling similarity that avoids well-known problems with absolute ratings. In this article, we use relative ratings from the MagnaTagATune dataset with new and existing variants of state-of-the-art algorithms and provide the first comprehensive and rigorous evaluation of this approach. We compare metric learning based on support vector machines (SVMs) and metric-learning-to-rank (MLR), including a diagonal and a novel weighted variant, and relative distance learning with neural networks (RDNN). We further evaluate the effectiveness of different high- and low-level audio features and genre data, as well as dimensionality reduction methods, weighting of similarity ratings, and different sampling methods. Our results show that music similarity measures learnt on relative ratings can be significantly better than a standard Euclidean metric, depending on the choice of learning algorithm, feature sets and application scenario. MLR and SVM outperform DMLR and RDNN, while MLR with weighted ratings leads to no further performance gain. Timbral and music-structural features are most effective, and all features jointly are significantly better than any other combination of feature sets. Sharing audio clips (but not the similarity ratings) between test and training sets improves performance, in particular for the SVM-based methods, which is useful for some application scenarios. A testing framework has been implemented in Matlab and made publicly available at http://mi.soi.city.ac.uk/datasets/ir2012framework so that these results are reproducible.
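A minimal sketch of learning from relative ratings, assuming triplet constraints of the form "clip a is more similar to b than to c" and a diagonal metric (in the spirit of the diagonal MLR variant mentioned above, not the authors' exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))  # hypothetical audio feature vectors

# Synthetic relative ratings: a triplet (a, b, c) encodes
# "clip a sounds more similar to clip b than to clip c".
triplets = [tuple(rng.choice(50, size=3, replace=False)) for _ in range(200)]

def dist(w, x, y):
    # diagonal Mahalanobis distance with non-negative weights w
    return float(np.sum(w * (x - y) ** 2))

# Learn diagonal weights with hinge-loss gradient steps.
w = np.ones(10)
lr = 0.01
for _ in range(50):
    for a, b, c in triplets:
        # enforce a margin: d(a, c) - d(a, b) >= 1
        if dist(w, X[a], X[c]) - dist(w, X[a], X[b]) < 1.0:
            w += lr * ((X[a] - X[c]) ** 2 - (X[a] - X[b]) ** 2)
            w = np.maximum(w, 0.0)  # keep the metric valid (non-negative)

satisfied = sum(dist(w, X[a], X[c]) > dist(w, X[a], X[b]) for a, b, c in triplets)
```

With contradictory random triplets not all constraints are satisfiable; real ratings are noisy in a similar way, which is why the margin-based formulation is used.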
Combining Sources of Description for Approximating Music Similarity Ratings
In this paper, we compare the effectiveness of basic acoustic features and genre annotations when adapting a music similarity model to user ratings. We use the Metric Learning to Rank algorithm to learn a Mahalanobis metric from comparative similarity ratings in the MagnaTagATune database. Using common formats for feature data, our approach can easily be transferred to other existing databases. Our results show that genre data allow more effective learning of a metric than simple audio features, but a combination of both feature sets clearly outperforms either individual set.
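For reference, a Mahalanobis metric is a distance parameterised by a positive semi-definite matrix M; MLR learns M from ranking constraints, whereas the sketch below simply constructs a random valid M to show the distance computation itself:

```python
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.normal(size=6), rng.normal(size=6)

# Any M = L^T L is positive semi-definite, so the distance below
# is a valid metric; here L is random rather than learned.
L = rng.normal(size=(6, 6))
M = L.T @ L

def mahalanobis(u, v, M):
    d = u - v
    return float(np.sqrt(d @ M @ d))

# With M = I the Mahalanobis distance reduces to the Euclidean distance.
assert abs(mahalanobis(x, y, np.eye(6)) - np.linalg.norm(x - y)) < 1e-9
```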
Multiscale approaches to music audio feature learning
Content-based music information retrieval tasks are typically solved with a two-stage approach: features are extracted from music audio signals, and are then used as input to a regressor or classifier. These features can be engineered or learned from data. Although the former approach was dominant in the past, feature learning has started to receive more attention from the MIR community in recent years. Recent results in feature learning indicate that simple algorithms such as K-means can be very effective, sometimes surpassing more complicated approaches based on restricted Boltzmann machines, autoencoders or sparse coding. Furthermore, there has been increased interest in multiscale representations of music audio recently. Such representations are more versatile because music audio exhibits structure on multiple timescales, which are relevant for different MIR tasks to varying degrees. We develop and compare three approaches to multiscale audio feature learning using the spherical K-means algorithm. We evaluate them in an automatic tagging task and a similarity metric learning task on the MagnaTagATune dataset.
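Spherical K-means constrains both data points and centroids to the unit sphere and assigns points by cosine similarity; a minimal NumPy sketch on random stand-in features (not actual spectrogram patches):

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Minimal spherical K-means: unit-norm data and centroids,
    assignment by cosine similarity (dot product)."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = X[rng.choice(len(X), k, replace=False)]  # fancy indexing copies
    for _ in range(n_iter):
        labels = np.argmax(X @ C.T, axis=1)      # cosine assignment
        for j in range(k):
            members = X[labels == j]
            if len(members):
                m = members.sum(axis=0)
                C[j] = m / np.linalg.norm(m)     # re-normalise centroid
    return C, labels

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 12))  # hypothetical audio feature windows
C, labels = spherical_kmeans(X, k=8)

# Encode each example as its cosine similarity to every centroid.
features = (X / np.linalg.norm(X, axis=1, keepdims=True)) @ C.T
```

Running the same encoder on windows of several lengths and concatenating the resulting feature vectors gives one simple route to a multiscale representation.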
Perceptual musical similarity metric learning with graph neural networks
Sound retrieval for assisted music composition depends on evaluating similarity between musical instrument sounds, which is partly influenced by playing techniques. Previous methods utilizing Euclidean nearest neighbours over acoustic features show some limitations in retrieving sounds sharing equivalent timbral properties, but potentially generated using a different instrument, playing technique, pitch or dynamic. In this paper, we present a metric learning system designed to approximate human similarity judgments between extended musical playing techniques using graph neural networks. Such structures are natural candidates for solving similarity retrieval tasks, yet have seen little application in modelling perceptual music similarity. We optimize a Graph Convolutional Network (GCN) over acoustic features via a proxy metric learning loss to learn embeddings that reflect perceptual similarities. Specifically, we construct the graph's adjacency matrix from the acoustic data manifold with an example-wise adaptive k-nearest neighbourhood graph: Adaptive Neighbourhood Graph Neural Network (AN-GNN). Our approach achieves 96.4% retrieval accuracy compared to 38.5% with a Euclidean metric and 86.0% with a multilayer perceptron (MLP), while effectively handling retrievals whose playing techniques differ from that of the query example.
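The adjacency construction can be illustrated with a plain (non-adaptive) k-nearest-neighbour graph over the feature manifold; AN-GNN's example-wise adaptive neighbourhood is more involved, so this is only a simplified sketch:

```python
import numpy as np

def knn_adjacency(X, k=3):
    """Symmetric k-nearest-neighbour adjacency matrix built from
    pairwise Euclidean distances (one typical input to a GCN layer)."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    A = np.zeros((n, n))
    for i in range(n):
        # skip self (distance 0, first after argsort) and take k nearest
        nbrs = np.argsort(D[i])[1:k + 1]
        A[i, nbrs] = 1.0
    return np.maximum(A, A.T)  # symmetrise the directed k-NN edges

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 8))  # hypothetical acoustic feature vectors
A = knn_adjacency(X, k=3)
```

A GCN layer would then propagate features along the edges of `A` (usually after adding self-loops and degree normalisation).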
Training of Tonal Similarity Ratings in Non-Musicians: A "Rapid Learning" Approach
Although cognitive music psychology has a long tradition of expert-novice comparisons, experimental training studies are rare. Studies on the learning progress of trained novices in hearing harmonic relationships are still largely lacking. This paper presents a simple training concept using the example of tone/triad similarity ratings, demonstrating the gradual progress of non-musicians compared to musical experts: In a feedback-based "rapid learning" paradigm, participants had to decide for single tones and chords whether paired sounds matched each other well. Before and after the training sessions, they provided similarity judgments for a complete set of sound pairs. From these similarity matrices, individual relational sound maps, intended to display mental representations, were calculated by means of non-metric multidimensional scaling (NMDS), and were compared to an expert model through Procrustean transformation. Approximately half of the novices showed substantial learning success, with some participants even reaching the level of professional musicians. Results speak for a fundamental ability to quickly train an understanding of harmony, show inter-individual differences in learning success, and demonstrate the suitability of the scaling method used for learning research in music and other domains. Results are discussed in the context of the "giftedness" debate.
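The scaling-and-alignment pipeline (non-metric MDS followed by Procrustes comparison) can be sketched with scikit-learn and SciPy; the similarity matrices below are random stand-ins for actual participant ratings:

```python
import numpy as np
from scipy.spatial import procrustes
from sklearn.manifold import MDS

def sound_map(similarity, seed=0):
    """Non-metric MDS: turn a similarity matrix into a 2-D 'sound map'."""
    D = 1.0 - similarity  # convert similarities to dissimilarities
    np.fill_diagonal(D, 0.0)
    mds = MDS(n_components=2, metric=False, dissimilarity='precomputed',
              random_state=seed)
    return mds.fit_transform(D)

rng = np.random.default_rng(5)
S = rng.uniform(size=(10, 10))  # hypothetical ratings for 10 sound pairs
S = (S + S.T) / 2               # symmetrise the ratings

novice_map = sound_map(S)

# A second, slightly perturbed matrix stands in for the expert ratings.
noise = rng.normal(scale=0.05, size=S.shape)
expert_map = sound_map(np.clip(S + (noise + noise.T) / 2, 0, 1))

# Procrustes alignment: disparity measures how far the maps differ
# after optimal translation, scaling and rotation.
_, _, disparity = procrustes(novice_map, expert_map)
```

A low disparity would indicate that a participant's mental "sound map" closely matches the expert model.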