48 research outputs found
Score-informed transcription for automatic piano tutoring
In this paper, a score-informed transcription method for automatic piano tutoring is proposed. The method takes as input a recording made by a student, which may contain mistakes, along with a reference score. The recording and the aligned synthesized score are automatically transcribed using non-negative matrix factorization for multi-pitch estimation and hidden Markov models for note tracking. By comparing the two transcribed recordings, common errors occurring in transcription algorithms, such as extra octave notes, can be suppressed. The result is a piano-roll description which shows the mistakes made by the student along with the correctly played notes. Evaluation was performed on six pieces recorded using a Disklavier piano, using both manually-aligned and automatically-aligned scores as input. Results comparing the system output with ground-truth annotation of the original recording reach a weighted F-measure of 93%, indicating that the proposed method can successfully analyze the student's performance.
Identifying Cover Songs Using Information-Theoretic Measures of Similarity
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/ This paper investigates methods for quantifying similarity between audio signals, specifically for the task of cover song detection. We consider an information-theoretic approach, where we compute pairwise measures of predictability between time series. We compare discrete-valued approaches operating on quantized audio features, to continuous-valued approaches. In the discrete case, we propose a method for computing the normalized compression distance, where we account for correlation between time series. In the continuous case, we propose to compute information-based measures of similarity as statistics of the prediction error between time series. We evaluate our methods on two cover song identification tasks using a data set comprised of 300 Jazz standards and using the Million Song Dataset. For both datasets, we observe that continuous-valued approaches outperform discrete-valued approaches. We consider approaches to estimating the normalized compression distance (NCD) based on string compression and prediction, where we observe that our proposed normalized compression distance with alignment (NCDA) improves average performance over NCD, for sequential compression algorithms. Finally, we demonstrate that continuous-valued distances may be combined to improve performance with respect to baseline approaches. Using a large-scale filter-and-refine approach, we demonstrate state-of-the-art performance for cover song identification using the Million Song Dataset. The work of P. Foster was supported by an Engineering and Physical Sciences Research Council Doctoral Training Account studentship.
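For readers unfamiliar with the normalized compression distance (NCD) named in the abstract above, the following is a minimal sketch of the standard NCD definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. It is not the authors' implementation and does not cover their proposed NCDA variant. The choice of `zlib` as the compressor, the function names, and the toy chord-symbol byte strings are all assumptions made for illustration; the paper works with quantized audio features.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Standard normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy sequences standing in for quantized feature strings.
original = b"C G Am F " * 50
cover = b"C G Am F " * 45 + b"Dm G C C " * 5   # mostly the same progression
unrelated = bytes(range(256)) * 2              # shares little structure

print("original vs cover:    ", ncd(original, cover))
print("original vs unrelated:", ncd(original, unrelated))
```

Lower values indicate that one sequence helps compress the other, which is the sense in which NCD acts as a similarity measure. Real compressors only approximate the ideal, so values can fall slightly outside the interval [0, 1].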
IDENTIFICATION OF COVER SONGS USING INFORMATION THEORETIC MEASURES OF SIMILARITY
13 pages, 5 figures, 4 tables. v3: Accepted version
INSTRUMENTATION-BASED MUSIC SIMILARITY USING SPARSE REPRESENTATIONS
© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Improving instrument recognition in polyphonic music through system integration
A method is proposed for instrument recognition in polyphonic music which combines two independent detector systems: a polyphonic musical instrument recognition system using a missing feature approach, and an automatic music transcription system based on shift-invariant probabilistic latent component analysis that includes instrument assignment. The two systems are integrated by fusing the instrument contributions estimated by the first system into the transcription system in the form of Dirichlet priors. Both systems, as well as the integrated system, are evaluated using a dataset of continuous polyphonic music recordings. Detailed results that highlight a clear improvement in the performance of the integrated system are reported for different training conditions.
Instrumentation-based music similarity using sparse representations
On the disjointness of sources in music using different time-frequency representations
This paper studies the disjointness of the time-frequency representations of simultaneously playing musical instruments. As a measure of disjointness, we use the approximate W-disjoint orthogonality as proposed by Yilmaz and Rickard [1], which (loosely speaking) measures the degree of overlap of different sources in the time-frequency domain. The motivation for this study is to find a maximally disjoint representation in order to facilitate the separation and recognition of musical instruments in mixture signals. The transforms investigated in this paper include the short-time Fourier transform (STFT), constant-Q transform, modified discrete cosine transform (MDCT), and pitch-synchronous lapped orthogonal transforms. Simulation results are reported for a database of polyphonic music where the multitrack data (instrument signals before mixing) were available. Absolute performance varies depending on the instrument source in question, but on average the MDCT with a 93 ms frame size performed best. © 2011 IEEE
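As a rough illustration of the disjointness measure named in the abstract above, the sketch below assumes the commonly cited form of approximate W-disjoint orthogonality: a binary mask keeps the time-frequency bins where the target exceeds the interference by a threshold in dB, and the measure is the target energy kept minus the interference energy kept, divided by the total target energy. The function name, the flattened-list input format, and the 0 dB default threshold are assumptions; the abstract does not state the exact variant or settings used in the paper.

```python
def approximate_wdo(target, interference, threshold_db=0.0):
    """Approximate W-disjoint orthogonality for one source (assumed definition).

    target, interference: equal-length sequences of time-frequency
    coefficients (real or complex), flattened over time and frequency.
    interference is the coefficient of the sum of all other sources.
    """
    ratio = 10.0 ** (threshold_db / 20.0)  # dB threshold as an amplitude ratio
    kept_target = 0.0
    kept_interference = 0.0
    total_target = 0.0
    for s, y in zip(target, interference):
        energy_s = abs(s) ** 2
        energy_y = abs(y) ** 2
        total_target += energy_s
        # Binary mask: keep bins where the target dominates the interference.
        if abs(s) > 0 and abs(s) >= ratio * abs(y):
            kept_target += energy_s
            kept_interference += energy_y
    if total_target == 0.0:
        raise ValueError("target has zero energy")
    return (kept_target - kept_interference) / total_target


# Toy coefficients: perfectly disjoint sources versus fully overlapping ones.
disjoint_score = approximate_wdo([1.0, 0.0, 2.0, 0.0], [0.0, 3.0, 0.0, 1.0])
overlap_score = approximate_wdo([1.0, 1.0], [1.0, 1.0])
print(disjoint_score, overlap_score)
```

Under this definition a value near 1 means the target occupies bins that the other sources leave nearly empty, which is the property a separation-friendly transform should maximize.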
Automatic Music Transcription: Breaking the Glass Ceiling
Automatic music transcription is considered by many to be the Holy Grail in the field of music signal analysis. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. In order to overcome the limited performance of transcription systems, algorithms have to be tailored to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information across different methods and musical aspects.