67 research outputs found
Recommended from our members
Improving generalization for polyphonic piano transcription
In this paper, we present methods to improve the generalization capabilities of a classification-based approach to polyphonic piano transcription. Support vector machines trained on spectral features are used to classify frame-level note instances, and the independent classifications are temporally constrained via hidden Markov model post-processing. Semi-supervised learning and multiconditioning are investigated, and transcription results are reported for a compiled set of piano recordings. A reduction in the frame-level transcription error score of 10% was achieved by combining multiconditioning and semi-supervised classification
A Discriminative Model for Polyphonic Piano Transcription
We present a discriminative model for polyphonic piano transcription. Support vector machines trained on spectral features are used to classify frame-level note instances. The classifier outputs are temporally constrained via hidden Markov models, and the proposed system is used to transcribe both synthesized and real piano recordings. A frame-level transcription accuracy of 68% was achieved on a newly generated test set, and direct comparisons to previous approaches are provided
Identifying 'Cover Songs' with Chroma Features and Dynamic Programming Beat Tracking
Large music collections, ranging from thousands to millions of tracks, are unsuited to manual searching, motivating the development of automatic search methods. When different musicians perform the same underlying song or piece, these are known as 'cover' versions. We describe a system that attempts to identify such a relationship between music audio recordings. To overcome variability in tempo, we use beat tracking to describe each piece with one feature vector per beat. To deal with variation in instrumentation, we use 12-dimensional 'chroma' feature vectors that collect spectral energy supporting each semitone of the octave. To compare two recordings, we simply cross-correlate the entire beat-by-chroma representation for two tracks and look for sharp peaks indicating good local alignment between the pieces. Evaluation on several databases indicate good performance, including best performance on an independent international evaluation, where the system achieved a mean reciprocal ranking of 0.49 for true cover versions among top-10 returns
Recommended from our members
Identifying "Cover Songs" with Beat-Synchronous Chroma Features
Describes the problem of cover songs, how to calculate chroma features and track beats with dynamic programming, and how to match beat-chroma matrices
A Classification Approach to Melody Transcription
Melodies provide an important conceptual summarization of polyphonic audio. The extraction of melodic content has practical applications ranging from content-based audio retrieval to the analysis of musical structure. In contrast to previous transcription systems based on a model of the harmonic (or periodic) structure of musical pitches, we present a classification-based system for performing automatic melody transcription that makes no assumptions beyond what is learned from its training data. We evaluate the success of our algorithm by predicting the melody of the ISMIR 2004 Melody Competition evaluation set and on newly-generated test data. We show that a Support Vector Machine melodic classifier produces results comparable to state of the art model-based transcription systems
Recommended from our members
Melody Transcription From Music Audio: Approaches and Evaluation
Although the process of analyzing an audio recording of a music performance is complex and difficult even for a human listener, there are limited forms of information that may be tractably extracted and yet still enable interesting applications. We discuss melody--roughly, the part a listener might whistle or hum--as one such reduced descriptor of music audio, and consider how to define it, and what use it might be. We go on to describe the results of full-scale evaluations of melody transcription systems conducted in 2004 and 2005, including an overview of the systems submitted, details of how the evaluations were conducted, and a discussion of the results. For our definition of melody, current systems can achieve around 70% correct transcription at the frame level, including distinguishing between the presence or absence of the melody. Melodies transcribed at this level are readily recognizable, and show promise for practical applications
Recommended from our members
Support Vector Machine Active Learning for Music Retrieval
Searching and organizing growing digital music collections requires a computational model of music similarity. This paper describes a system for performing flexible music similarity queries using SVM active learning. We evaluated the success of our system by classifying 1210 pop songs according to mood and style (from an online music guide) and by the performing artist. In comparing a number of representations for songs, we found the statistics of mel-frequency cepstral coefficients to perform best in precision-at-20 comparisons. We also show that by choosing training examples intelligently, active learning requires half as many labeled examples to achieve the same accuracy as a standard scheme
Methodology issues concerning the accuracy of kinematic data collection and analysis using the ariel performance analysis system
Kinematics, the study of motion exclusive of the influences of mass and force, is one of the primary methods used for the analysis of human biomechanical systems as well as other types of mechanical systems. The Anthropometry and Biomechanics Laboratory (ABL) in the Crew Interface Analysis section of the Man-Systems Division performs both human body kinematics as well as mechanical system kinematics using the Ariel Performance Analysis System (APAS). The APAS supports both analysis of analog signals (e.g. force plate data collection) as well as digitization and analysis of video data. The current evaluations address several methodology issues concerning the accuracy of the kinematic data collection and analysis used in the ABL. This document describes a series of evaluations performed to gain quantitative data pertaining to position and constant angular velocity movements under several operating conditions. Two-dimensional as well as three-dimensional data collection and analyses were completed in a controlled laboratory environment using typical hardware setups. In addition, an evaluation was performed to evaluate the accuracy impact due to a single axis camera offset. Segment length and positional data exhibited errors within 3 percent when using three-dimensional analysis and yielded errors within 8 percent through two-dimensional analysis (Direct Linear Software). Peak angular velocities displayed errors within 6 percent through three-dimensional analyses and exhibited errors of 12 percent when using two-dimensional analysis (Direct Linear Software). The specific results from this series of evaluations and their impacts on the methodology issues of kinematic data collection and analyses are presented in detail. The accuracy levels observed in these evaluations are also presented
Recommended from our members
Multiple-instrument polyphonic music transcription using a temporally constrained shift-invariant model
A method for automatic transcription of polyphonic music is proposed in this work that models the temporal evolution of musical tones. The model extends the shift-invariant probabilistic latent component analysis method by supporting the use of spectral templates that correspond to sound states such as attack, sustain, and decay. The order of these templates is controlled using hidden Markov model-based temporal constraints. In addition, the model can exploit multiple templates per pitch and instrument source. The shift-invariant aspect of the model makes it suitable for music signals that exhibit frequency modulations or tuning changes. Pitch-wise hidden Markov models are also utilized in a postprocessing step for note tracking. For training, sound state templates were extracted for various orchestral instruments using isolated note samples. The proposed transcription system was tested on multiple-instrument recordings from various datasets. Experimental results show that the proposed model is superior to a non-temporally constrained model and also outperforms various state-of-the-art transcription systems for the same experiment
- …