394 research outputs found

High-resolution sinusoidal analysis for resolving harmonic collisions in music audio signal processing

Author: Ehmann Andreas
Publication date: 01/12/2011

Many music signals can largely be considered an additive combination of multiple sources, such as musical instruments or voice. If the musical sources are pitched instruments, the spectra they produce are predominantly harmonic, and are thus well suited to an additive sinusoidal model. However, due to resolution limits inherent in time-frequency analyses, when the harmonics of multiple sources occupy equivalent time-frequency regions, their individual properties are additively combined in the time-frequency representation of the mixed signal. Any such time-frequency point in a mixture where multiple harmonics overlap produces a single observation from which the contributions owed to each of the individual harmonics cannot be trivially deduced. These overlaps are referred to as overlapping partials or harmonic collisions. If one wishes to infer some information about individual sources in music mixtures, the information carried in regions where collided harmonics exist becomes unreliable due to interference from other sources. This interference has ramifications in a variety of music signal processing applications such as multiple fundamental frequency estimation, source separation, and instrumentation identification. This thesis addresses harmonic collisions in music signal processing applications. As a solution to the harmonic collision problem, a class of signal subspace-based high-resolution sinusoidal parameter estimators is explored. Specifically, the direct matrix pencil method, or equivalently, the Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) method, is used with the goal of producing estimates of the salient parameters of individual harmonics that occupy equivalent time-frequency regions. This estimation method is adapted here to be applicable to time-varying signals such as musical audio. 
While high-resolution methods have been previously explored in the context of music signal processing, previous work has not addressed whether such methods truly produce high-resolution sinusoidal parameter estimates in real-world music audio signals. Therefore, this thesis answers the question of whether high-resolution sinusoidal parameter estimators are really high-resolution for real music signals. This work directly explores the capabilities of this form of sinusoidal parameter estimation to resolve collided harmonics. The capabilities of this analysis method are also explored in the context of music signal processing applications. Potential benefits of high-resolution sinusoidal analysis are examined in experiments involving multiple fundamental frequency estimation and audio source separation. This work shows that there are indeed benefits to high-resolution sinusoidal analysis in music signal processing applications, especially when compared to methods that produce sinusoidal parameter estimates based on more traditional time-frequency representations. The benefits of this form of sinusoidal analysis are made most evident in multiple fundamental frequency estimation applications, where substantial performance gains are seen. High-resolution analysis in the context of computational auditory scene analysis-based source separation shows similar performance to existing comparable methods.
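
The harmonic collision described in the abstract above can be illustrated with a small sketch. It is not the thesis's estimator: the two sources (200 Hz and 300 Hz, whose 3rd and 2nd harmonics both fall at 600 Hz), the amplitudes, and the helper name `observed_amplitude` are assumptions chosen only for illustration. A single time-frequency observation carries just the complex sum of the two partials, so different amplitude pairs and phase offsets can produce the same, or very different, observed values.

```python
import cmath
import math


def observed_amplitude(a1, phase1, a2, phase2):
    """Amplitude of the sum of two sinusoids that share one frequency.

    Each partial is represented by its phasor a * exp(j * phase); the
    mixture observed at the shared frequency is the phasor sum.
    """
    return abs(a1 * cmath.exp(1j * phase1) + a2 * cmath.exp(1j * phase2))


# Hypothetical collision at 600 Hz: 3 x 200 Hz and 2 x 300 Hz.
in_phase = observed_amplitude(1.0, 0.0, 0.5, 0.0)          # partials reinforce
out_of_phase = observed_amplitude(1.0, 0.0, 0.5, math.pi)  # partials cancel

# A different pair of amplitudes yields the same observation as in_phase,
# so the individual contributions cannot be read off the mixture alone.
other_pair = observed_amplitude(0.75, 0.0, 0.75, 0.0)

print(in_phase, out_of_phase, other_pair)
```

The observed amplitude ranges between the difference and the sum of the two partial amplitudes depending on relative phase, which is why a single mixed observation is unreliable for inferring per-source properties.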

Illinois Digital Environment for Access to Learning and Scholarship Repository

Reliability-Informed Beat Tracking of Musical Signals

Author: Antonio Pena
Enrique Argones Rúa
Mark D. Plumbley
Matthew E. P. Davies
Norberto Degara
Soledad Torres-Guijarro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

A new probabilistic framework for beat tracking of musical audio is presented. The method estimates the time between consecutive beat events and exploits both beat and non-beat information by explicitly modeling non-beat states. In addition to the beat times, a measure of the expected accuracy of the estimated beats is provided. The quality of the observations used for beat tracking is measured and the reliability of the beats is automatically calculated. A k-nearest neighbor regression algorithm is proposed to predict the accuracy of the beat estimates. The performance of the beat tracking system is statistically evaluated using a database of 222 musical signals of various genres. We show that modeling non-beat states leads to a significant increase in performance. In addition, a large experiment where the parameters of the model are automatically learned has been completed. Results show that simple approximations for the parameters of the model can be used. Furthermore, the performance of the system is compared with existing algorithms. Finally, a new perspective for beat tracking evaluation is presented. We show how reliability information can be successfully used to increase the mean performance of the proposed algorithm and discuss how far automatic beat tracking is from human tapping. Index Terms: beat tracking, beat quality, beat-tracking reliability, k-nearest neighbor (k-NN) regression, music signal processing.
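
The k-nearest neighbor regression mentioned in the abstract above can be sketched generically as follows. This is a minimal illustration of k-NN regression, not the authors' system: the two-dimensional "observation quality" features, the accuracy targets, and the function name `knn_regress` are all hypothetical.

```python
import math


def knn_regress(train_features, train_targets, query, k=3):
    """Predict a target as the mean of the k nearest training targets.

    Distance is plain Euclidean distance between feature vectors.
    """
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training points")
    by_distance = sorted(
        (math.dist(features, query), target)
        for features, target in zip(train_features, train_targets)
    )
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical training set: quality features -> measured beat accuracy.
features = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.15, 0.15)]
accuracy = [0.30, 0.40, 0.90, 0.95, 0.35]

# Predicted accuracy for a new excerpt whose features resemble the
# low-quality examples.
predicted = knn_regress(features, accuracy, (0.12, 0.18), k=3)
print(predicted)
```

Averaging the neighbors' targets is the simplest choice; distance-weighted averaging is a common alternative when the neighbors are unevenly spread.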

CiteSeerX

Crossref

Queen Mary Research Online

Surrey Research Insight

High precision frequency estimation for harpsichord tuning classification

Author: Dixon S.
Mauch M.
Tidhar D.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

We present a novel music signal processing task of classifying the tuning of a harpsichord from audio recordings of standard musical works. We report the results of a classification experiment involving six different temperaments, using real harpsichord recordings as well as synthesised audio data. We introduce the concept of conservative transcription, and show that existing high-precision pitch estimation techniques are sufficient for our task if combined with conservative transcription. In particular, using the CQIFFT algorithm with conservative transcription and removal of short duration notes, we are able to distinguish between six different temperaments of harpsichord recordings with 96% accuracy (100% for synthetic data).
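
To give a sense of what high-precision frequency estimation involves, the sketch below refines a DFT peak by fitting a parabola to the log-magnitudes of the peak bin and its two neighbors. This is a generic quadratic-interpolation technique shown under assumed parameters (4 kHz sampling rate, 512-sample Hann-windowed frame, a 440 Hz test tone); it is not the CQIFFT algorithm from the paper, and the function names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the first half of a direct (slow) DFT of the frame."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def peak_frequencies(frame, sample_rate):
    """Return (coarse, refined) frequency estimates of the strongest peak."""
    n = len(frame)
    mags = dft_magnitudes(frame)
    k = max(range(1, len(mags) - 1), key=mags.__getitem__)
    # Parabola through the log-magnitudes of bins k-1, k, k+1.
    a, b, c = (math.log(mags[i]) for i in (k - 1, k, k + 1))
    offset = 0.5 * (a - c) / (a - 2 * b + c)
    bin_width = sample_rate / n
    return k * bin_width, (k + offset) * bin_width


SAMPLE_RATE = 4000  # Hz, assumed
N = 512             # frame length, assumed
TRUE_FREQ = 440.0   # Hz, test tone

frame = [
    (0.5 - 0.5 * math.cos(2 * math.pi * t / N))        # Hann window
    * math.sin(2 * math.pi * TRUE_FREQ * t / SAMPLE_RATE)
    for t in range(N)
]
coarse, refined = peak_frequencies(frame, SAMPLE_RATE)
print(coarse, refined)
```

The coarse estimate is limited to the bin spacing (about 7.8 Hz here), while the interpolated estimate lands much closer to the true frequency; distinguishing temperaments requires resolving pitch differences far smaller than one bin.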

CiteSeerX

City Research Online

Crossref

Automatic transcription of pitched and unpitched sounds from polyphonic music

Author: Benetos E
Ewert S
Weyde T
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

Automatic transcription of polyphonic music has been an active research field for several years and is considered by many to be a key enabling technology in music signal processing. However, current transcription approaches either focus on detecting pitched sounds (from pitched musical instruments) or on detecting unpitched sounds (from drum kits). In this paper, we propose a method that jointly transcribes pitched and unpitched sounds from polyphonic music recordings. The proposed model extends the probabilistic latent component analysis algorithm and supports the detection of pitched sounds from multiple instruments as well as the detection of unpitched sounds from drum kit components, including bass drums, snare drums, cymbals, hi-hats, and toms. Our experiments based on polyphonic Western music containing both pitched and unpitched instruments led to very encouraging results in multi-pitch detection and drum transcription tasks.

City Research Online

Crossref

Queen Mary Research Online

Noise Robust Pitch Tracking by Subband Autocorrelation Classification

Author: Ellis Daniel P. W.
Lee Byung Suk
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2012

Pitch tracking algorithms have a long history in various applications such as speech coding and information extraction, as well as other domains such as bioacoustics and music signal processing. While autocorrelation is a useful technique for detecting periodicity, autocorrelation peaks suffer from ambiguity, leading to the classic “octave error” in pitch tracking. Moreover, additive noise can affect autocorrelation in ways that are difficult to model. Instead of explicitly using the most obvious features of autocorrelation, we present a trained classifier-based approach which we call Subband Autocorrelation Classification (SAcC). A multi-layer perceptron classifier is trained on the principal components of the autocorrelations of subbands from an auditory filterbank. Training on bandlimited and noisy speech (processed to simulate a low-quality radio channel) leads to a great increase in performance over state-of-the-art algorithms, according to both the traditional Gross Pitch Error (GPE) measure and a proposed novel Pitch Tracking Error, which more fully reflects the accuracy of both pitch extraction and voicing detection in a single measure.
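
The octave ambiguity of autocorrelation peaks mentioned in the abstract above can be reproduced with a short sketch. It shows plain autocorrelation only, not the SAcC classifier; the 50-sample period (160 Hz at an assumed 8 kHz sampling rate), the two-harmonic test signal, and the function name `normalized_autocorr` are assumptions for illustration.

```python
import math


def normalized_autocorr(signal, lag, window):
    """Correlate signal[0:window] with the same span delayed by `lag`,
    normalized by the energy of the undelayed span."""
    num = sum(signal[n] * signal[n + lag] for n in range(window))
    den = sum(signal[n] ** 2 for n in range(window))
    return num / den


PERIOD = 50   # samples per pitch period (hypothetical)
WINDOW = 200  # analysis window: four full periods

# Periodic test signal: fundamental plus a weaker second harmonic.
signal = [
    math.sin(2 * math.pi * n / PERIOD) + 0.5 * math.sin(4 * math.pi * n / PERIOD)
    for n in range(400)
]

r_half = normalized_autocorr(signal, PERIOD // 2, WINDOW)   # half the period
r_true = normalized_autocorr(signal, PERIOD, WINDOW)        # true period
r_double = normalized_autocorr(signal, 2 * PERIOD, WINDOW)  # twice the period

print(r_half, r_true, r_double)
```

The peaks at the true period and at twice the period are equally strong, so a picker that simply takes the highest autocorrelation peak has no basis for preferring the true pitch over the pitch one octave below.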

CiteSeerX

Columbia University Academic Commons

Automatic music transcription: challenges and future directions

Author: Anssi Klapuri
Dimitrios Giannoulis
Emmanouil Benetos
Holger Kirchhoff
Simon Dixon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.

CiteSeerX

City Research Online

Crossref

Queen Mary Research Online

Onset Event Decoding Exploiting the Rhythmic Structure of Polyphonic Music

Author: Antonio Pena
Mark D. Plumbley
Matthew E. P. Davies
Norberto Degara
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/10/2011

(c) 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Published version: IEEE Journal of Selected Topics in Signal Processing 5(6): 1228-1239, Oct 2011. DOI:10.1109/JSTSP.2011.214622

CiteSeerX

Crossref

Queen Mary Research Online

Surrey Research Insight