
76 research outputs found

STRUCTURED SPARSITY FOR AUTOMATIC MUSIC TRANSCRIPTION

Author: Nagano H
O'Hanlon K
Plumbley MD
Publication date: 01/01/2012

© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Queen Mary Research Online

Automatic music transcription: challenges and future directions

Author: Anssi Klapuri
Dimitrios Giannoulis
Emmanouil Benetos
Holger Kirchhoff
Simon Dixon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.

CiteSeerX

City Research Online

Crossref

Queen Mary Research Online

Automatic transcription of polyphonic music exploiting temporal evolution

Author: Benetos E
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/2012

Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without setting restrictions on the degree of polyphony and the instrument type still remains open. In this thesis, research on automatic transcription is performed by explicitly incorporating information on the temporal evolution of sounds. First efforts address the problem by focusing on signal processing techniques and by proposing audio features utilising temporal characteristics. Techniques for note onset and offset detection are also utilised for improving transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), modeling the temporal evolution of notes in a multiple-instrument case and supporting frequency modulations in produced notes. Datasets and annotations for transcription research have also been created during this work. Proposed systems have been privately as well as publicly evaluated within the Music Information Retrieval Evaluation eXchange (MIREX) framework. Proposed systems have been shown to outperform several state-of-the-art transcription approaches. Developed techniques have also been employed for other tasks related to music technology, such as for key modulation detection, temperament estimation, and automatic piano tutoring. Finally, proposed music transcription models have also been utilized in a wider context, namely for modeling acoustic scenes.

CiteSeerX

Queen Mary Research Online

Performance Evaluation of Selected Cost Functions in Non Negative Matrix Factorization Based Decomposition of Acoustic Mixture

Author: Adewusi Adeoluwawale
Amusa Kamoli A
Are Aliu S
Publication venue: Faculty of Engineering, Federal University Oye-Ekiti
Publication date: 31/03/2019

When several audio sources are active simultaneously, their acoustic signals interact, and co-occurring sounds disturb the estimation of any individual source. Data decomposition therefore constitutes one of the core tasks in monaural source separation. In a semi-supervised learning approach in particular, a viable means of achieving this is the application of Non-negative Matrix Factorization (NMF). Owing to a paucity of information on the application of this method, especially in speech systems, this study evaluates several cost functions in NMF-based monaural speech decomposition. A generalized gradient descent algorithm is derived for the minimization, and three cost functions (Euclidean distance, Kullback-Leibler divergence and Itakura-Saito divergence) are applied to the derived NMF separation algorithm. The divergences are evaluated on experimental data, and their performance is compared in terms of cost values and convergence rate. The Itakura-Saito divergence performs best of the three for a given number of iterations and number of channels. Keywords: cost functions, non-negative matrix factorization, speech separation, evaluation
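
As a rough illustration of the three cost functions named in this abstract, the sketch below runs NMF under the Euclidean, Kullback-Leibler and Itakura-Saito costs, treated as the beta-divergence with beta = 2, 1 and 0. It is not the paper's algorithm: the abstract describes a generalized gradient descent scheme, whereas this sketch assumes the widely used multiplicative (majorization-minimization) updates. The function names, the random test matrix and the iteration counts are all hypothetical.

```python
import numpy as np

def beta_divergence(V, V_hat, beta):
    """Cost between V and its approximation V_hat for beta in {0, 1, 2}."""
    if beta == 2:  # Euclidean distance (squared, halved)
        return 0.5 * np.sum((V - V_hat) ** 2)
    if beta == 1:  # generalized Kullback-Leibler divergence
        return np.sum(V * np.log(V / V_hat) - V + V_hat)
    if beta == 0:  # Itakura-Saito divergence
        ratio = V / V_hat
        return np.sum(ratio - np.log(ratio) - 1.0)
    raise ValueError("this sketch only handles beta in {0, 1, 2}")

def nmf(V, rank, beta, n_iter=100, seed=0):
    """Factorize a strictly positive matrix V as W @ H (assumed MM updates)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + 0.1
    H = rng.random((rank, n_cols)) + 0.1
    # Exponent that makes the multiplicative update a majorization-minimization
    # step: 1/(2 - beta) for beta < 1, and 1 for 1 <= beta <= 2.
    gamma = 1.0 / (2.0 - beta) if beta < 1 else 1.0
    costs = []
    for _ in range(n_iter):
        V_hat = W @ H
        H *= ((W.T @ (V * V_hat ** (beta - 2))) / (W.T @ V_hat ** (beta - 1))) ** gamma
        V_hat = W @ H
        W *= (((V * V_hat ** (beta - 2)) @ H.T) / (V_hat ** (beta - 1) @ H.T)) ** gamma
        costs.append(beta_divergence(V, W @ H, beta))
    return W, H, costs

if __name__ == "__main__":
    # Hypothetical stand-in for a magnitude spectrogram (kept strictly positive).
    V = np.random.default_rng(42).random((64, 100)) + 0.1
    for name, beta in [("Euclidean", 2), ("Kullback-Leibler", 1), ("Itakura-Saito", 0)]:
        _, _, costs = nmf(V, rank=8, beta=beta)
        print(f"{name}: cost after 1 iteration {costs[0]:.4f}, after 100 {costs[-1]:.4f}")
```

The cost values of the three divergences are on different scales, so a comparison of the kind the abstract reports would look at each curve's own convergence behaviour, not at the raw numbers side by side.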

FUOYE Journal of Engineering and Technology (FUOYEJET - Federal University Oye-Ekiti)

Non-Negative Group Sparsity with Subspace Note Modelling for Polyphonic Transcription

Author: Keriven N
Nagano H
O'Hanlon K
Plumbley MD
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2016

This work was supported by EPSRC Platform Grant EPSRC EP/K009559/1, EPSRC Grant EP/L027119/1, and EPSRC Grant EP/J010375/1

Crossref

University of Surrey

Queen Mary Research Online

Surrey Research Insight

Hidden Markov models as priors for regularized nonnegative matrix factorization in single-channel source separation

Author: Erdoğan Hakan
Grais Emad Mounir
Publication venue: 'The International Fiscal Association of Korea'
Publication date: 09/09/2012

We propose a new method to incorporate rich statistical priors, modeling temporal gain sequences in the solutions of nonnegative matrix factorization (NMF). The proposed method can be used for single-channel source separation (SCSS) applications. In NMF based SCSS, NMF is used to decompose the spectra of the observed mixed signal as a weighted linear combination of a set of trained basis vectors. In this work, the NMF decomposition weights are enforced to consider statistical and temporal prior information on the weight combination patterns that the trained basis vectors can jointly receive for each source in the observed mixed signal. The Hidden Markov Model (HMM) is used as a log-normalized gains (weights) prior model for the NMF solution. The normalization makes the prior models energy independent. HMM is used as a rich model that characterizes the statistics of sequential data. The NMF solutions for the weights are encouraged to increase the log-likelihood with the trained gain prior HMMs while reducing the NMF reconstruction error at the same time
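
The baseline step this abstract builds on, decomposing the mixture spectra as a weighted combination of trained basis vectors, can be sketched as follows. This is a minimal sketch under stated assumptions and leaves out the paper's actual contribution, the HMM gain prior. It assumes Kullback-Leibler multiplicative updates for the weights and a soft-mask reconstruction, neither of which the abstract specifies. The names `solve_gains` and `separate` are hypothetical, and the basis matrices are random stand-ins for trained ones.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero

def solve_gains(V_mix, W, n_iter=200, seed=0):
    """Estimate non-negative weights H with the bases W held fixed.

    Uses the standard multiplicative update for the Kullback-Leibler cost
    (an assumption of this sketch, with no prior on the weights).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V_mix.shape[1])) + 0.1
    col_sums = W.T @ np.ones_like(V_mix) + EPS
    for _ in range(n_iter):
        V_hat = W @ H + EPS
        H *= (W.T @ (V_mix / V_hat)) / col_sums
    return H

def separate(V_mix, W_a, W_b):
    """Split a mixture magnitude spectrogram into two source estimates."""
    W = np.hstack([W_a, W_b])          # concatenate the per-source bases
    H = solve_gains(V_mix, W)
    k = W_a.shape[1]
    est_a = W_a @ H[:k]                # part explained by source A bases
    est_b = W_b @ H[k:]                # part explained by source B bases
    mask_a = est_a / (est_a + est_b + EPS)   # soft mask (assumed step)
    return mask_a * V_mix, (1.0 - mask_a) * V_mix

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_a = rng.random((32, 4)) + 0.05   # stand-in for bases trained on source A
    W_b = rng.random((32, 4)) + 0.05   # stand-in for bases trained on source B
    V_mix = W_a @ rng.random((4, 50)) + W_b @ rng.random((4, 50))
    S_a, S_b = separate(V_mix, W_a, W_b)
    print(S_a.shape, S_b.shape)
```

In the method described by the abstract, the weight update would additionally be pulled toward gain sequences that score well under the trained HMM prior; here the weights are driven by reconstruction error alone.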

Sabanci University Research Database


Bayesian methods in music modelling

Author: Peeling Paul
Publication venue: University of Cambridge
Publication date: 15/03/2011

This thesis presents several hierarchical generative Bayesian models of musical signals designed to improve the accuracy of existing multiple pitch detection systems and other musical signal processing applications whilst remaining feasible for real-time computation. At the lowest level the signal is modelled as a set of overlapping sinusoidal basis functions. The parameters of these basis functions are built into a prior framework based on principles known from musical theory and the physics of musical instruments. The model of a musical note optionally includes phenomena such as frequency and amplitude modulations, damping, volume, timbre and inharmonicity. The occurrence of note onsets in a performance of a piece of music is controlled by an underlying tempo process and the alignment of the timings to the underlying score of the music. A variety of applications are presented for these models under differing inference constraints. Where full Bayesian inference is possible, reversible-jump Markov Chain Monte Carlo is employed to estimate the number of notes and partial frequency components in each frame of music. We also use approximate techniques such as model selection criteria and variational Bayes methods for inference in situations where computation time is limited or the amount of data to be processed is large. For the higher level score parameters, greedy search and conditional modes algorithms are found to be sufficiently accurate. We emphasize the links between the models and inference algorithms developed in this thesis and those in existing and parallel work, and demonstrate the effects of making modifications to these models both theoretically and by means of experimental results.

Apollo (Cambridge)

The DESAM toolbox: spectral analysis of musical audio

Author: Badeau Roland
Bertin Nancy
Daudet Laurent
David Bertrand
Derrien Olivier
Echeveste Jose
Lagrange Mathieu
Marchand Sylvain
Publication venue: HAL CCSD
Publication date: 01/09/2010

This paper presents the DESAM Toolbox, a set of Matlab functions dedicated to the estimation of widely used spectral models for music signals. Although those models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. Rather, the toolbox aims to provide a range of state-of-the-art signal processing tools that decompose music files according to different signal models, giving rise to different "mid-level" representations. After motivating the need for such a toolbox, the paper gives an overview of its organization and describes all available functionalities.

HAL-CentraleSupelec

HAL AMU

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Gaussian mixture gain priors for regularized nonnegative matrix factorization in single-channel source separation

Author: Erdoğan Hakan
Grais Emad Mounir
Publication venue: 'The International Fiscal Association of Korea'
Publication date: 01/01/2012

We propose a new method to incorporate statistical priors on the solution of the nonnegative matrix factorization (NMF) for single-channel source separation (SCSS) applications. The Gaussian mixture model (GMM) is used as a log-normalized gain prior model for the NMF solution. The normalization makes the prior models energy independent. In NMF based SCSS, NMF is used to decompose the spectra of the observed mixed signal as a weighted linear combination of a set of trained basis vectors. In this work, the NMF decomposition weights are enforced to consider statistical prior information on the weight combination patterns that the trained basis vectors can jointly receive for each source in the observed mixed signal. The NMF solutions for the weights are encouraged to increase the log-likelihood with the trained gain prior GMMs while reducing the NMF reconstruction error at the same time.

CiteSeerX

Sabanci University Research Database

Surrey Research Insight