Search CORE

57 research outputs found

A temporally-constrained convolutive probabilistic model for pitch detection

Author: Benetos E.
Dixon S.
Publication venue
Publication date: 01/01/2011
Field of study

A method for pitch detection which models the temporal evolution of musical sounds is presented in this paper. The proposed model is based on shift-invariant probabilistic latent component analysis, constrained by a hidden Markov model. The time-frequency representation of a produced musical note can be expressed by the model as a temporal sequence of spectral templates which can also be shifted over log-frequency. Thus, this approach can be effectively used for pitch detection in music signals that contain amplitude and frequency modulations. Experiments were performed using extracted sequences of spectral templates on monophonic music excerpts, where the proposed model outperforms a non-temporally constrained convolutive model for pitch detection. Finally, future directions are given for multipitch extensions of the proposed model

CiteSeerX

City Research Online

Crossref

Audio source separation for music in low-latency and high-latency scenarios

Author: Marxer Piñón Ricard
Publication venue: 'Universitat Pompeu Fabra'
Publication date: 01/01/2013
Field of study

Aquesta tesi proposa mètodes per tractar les limitacions de les tècniques existents de separació de fonts musicals en condicions de baixa i alta latència. En primer lloc, ens centrem en els mètodes amb un baix cost computacional i baixa latència. Proposem l'ús de la regularització de Tikhonov com a mètode de descomposició de l'espectre en el context de baixa latència. El comparem amb les tècniques existents en tasques d'estimació i seguiment dels tons, que són passos crucials en molts mètodes de separació. A continuació utilitzem i avaluem el mètode de descomposició de l'espectre en tasques de separació de veu cantada, baix i percussió. En segon lloc, proposem diversos mètodes d'alta latència que milloren la separació de la veu cantada, gràcies al modelatge de components específics, com la respiració i les consonants. Finalment, explorem l'ús de correlacions temporals i anotacions manuals per millorar la separació dels instruments de percussió i dels senyals musicals polifònics complexes.Esta tesis propone métodos para tratar las limitaciones de las técnicas existentes de separación de fuentes musicales en condiciones de baja y alta latencia. En primer lugar, nos centramos en los métodos con un bajo coste computacional y baja latencia. Proponemos el uso de la regularización de Tikhonov como método de descomposición del espectro en el contexto de baja latencia. Lo comparamos con las técnicas existentes en tareas de estimación y seguimiento de los tonos, que son pasos cruciales en muchos métodos de separación. A continuación utilizamos y evaluamos el método de descomposición del espectro en tareas de separación de voz cantada, bajo y percusión. En segundo lugar, proponemos varios métodos de alta latencia que mejoran la separación de la voz cantada, gracias al modelado de componentes que a menudo no se toman en cuenta, como la respiración y las consonantes. Finalmente, exploramos el uso de correlaciones temporales y anotaciones manuales para mejorar la separación de los instrumentos de percusión y señales musicales polifónicas complejas.This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa

Automatic music transcription: challenges and future directions

Author: Anssi Klapuri
Anssi Klapuri
Dimitrios Giannoulis
E. Benetos
Emmanouil Benetos
Emmanouil Benetos
Holger Kirchhoff
Holger Kirchhoff
See Profile
Simon Dixon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects

CiteSeerX

City Research Online

Crossref

Queen Mary Research Online

From heuristics-based to data-driven audio melody extraction

Author: Bosch Juan J.
Publication venue
Publication date: 01/01/2017
Field of study

The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advancements on melody extraction and shows a promising path for future research and applications

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

ZENODO

Tesis Doctorals en Xarxa

Recommended from our members

A Shift-Invariant Latent Variable Model for Automatic Music Transcription

Author: Bay M.
Benetos E.
Dessein A.
Emmanouil Benetos
Fuentes B.
Goto M.
Grindlay G.
Mysore G.
Poliner G.
Schörkhuber C.
Simon Dixon
Smaragdis P.
Publication venue: 'MIT Press - Journals'
Publication date: 01/12/2012
Field of study

In this work, a probabilistic model for multiple-instrument automatic music transcription is proposed. The model extends the shift-invariant probabilistic latent component analysis method, which is used for spectrogram factorization. Proposed extensions support the use of multiple spectral templates per pitch and per instrument source, as well as a time-varying pitch contribution for each source. Thus, this method can effectively be used for multiple-instrument automatic transcription. In addition, the shift-invariant aspect of the method can be exploited for detecting tuning changes and frequency modulations, as well as for visualizing pitch content. For note tracking and smoothing, pitch-wise hidden Markov models are used. For training, pitch templates from eight orchestral instruments were extracted, covering their complete note range. The transcription system was tested on multiple-instrument polyphonic recordings from the RWC database, a Disklavier data set, and the MIREX 2007 multi-F0 data set. Results demonstrate that the proposed method outperforms leading approaches from the transcription literature, using several error metrics

City Research Online

Crossref

Automatic transcription of polyphonic music exploiting temporal evolution

Author: Benetos E
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/2012
Field of study

PhDAutomatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without setting restrictions on the degree of polyphony and the instrument type still remains open. In this thesis, research on automatic transcription is performed by explicitly incorporating information on the temporal evolution of sounds. First efforts address the problem by focusing on signal processing techniques and by proposing audio features utilising temporal characteristics. Techniques for note onset and offset detection are also utilised for improving transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), modeling the temporal evolution of notes in a multiple-instrument case and supporting frequency modulations in produced notes. Datasets and annotations for transcription research have also been created during this work. Proposed systems have been privately as well as publicly evaluated within the Music Information Retrieval Evaluation eXchange (MIREX) framework. Proposed systems have been shown to outperform several state-of-the-art transcription approaches. Developed techniques have also been employed for other tasks related to music technology, such as for key modulation detection, temperament estimation, and automatic piano tutoring. Finally, proposed music transcription models have also been utilized in a wider context, namely for modeling acoustic scenes

CiteSeerX

Queen Mary Research Online

USING SCORE-INFORMED CONSTRAINTS FOR NMF-BASED SOURCE SEPARATION

Author: Ewert S
IEEE
Mueller M
Publication venue
Publication date: 28/04/2016
Field of study

Queen Mary Research Online

Recommended from our members

Multiple-instrument polyphonic music transcription using a convolutive probabilistic model

Author: Benetos E.
Dixon S.
Publication venue
Publication date: 01/01/2011
Field of study

(Abstract to follow

City Research Online

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Score-Informed Source Separation for Music Signals

Author: Ewert Sebastian
Publication venue: Dagstuhl Follow-Ups. Multimodal Music Processing
Publication date: 01/01/2012
Field of study

In recent years, the processing of audio recordings by exploiting additional musical knowledge has turned out to be a promising research direction. In particular, additional note information as specified by a musical score or a MIDI file has been employed to support various audio processing tasks such as source separation, audio parameterization, performance analysis, or instrument equalization. In this contribution, we provide an overview of approaches for score-informed source separation and illustrate their potential by discussing innovative applications and interfaces. Additionally, to illustrate some basic principles behind these approaches, we demonstrate how score information can be integrated into the well-known non-negative matrix factorization (NMF) framework. Finally, we compare this approach to advanced methods based on parametric models

Dagstuhl Research Online Publication Server

Notentext-Informierte Quellentrennung für Musiksignale

Author: Driedger J
Ewert S
Müller M
Publication venue
Publication date: 28/04/2016
Field of study

codedemo: http://www.audiolabs-erlangen.de/resources/2013-ACMMM-AudioDecomp/codedemo: http://www.audiolabs-erlangen.de/resources/2013-ACMMM-AudioDecomp/codedemo: http://www.audiolabs-erlangen.de/resources/2013-ACMMM-AudioDecomp/codedemo: http://www.audiolabs-erlangen.de/resources/2013-ACMMM-AudioDecomp

Queen Mary Research Online