Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription
In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, while a novel noise suppression technique based on a pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used to compute the harmonic envelope of the candidate pitches; this procedure is based on a novel spectral envelope estimation procedure for the log-frequency domain. In order to select the optimal pitch combination for each time frame, a score function is proposed which combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classical and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, with encouraging results.
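The salience stage can be illustrated with a harmonic-sum sketch that also searches over the inharmonicity coefficient. This is not the paper's salience function: it assumes a linear-frequency magnitude spectrum (the paper uses a constant-Q resonator time-frequency image), the stiff-string partial model f_h = h f0 sqrt(1 + B(h^2 - 1)), a 1/h harmonic weighting and a +/-1 bin search window, all chosen for illustration.

```python
import math
from typing import Sequence


def partial_frequency(f0: float, h: int, b: float) -> float:
    """Frequency of the h-th partial of a stiff string with inharmonicity coefficient b."""
    return h * f0 * math.sqrt(1.0 + b * (h * h - 1))


def salience(spectrum: Sequence[float], bin_hz: float, f0: float, b: float,
             n_harmonics: int = 10, tol_bins: int = 1) -> float:
    """Weighted sum of the strongest magnitude found near each predicted partial.

    spectrum[k] is the magnitude at frequency k * bin_hz (linear grid, an assumption).
    The 1/h weights and the +/- tol_bins window are illustrative choices.
    """
    total = 0.0
    for h in range(1, n_harmonics + 1):
        k = round(partial_frequency(f0, h, b) / bin_hz)
        lo, hi = max(0, k - tol_bins), min(len(spectrum) - 1, k + tol_bins)
        if lo > hi:  # predicted partial lies above the analysed band
            break
        total += max(spectrum[lo:hi + 1]) / h
    return total


def best_inharmonicity(spectrum: Sequence[float], bin_hz: float, f0: float,
                       b_grid: Sequence[float]) -> float:
    """Grid search for the inharmonicity coefficient that maximises the salience of f0."""
    return max(b_grid, key=lambda b: salience(spectrum, bin_hz, f0, b))


# Usage: a synthetic 1 Hz-grid spectrum whose partials follow f0 = 100 Hz with B = 1e-3.
spec = [0.0] * 2000
for h in range(1, 11):
    spec[round(partial_frequency(100.0, h, 1e-3))] = 1.0
print(best_inharmonicity(spec, 1.0, 100.0, [0.0, 5e-4, 1e-3, 2e-3]))  # prints 0.001
```

With B = 0 the predicted upper partials drift away from the stretched ones, so only the lowest few harmonics contribute; this is why tuning and inharmonicity are estimated before candidate selection.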
Polyphonic music transcription using note onset and offset detection
In this paper, an approach for polyphonic music transcription based on joint multiple-F0 estimation and note onset/offset detection is proposed. For preprocessing, the resonator time-frequency image of the input music signal is extracted and noise suppression is performed. A pitch salience function is extracted for each frame, along with tuning and inharmonicity parameters. For onset detection, late fusion is employed, combining a novel spectral flux-based feature that incorporates pitch tuning information with a novel salience function-based descriptor. For each segment delimited by two consecutive onsets, an overlapping partial treatment procedure is used and a pitch set score function is proposed. A note offset detection procedure using HMMs trained on MIDI data is also proposed. The system was trained on piano chords and tested on classical and jazz recordings from the RWC database. Improved transcription results are reported compared to state-of-the-art approaches.
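As a point of reference for the onset detection step, the sketch below computes a plain half-wave rectified spectral flux and picks its peaks. It omits what the paper adds (the tuning-aware flux, the salience-based descriptor and their late fusion); the threshold, the peak-picking rule and the min_gap parameter are hypothetical choices. The resulting inter-onset segments correspond to the units on which a pitch set score would be evaluated.

```python
from typing import List, Sequence


def spectral_flux(frames: Sequence[Sequence[float]]) -> List[float]:
    """Half-wave rectified spectral flux: summed magnitude increases between frames."""
    flux = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return flux


def pick_onsets(flux: Sequence[float], threshold: float, min_gap: int = 2) -> List[int]:
    """Frame indices of local flux maxima above threshold, at least min_gap frames apart."""
    padded = list(flux) + [0.0]  # lets the last frame be a peak
    onsets: List[int] = []
    for t in range(1, len(flux)):
        is_peak = padded[t] >= padded[t - 1] and padded[t] > padded[t + 1]
        if padded[t] > threshold and is_peak:
            if not onsets or t - onsets[-1] >= min_gap:
                onsets.append(t)
    return onsets


# Usage: one note starts at frame 1 and a second at frame 4 (3 frequency bins).
frames = [[0, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 2, 0], [1, 2, 0]]
onsets = pick_onsets(spectral_flux(frames), threshold=0.5)
segments = list(zip(onsets, onsets[1:] + [len(frames)]))  # (start, end) inter-onset segments
print(onsets, segments)
```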
Automatic transcription of polyphonic music exploiting temporal evolution
PhD thesis. Automatic music transcription is the process of converting an audio recording
into a symbolic representation using musical notation. It has numerous applications
in music information retrieval, computational musicology, and the
creation of interactive systems. Even for expert musicians, transcribing polyphonic
pieces of music is not a trivial task, and while the problem of automatic
pitch estimation for monophonic signals is considered to be solved, the problem
of creating an automated system able to transcribe polyphonic music without
restricting the degree of polyphony or the instrument type remains
open.
In this thesis, research on automatic transcription is performed by explicitly
incorporating information on the temporal evolution of sounds. First efforts address
the problem by focusing on signal processing techniques and by proposing
audio features utilising temporal characteristics. Techniques for note onset and
offset detection are also utilised for improving transcription performance. Subsequent
approaches propose transcription models based on shift-invariant probabilistic
latent component analysis (SI-PLCA), modeling the temporal evolution
of notes in a multiple-instrument case and supporting frequency modulations in
produced notes. Datasets and annotations for transcription research have also
been created during this work. The proposed systems have been evaluated both privately and
publicly within the Music Information Retrieval Evaluation eXchange
(MIREX) framework and have been shown to outperform several
state-of-the-art transcription approaches.
Developed techniques have also been employed for other tasks related to music
technology, such as for key modulation detection, temperament estimation,
and automatic piano tutoring. Finally, proposed music transcription models
have also been utilised in a wider context, namely for modeling acoustic scenes.
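The probabilistic latent component analysis underlying the later systems can be illustrated in its simplest form. The sketch assumes plain PLCA rather than the shift-invariant SI-PLCA of the thesis: a spectrogram V(f, t) is modelled as proportional to P(t) sum_z P(f|z) P(z|t), the note templates P(f|z) are fixed (for instance, learned beforehand from isolated notes), and EM estimates the note activations P(z|t). Shift invariance across log-frequency, multiple instrument sources and temporal evolution are all left out.

```python
from typing import List, Sequence


def plca_activations(V: Sequence[Sequence[float]], W: Sequence[Sequence[float]],
                     n_iter: int = 50) -> List[List[float]]:
    """EM estimate of P(z|t) given fixed templates W[z][f] = P(f|z).

    V[f][t] is a non-negative magnitude spectrogram. Each template should sum to 1.
    """
    n_f, n_t, n_z = len(V), len(V[0]), len(W)
    H = [[1.0 / n_z] * n_t for _ in range(n_z)]  # uniform initial P(z|t)
    for _ in range(n_iter):
        new_H = [[0.0] * n_t for _ in range(n_z)]
        for t in range(n_t):
            for f in range(n_f):
                denom = sum(W[z][f] * H[z][t] for z in range(n_z))
                if V[f][t] == 0.0 or denom == 0.0:
                    continue
                for z in range(n_z):
                    # E-step posterior P(z|f,t), weighted by the observed energy V[f][t]
                    new_H[z][t] += V[f][t] * W[z][f] * H[z][t] / denom
            total = sum(new_H[z][t] for z in range(n_z))
            for z in range(n_z):
                # M-step: renormalise so that sum_z P(z|t) = 1 (silent frames stay at 0)
                new_H[z][t] = new_H[z][t] / total if total > 0.0 else 0.0
        H = new_H
    return H


# Usage: two hypothetical note templates over 4 frequency bins, three frames.
W = [[0.5, 0.5, 0.0, 0.0],   # note A occupies the two lowest bins
     [0.0, 0.0, 0.5, 0.5]]   # note B occupies the two highest bins
V = [[1, 0, 1],
     [1, 0, 1],
     [0, 2, 1],
     [0, 2, 1]]              # frame 0: A only, frame 1: B only, frame 2: A and B
H = plca_activations(V, W)
print([[round(p, 3) for p in row] for row in H])
```

Thresholding the activation matrix H over time yields a piano-roll style transcription; SI-PLCA extends this by letting templates shift along log-frequency, which accommodates tuning deviations and frequency modulations.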
Automatic music transcription: challenges and future directions
Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general-purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use cases. Semi-automatic approaches are another way of achieving more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.
Acoustically Inspired Probabilistic Time-domain Music Transcription and Source Separation
PhD thesis. Automatic music transcription (AMT) and source separation are important
computational tasks, which can help to understand, analyse and process music
recordings. The main purpose of AMT is to estimate, from an observed
audio recording, a latent symbolic representation of a piece of music (piano-roll).
In this sense, AMT reconstructs the duration and location of every note played
from a mixture recording. The related task of source separation
aims to estimate the latent functions or source signals that were mixed
together in an audio recording. This task requires not only the duration and
location of every event present in the mixture, but also the reconstruction
of the waveform of all the individual sounds. Most methods for AMT and
source separation rely on the magnitude of time-frequency representations
of the analysed recording, i.e., spectrograms, and often arbitrarily discard
phase information. On the one hand, this decreases the time resolution in AMT.
On the other hand, discarding phase information corrupts the reconstruction
in source separation, because the phase of each source-spectrogram must
be approximated. There is thus a need for models that circumvent phase
approximation, while operating at sample-rate resolution.
This thesis aims to solve AMT and source separation jointly from
a unified perspective. For this purpose, Bayesian non-parametric signal
processing, covariance kernels designed for audio, and scalable variational
inference are integrated to form efficient and acoustically-inspired probabilistic
models. To circumvent phase approximation while keeping sample-rate
resolution, AMT and source separation are addressed from a Bayesian time-domain
viewpoint. That is, the posterior distribution over the waveform of
each sound event in the mixture is computed directly from the observed data.
For this purpose, Gaussian processes (GPs) are used to define priors over the
sources/pitches. A GP is a probability distribution over functions, and its
kernel or covariance function determines the properties of the functions sampled from
it. Finally, the GP priors and the available data (mixture recording) are
combined using Bayes' theorem in order to compute the posterior distributions
over the sources/pitches.
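The Bayesian time-domain view can be made concrete for a toy mixture. Assuming the additive model y(t) = sum_i f_i(t) + noise with independent priors f_i ~ GP(0, k_i) and Gaussian noise of variance sigma^2, the posterior mean of each source at the observed times is K_i (sum_j K_j + sigma^2 I)^-1 y. The sketch below computes this exactly with a Cholesky factorisation; the kernels in the usage example are arbitrary placeholders rather than the audio kernels of the thesis, and exact inference of this kind is only feasible for short signals.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[float, float], float]


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cholesky_solve(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def separate_sources(kernels: Sequence[Kernel], t: Sequence[float], y: Sequence[float],
                     noise_var: float) -> List[List[float]]:
    """Posterior mean of each source f_i at times t, given y = sum_i f_i + noise."""
    n = len(t)
    K = [[sum(k(t[a], t[b]) for k in kernels) + (noise_var if a == b else 0.0)
          for b in range(n)] for a in range(n)]
    alpha = cholesky_solve(cholesky(K), y)  # (sum_j K_j + noise_var I)^-1 y
    return [[sum(k(t[a], t[b]) * alpha[b] for b in range(n)) for a in range(n)]
            for k in kernels]


# Usage: two placeholder priors (smooth and rough) jointly explaining a short mixture.
smooth: Kernel = lambda s, u: math.exp(-0.5 * (s - u) ** 2)
rough: Kernel = lambda s, u: math.exp(-abs(s - u))
t = [0.5 * i for i in range(10)]
y = [math.sin(1.3 * ti) for ti in t]
f_smooth, f_rough = separate_sources([smooth, rough], t, y, noise_var=1e-4)
```

The Cholesky step costs O(n^3) in the number of samples n, which is the scaling problem the thesis addresses with variational inference.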
Although the proposed paradigm is elegant, it introduces two main challenges.
First, as mentioned before, the kernel of the GP priors determines the
properties of each source/pitch function, that is, its smoothness, stationarity,
and more importantly its spectrum. Consequently, the proposed model
requires the design of flexible kernels, able to learn the rich frequency content
and intricate properties of audio sources. To this end, spectral mixture
(SM) kernels are studied, and the Matérn spectral mixture (MSM) kernel
is introduced as a modified version of the SM covariance function. The
MSM kernel imposes weaker smoothness assumptions, which makes it more suitable for
modelling physical processes. Second, the computational complexity of GP
inference scales cubically with the number of audio samples. Therefore, the
application of GP models to long audio signals becomes intractable. To
overcome this limitation, variational inference is used to make the proposed
model scalable and suitable for signals on the order of hundreds of thousands
of data points.
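The two kernel families can be compared directly as functions of the lag tau. The spectral mixture kernel below is the standard form, a sum of cosines under Gaussian envelopes (Gaussian peaks in the spectrum). For the MSM kernel the sketch assumes a Matérn-1/2 variant that replaces each Gaussian envelope by an exponential decay exp(-lambda |tau|) (Lorentzian spectral peaks); the thesis may parameterise it differently. The kink of exp(-lambda |tau|) at tau = 0 is what makes sample functions rougher, i.e. the weaker smoothness described in the abstract.

```python
import math
from typing import Sequence


def sm_kernel(tau: float, weights: Sequence[float], means: Sequence[float],
              variances: Sequence[float]) -> float:
    """Spectral mixture kernel: Gaussian-enveloped cosines (Gaussian spectral peaks)."""
    return sum(w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v) * math.cos(2.0 * math.pi * mu * tau)
               for w, mu, v in zip(weights, means, variances))


def msm_kernel(tau: float, variances: Sequence[float], freqs: Sequence[float],
               decays: Sequence[float]) -> float:
    """Assumed Matérn-1/2 spectral mixture: exponentially decaying cosines."""
    return sum(s2 * math.exp(-lam * abs(tau)) * math.cos(2.0 * math.pi * f * tau)
               for s2, f, lam in zip(variances, freqs, decays))


# Usage: a hypothetical harmonic prior for a 220 Hz pitch with 3 partials.
f0 = 220.0
partials = [f0 * h for h in (1, 2, 3)]
k0 = msm_kernel(0.0, [1.0, 0.5, 0.25], partials, [30.0] * 3)  # prior variance at zero lag

# Rate at which k falls just right of tau = 0: about 0 for SM (smooth), about lambda for MSM (kink).
step = 1e-6
sm_slope = (sm_kernel(0.0, [1.0], [0.0], [1.0]) - sm_kernel(step, [1.0], [0.0], [1.0])) / step
msm_slope = (msm_kernel(0.0, [1.0], [0.0], [5.0]) - msm_kernel(step, [1.0], [0.0], [5.0])) / step
print(k0, sm_slope, msm_slope)
```

Placing the cosine frequencies at integer multiples of f0 is one way to turn either kernel into a pitch-specific prior; either kernel can be plugged into a GP model such as the source-separation sketch given earlier in this abstract.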
The integration of GP priors, kernels intended for audio, and variational
inference could enable AMT and source separation time-domain methods to
reconstruct sources and transcribe music in an efficient and informed manner.
AMT and source separation remain challenging because
the spectra of the sources/pitches overlap with each other in intricate
ways. Thus, developing probabilistic models capable of differentiating
sources/pitches in the time domain, despite the high similarity between
their spectra, is a step towards solving source separation
and automatic music transcription. We demonstrate the utility of our
methods using real and synthesized music audio datasets for various types of
musical instruments.
Automatic transcription of music using deep learning techniques
Music transcription is the problem of detecting the notes being played in a musical piece. It is a difficult task that only trained people are capable of performing. Because of this difficulty, there has been strong interest in automating it. Automatic music transcription draws on several fields of research, such as digital signal processing, machine learning, music theory and cognition, pitch perception and psychoacoustics. All of this makes automatic music transcription a hard problem to solve.
In this work we present a novel approach to automatically transcribing piano pieces using deep learning techniques. We use deep learning to build several classifiers, each responsible for detecting only one musical note. In theory, this division of work should enhance the transcription ability of each classifier. In addition, we apply two further stages, pre-processing and post-processing, to improve the efficiency of our system. The pre-processing stage aims to improve the quality of the input data before the classification/transcription stage, while the post-processing stage aims to fix errors introduced during classification.
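The one-classifier-per-note design and a post-processing step can be sketched independently of the network architecture. Each classifier is treated here as a hypothetical callable that maps a feature frame to the probability that its note is active; in the work these would be trained deep networks. The post-processing shown, which deletes active runs shorter than a minimum duration, is only one plausible error-correction rule and is not necessarily the one used.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]
NoteClassifier = Callable[[Frame], float]  # hypothetical: P(note active | frame)


def transcribe(frames: Sequence[Frame], classifiers: Sequence[NoteClassifier],
               threshold: float = 0.5) -> List[List[int]]:
    """Binary piano roll roll[note][t]; each note is decided by its own classifier."""
    return [[1 if clf(frame) >= threshold else 0 for frame in frames] for clf in classifiers]


def remove_short_runs(row: Sequence[int], min_frames: int) -> List[int]:
    """Post-processing: switch off active runs shorter than min_frames frames."""
    out = list(row)
    t = 0
    while t < len(out):
        if not out[t]:
            t += 1
            continue
        start = t
        while t < len(out) and out[t]:
            t += 1
        if t - start < min_frames:  # run too short: treat as a spurious detection
            out[start:t] = [0] * (t - start)
    return out


# Usage: two toy "classifiers" that each read one feature.
clfs: List[NoteClassifier] = [lambda fr: fr[0], lambda fr: fr[1]]
roll = transcribe([[0.9, 0.1], [0.2, 0.7], [0.8, 0.6]], clfs)
clean = [remove_short_runs(row, min_frames=2) for row in roll]
print(roll, clean)
```

Because the classifiers never share state, each can be trained, tuned and thresholded separately, which is the division of work the approach relies on; for piano there would be one classifier per key.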
In preliminary experiments, we fine-tuned our model in all three stages: pre-processing, classification and post-processing. The experimental setup using these optimized techniques and parameters is presented, and a comparison is given with two other state-of-the-art works that use the same dataset and the same deep learning technique but a different approach: a single neural network detects all the musical notes, rather than one neural network per note. Our approach surpassed these works in frame-based metrics while reaching close results in onset-based metrics, demonstrating its feasibility.
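Frame-based and onset-based metrics differ in what counts as a correct detection. A common definition, assumed here since the abstract does not give one, scores every (note, frame) cell of the piano roll for the frame-based metrics; for onset-based metrics, an estimated note counts as correct when a reference note of the same pitch has an onset within +/-50 ms. Matching below is greedy for brevity; evaluation tools such as mir_eval use an optimal bipartite matching, which can give slightly different counts.

```python
from typing import List, Sequence, Tuple

Note = Tuple[int, float]  # (MIDI pitch, onset time in seconds)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F-measure, defined as 0 when a denominator is 0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def frame_metrics(ref_roll: Sequence[Sequence[int]],
                  est_roll: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Compare two binary piano rolls of identical shape, cell by cell."""
    tp = fp = fn = 0
    for ref_row, est_row in zip(ref_roll, est_roll):
        for r, e in zip(ref_row, est_row):
            tp += 1 if r and e else 0
            fp += 1 if e and not r else 0
            fn += 1 if r and not e else 0
    return prf(tp, fp, fn)


def onset_metrics(ref: Sequence[Note], est: Sequence[Note],
                  tol: float = 0.05) -> Tuple[float, float, float]:
    """Greedy one-to-one note matching on pitch and onset within +/- tol seconds."""
    unmatched: List[Note] = list(ref)
    tp = 0
    for pitch, onset in sorted(est, key=lambda n: n[1]):
        hits = [n for n in unmatched if n[0] == pitch and abs(n[1] - onset) <= tol]
        if hits:
            unmatched.remove(min(hits, key=lambda n: abs(n[1] - onset)))
            tp += 1
    return prf(tp, len(est) - tp, len(ref) - tp)


frame_prf = frame_metrics([[1, 1, 0], [0, 1, 1]], [[1, 0, 0], [0, 1, 1]])
onset_prf = onset_metrics([(60, 0.0), (64, 0.5)], [(60, 0.02), (64, 0.7), (67, 1.0)])
print(frame_prf, onset_prf)
```

A system can score well on frame-based metrics by tracking sustained notes accurately while still placing some onsets outside the tolerance window, which is one way the two kinds of results reported above can diverge.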
Contributions to automatic multiple F0 detection in polyphonic music signals
Multiple fundamental frequency estimation, or multi-pitch estimation (MPE), is a key problem in automatic music transcription (AMT) and many other related audio processing tasks. Applications of AMT are numerous, ranging from musical genre classification to automatic piano tutoring, and these form a significant part of music information retrieval tasks. Current AMT systems still perform considerably below human experts, and there is a consensus that developing an automated system for full transcription of polyphonic music, regardless of its complexity, is still an open problem. The goal of this work is to propose contributions for the automatic detection of multiple fundamental frequencies in polyphonic music signals. A reference MPE method is chosen, studied and implemented, and a modification is proposed to improve the performance of the system. Lastly, three refinement strategies are proposed for incorporation into the modified method in order to increase the quality of the results. Experimental tests reveal that these refinements improve the overall performance of the system on average, even though each one performs differently depending on signal characteristics.
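A generic iterative estimation-and-cancellation scheme shows the basic idea behind many MPE methods: pick the candidate F0 with the largest harmonic sum, remove its partials from the spectrum, and repeat until the remaining salience is too small. This is an illustrative baseline, not the reference method chosen in the work; the unit harmonic weights, the bin-zeroing cancellation, the toy spectrum and the stopping parameters are all assumptions.

```python
from typing import List, Sequence


def harmonic_sum(spectrum: Sequence[float], bin_hz: float, f0: float,
                 n_harmonics: int = 8) -> float:
    """Sum of spectral magnitudes at the first n_harmonics multiples of f0."""
    total = 0.0
    for h in range(1, n_harmonics + 1):
        k = round(h * f0 / bin_hz)
        if k >= len(spectrum):
            break
        total += spectrum[k]
    return total


def iterative_mpe(spectrum: Sequence[float], bin_hz: float, candidates: Sequence[float],
                  max_polyphony: int = 4, min_salience: float = 0.5,
                  n_harmonics: int = 8) -> List[float]:
    """Estimate F0s one at a time, cancelling each detected note's partials."""
    residual = list(spectrum)
    found: List[float] = []
    for _ in range(max_polyphony):
        best = max(candidates, key=lambda f0: harmonic_sum(residual, bin_hz, f0, n_harmonics))
        if harmonic_sum(residual, bin_hz, best, n_harmonics) < min_salience:
            break  # nothing salient left in the residual
        found.append(best)
        for h in range(1, n_harmonics + 1):
            k = round(h * best / bin_hz)
            if k < len(residual):
                residual[k] = 0.0  # cancellation by zeroing the partial's bin
    return sorted(found)


def toy_spectrum(f0s: Sequence[float], n_bins: int = 2000, n_harmonics: int = 8) -> List[float]:
    """Hypothetical test input: unit-amplitude partials on a 1 Hz grid."""
    spec = [0.0] * n_bins
    for f0 in f0s:
        for h in range(1, n_harmonics + 1):
            spec[round(h * f0)] += 1.0
    return spec


print(iterative_mpe(toy_spectrum([100.0, 150.0]), 1.0, [100.0, 150.0, 200.0, 250.0]))
```

Zeroing bins also illustrates a known weakness: partials shared by two notes (here 300 Hz and 600 Hz) are removed together with the first detected note, which weakens the evidence for the second one. Refinements of a reference method typically target exactly this kind of overlap.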
- …