Shift-Invariant Kernel Additive Modelling for Audio Source Separation
A major goal in blind source separation is to model the inherent characteristics
of sources in order to identify and separate them. While most state-of-the-art approaches
are supervised methods trained on large datasets, interest in non-data-driven
approaches such as Kernel Additive Modelling (KAM) remains high due to their
interpretability and adaptability. KAM separates a given
source by applying robust statistics to the time-frequency bins selected by a
source-specific kernel function, commonly the K-NN function. This choice
assumes that the source of interest repeats in both time and frequency. In
practice, this assumption does not always hold. Therefore, we introduce a
shift-invariant kernel function capable of identifying similar spectral content
even under frequency shifts. This considerably increases the amount
of suitable sound material available to the robust statistics. While this
improves separation performance, a basic formulation is
computationally expensive. Therefore, we additionally present acceleration
techniques that lower the overall computational complexity.
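As an illustration of the general KAM idea summarised in this abstract, the following is a minimal sketch, assuming a frame-level K-NN kernel with Euclidean distance and the median as the robust statistic. The function names `knn_median_estimate` and `soft_mask` are hypothetical, and the sketch does not implement the shift-invariant kernel or the acceleration techniques the abstract refers to.

```python
import numpy as np

def knn_median_estimate(mag, k=5):
    """Estimate a repeating source from a magnitude spectrogram.

    mag: non-negative array of shape (n_freq, n_frames).
    For every frame, the k most similar frames (the frame itself included)
    are selected, and the per-bin median across them is taken.
    This is an assumed, simplified kernel, not the one from the paper.
    """
    n_freq, n_frames = mag.shape
    k = min(k, n_frames)
    # Pairwise squared Euclidean distances between frames.
    sq = np.sum(mag ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (mag.T @ mag)
    # Indices of the k nearest frames for each frame: shape (n_frames, k).
    nearest = np.argsort(dist, axis=1)[:, :k]
    # mag[:, nearest] has shape (n_freq, n_frames, k); take the median over k.
    estimate = np.median(mag[:, nearest], axis=2)
    # The estimated source cannot carry more energy than the mixture.
    return np.minimum(estimate, mag)

def soft_mask(mag, estimate, eps=1e-12):
    """Turn the estimate into a mask with values between 0 and 1."""
    return estimate / (mag + eps)
```

In such a setup the mask would typically be applied to the complex mixture spectrogram before inverting it back to the time domain; that step is omitted here.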
From heuristics-based to data-driven audio melody extraction
The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advances in melody extraction and shows a promising path for future research and applications.
A COMPARISON OF EXTENDED SOURCE-FILTER MODELS FOR MUSICAL SIGNAL RECONSTRUCTION
China Scholarship Council (CSC)/Queen Mary Joint PhD scholarship;
Royal Academy of Engineering Research Fellowship
Evaluation and combination of pitch estimation methods for melody extraction in symphonic classical music
The extraction of pitch information is arguably one of the most important
tasks in automatic music description systems. However, previous
research and evaluation datasets dealing with pitch estimation focused
on relatively limited kinds of musical data. This work aims to broaden
this scope by addressing symphonic Western classical music recordings,
focusing on pitch estimation for melody extraction. This material is characterised
by a high number of overlapping sources, and by the fact that the
melody may be played by different instrumental sections, often alternating
within an excerpt. We evaluate the performance of eleven state-of-the-art
pitch salience functions, multipitch estimation and melody extraction algorithms
when determining the sequence of pitches corresponding to the
main melody in a varied set of pieces. An important contribution of the
present study is the proposed evaluation framework, including the annotation
methodology, generated dataset and evaluation metrics. The results
show that the assumptions made by certain methods hold better than
others when dealing with this type of music signal, leading to better
performance. Additionally, we propose a simple method for combining
the output of several algorithms, with promising results.
Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation
Singing voice detection is the task of identifying which frames of a recording
contain singing voice. It is one of the main components of music
information retrieval (MIR), with applications in melody extraction,
artist recognition, and music discovery in popular music. Although
several methods have been proposed, a more robust and more complete
system is needed to improve detection performance. In this paper, our
motivation is to provide an extensive comparison of the different stages of singing
voice detection. Based on this analysis, a novel method is proposed to build a
more efficient singing voice detection system. The proposed system has
three main parts. The first is a singing voice separation pre-process that
extracts the vocal from the musical accompaniment. The improvements offered by several singing voice
separation methods were compared to decide which one to integrate into
the singing voice detection system. The second is a deep neural network based
classifier that labels the given frames. Different deep models for
classification were also compared. The last is a post-process that filters out
anomalous frames in the classifier's predictions. A median filter
and a Hidden Markov Model (HMM) based filter were compared as post-processes.
The different methods were compared and analyzed through step-by-step module
extension. Finally, classification performance on two public datasets
indicates that the proposed approach, which is based on the Long-term Recurrent
Convolutional Networks (LRCN) model, is a promising alternative.
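The median-filter post-process mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming binary frame labels (1 for voice, 0 for no voice), an odd window width, and edge padding; the function name `median_smooth` and the default width are hypothetical and not taken from the paper.

```python
import numpy as np

def median_smooth(labels, width=5):
    """Smooth a sequence of 0/1 frame labels with a sliding median.

    width must be odd; the sequence is padded at both ends by repeating
    the first and last label so the output has the same length.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    labels = np.asarray(labels, dtype=int)
    half = width // 2
    padded = np.pad(labels, half, mode="edge")
    # One median per original frame, taken over its surrounding window.
    smoothed = [np.median(padded[i:i + width]) for i in range(len(labels))]
    return np.array(smoothed, dtype=int)
```

With a width of 3, an isolated voiced frame surrounded by unvoiced frames is removed, and a single unvoiced gap inside a voiced region is filled.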
Automatic annotation of musical audio for interactive applications
PhD
As machines become more and more portable, and part of our everyday life, it becomes
apparent that developing interactive and ubiquitous systems is an important
aspect of new music applications created by the research community. We are interested
in developing a robust layer for the automatic annotation of audio signals, to
be used in various applications, from music search engines to interactive installations,
and in various contexts, from embedded devices to audio content servers. We
propose adaptations of existing signal processing techniques to a real time context.
Amongst these annotation techniques, we concentrate on low and mid-level tasks
such as onset detection, pitch tracking, tempo extraction and note modelling. We
present a framework to extract these annotations and evaluate the performance of
different algorithms.
The first task is to detect onsets and offsets in audio streams within short latencies.
The segmentation of audio streams into temporal objects enables various
manipulations and the analysis of metrical structure. Evaluation of different algorithms
and their adaptation to real time are described. We then tackle the problem of
fundamental frequency estimation, again trying to reduce both the delay and the
computational cost. Different algorithms are implemented for real time and tested
on monophonic recordings and complex signals. Spectral analysis can be
used to label the temporal segments; the estimation of higher level descriptions is
approached. Techniques for modelling of note objects and localisation of beats are
implemented and discussed.
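The onset detection task described above can be illustrated with a short sketch. It assumes half-wave rectified spectral flux as the detection function and a fixed threshold with simple peak picking; these are common choices but are not confirmed as the thesis's own method, and the names `spectral_flux` and `pick_onsets` are hypothetical. The peak picker looks one frame ahead, so its latency is one hop.

```python
import numpy as np

def spectral_flux(signal, frame_size=512, hop=256):
    """Return the half-wave rectified spectral flux, one value per frame."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    prev = np.zeros(frame_size // 2 + 1)
    flux = np.zeros(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_size] * window
        mag = np.abs(np.fft.rfft(frame))
        # Only increases in magnitude count towards an onset.
        flux[i] = np.sum(np.maximum(mag - prev, 0.0))
        prev = mag
    return flux

def pick_onsets(flux, threshold):
    """Return indices of frames that are local maxima above the threshold."""
    onsets = []
    for i in range(1, len(flux) - 1):
        if flux[i] > threshold and flux[i] > flux[i - 1] and flux[i] >= flux[i + 1]:
            onsets.append(i)
    return onsets
```

Multiplying a returned frame index by the hop size gives an approximate onset position in samples.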
Applications of our framework include live and interactive music installations,
and more generally tools for composers and sound engineers. Speed optimisations
may bring a significant improvement to various automated tasks, such as
automatic classification and recommendation systems. We describe the design of
our software solution, for our research purposes and in view of its integration within
other systems.
EU-FP6-IST-507142 project SIMAC (Semantic Interaction with Music
Audio Contents);
EPSRC grants GR/R54620; GR/S75802/01
Don't hide in the frames: Note- and pattern-based evaluation of automated melody extraction algorithms
Automatic music transcription: challenges and future directions
Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general-purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.