Evaluation and combination of pitch estimation methods for melody extraction in symphonic classical music
The extraction of pitch information is arguably one of the most important
tasks in automatic music description systems. However, previous
research and evaluation datasets dealing with pitch estimation focused
on relatively limited kinds of musical data. This work aims to broaden
this scope by addressing symphonic western classical music recordings,
focusing on pitch estimation for melody extraction. This material is characterised
by a high number of overlapping sources, and by the fact that the
melody may be played by different instrumental sections, often alternating
within an excerpt. We evaluate the performance of eleven state-of-the-art
pitch salience functions, multipitch estimation and melody extraction algorithms
when determining the sequence of pitches corresponding to the
main melody in a varied set of pieces. An important contribution of the
present study is the proposed evaluation framework, including the annotation
methodology, generated dataset and evaluation metrics. The results
show that the assumptions made by certain methods hold better than
others when dealing with this type of music signal, leading to better
performance. Additionally, we propose a simple method for combining
the output of several algorithms, with promising results.
Modelling and extraction of fundamental frequency in speech signals
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. One of the most important parameters of speech is the fundamental frequency of vibration of voiced sounds. The audio sensation of the fundamental frequency is known as the pitch. Depending on the tonal/non-tonal category of language, the fundamental frequency conveys intonation, pragmatics and meaning. In addition, the fundamental frequency and intonation carry speaker gender, age, identity, speaking style and emotional state. Accurate estimation of the fundamental frequency is critically important for the functioning of speech processing applications such as speech coding, speech recognition, speech synthesis and voice morphing. This thesis makes contributions to the development of accurate pitch estimation research in three distinct ways: (1) an investigation of the impact of the window length on pitch estimation error, (2) an investigation of the use of the higher order moments and (3) an investigation of an analysis-synthesis method for selection of the best pitch value among N proposed candidates. Experimental evaluations show that the length of the speech window has a major impact on the accuracy of pitch estimation. Depending on the similarity criteria and the order of the statistical moment, a window length of 37 to 80 ms gives the least error. In order to avoid excessive delay as a consequence of using a longer window, a method is proposed where the current short window is concatenated with the previous frames to form a longer signal window for pitch extraction. Second order and higher order moments, and the magnitude difference function, were explored and compared as the similarity criteria. A novel method of calculation of moments is introduced where the signal is split, i.e. rectified, into positive and negative valued samples. The moments for the positive and negative parts of the signal are computed separately and combined. The new method of calculation of moments from positive and negative parts and the higher order criteria provide competitive results. A challenging issue in pitch estimation is the determination of the best candidate from N extrema of the similarity criteria. The analysis-synthesis method proposed in this thesis selects the pitch candidate that provides the best reproduction (synthesis) of the harmonic spectrum of the original speech. The synthesis method must be such that the distortion increases with the increasing error in the estimate of the fundamental frequency. To this end a new method of spectral synthesis is proposed using an estimate of the spectral envelope and harmonically spaced asymmetric Gaussian pulses as excitation. The N-best method provides a consistent reduction in pitch estimation error. The methods described in this thesis result in a significant improvement in the pitch accuracy and outperform the benchmark YIN method.
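As a rough illustration of two ideas mentioned in this abstract, the sketch below estimates a fundamental frequency with an average magnitude difference function (AMDF) and concatenates previous frames with the current one to form a longer analysis window. It is not the thesis's algorithm: the function names, the 60 to 400 Hz search range and the tolerance-based choice of the smallest near-minimal lag are all assumptions made for this example, and no voicing decision is attempted.

```python
import math


def amdf(frame, lag):
    """Average magnitude difference between the frame and a copy shifted by `lag`."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0, tolerance=0.05):
    """Return an F0 estimate in Hz from the AMDF of one window.

    Assumption of this sketch: among all lags whose AMDF lies within
    `tolerance * max(AMDF)` of the global minimum, the smallest lag is
    chosen, which reduces octave errors on periodic signals.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    if lag_min < 1 or lag_max < lag_min:
        raise ValueError("window too short for the requested F0 range")
    lags = range(lag_min, lag_max + 1)
    values = [amdf(frame, lag) for lag in lags]
    threshold = min(values) + tolerance * max(values)
    for lag, value in zip(lags, values):
        if value <= threshold:
            return sample_rate / lag


def extend_window(previous_frames, current_frame):
    """Concatenate earlier short frames with the current one into a longer window."""
    window = []
    for frame in previous_frames:
        window.extend(frame)
    window.extend(current_frame)
    return window
```

For instance, two consecutive 20 ms frames sampled at 8 kHz can be joined with `extend_window` into a 40 ms window, which falls inside the 37 to 80 ms range reported above, before calling `estimate_f0`.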
Audio Indexing Including Frequency Tracking of Simultaneous Multiple Sources in Speech and Music
In this paper, we present a complete system for audio indexing. This system is based on state-of-the-art methods of Speech-Music-Noise segmentation and Monophonic/Polyphonic estimation. After those methods we propose an original system of superposed source detection. This approach is based on the analysis of the evolution of the predominant frequencies. In order to validate the whole system we used different corpora: radio broadcasts, studio music and degraded field recordings. The first results are encouraging and show the potential of our approach, which is generic and can be used on both music and speech content.
Dublin City University video track experiments for TREC 2002
Dublin City University participated in the Feature Extraction task and the Search task of the TREC-2002 Video
Track. In the Feature Extraction task, we submitted 3 features: Face, Speech, and Music. In the Search task, we
developed an interactive video retrieval system, which incorporated the 40 hours of the video search test collection and supported user searching using our own feature extraction data along with the donated feature data and ASR transcript from other Video Track groups. This video retrieval system allows a user to specify a query based on the 10 features and ASR transcript, and the query result is a ranked list of videos that can be further browsed at the shot level. To evaluate the usefulness of the feature-based query, we have developed a second system interface that
provides only ASR transcript-based querying, and we conducted an experiment with 12 test users to compare these 2 systems. Results were submitted to NIST and we are currently conducting further analysis of user performance with these 2 systems.
From heuristics-based to data-driven audio melody extraction
The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advances in melody extraction and shows a promising path for future research and applications.
Towards the automated analysis of simple polyphonic music : a knowledge-based approach
PhD thesis. Music understanding is a process closely related to the knowledge and experience
of the listener. The amount of knowledge required is relative to the
complexity of the task in hand.
This dissertation is concerned with the problem of automatically decomposing
musical signals into a score-like representation. It proposes that, as
with humans, an automatic system requires knowledge about the signal and
its expected behaviour to correctly analyse music.
The proposed system uses the blackboard architecture to combine the
use of knowledge with data provided by the bottom-up processing of the
signal's information. Methods are proposed for the estimation of pitches,
onset times and durations of notes in simple polyphonic music.
A method for onset detection is presented. It provides an alternative to
conventional energy-based algorithms by using phase information. Statistical
analysis is used to create a detection function that evaluates the expected
behaviour of the signal regarding onsets.
Two methods for multi-pitch estimation are introduced. The first concentrates
on the grouping of harmonic information in the frequency-domain.
Its performance and limitations emphasise the case for the use of high-level
knowledge.
This knowledge, in the form of the individual waveforms of a single
instrument, is used in the second proposed approach. The method is based
on a time-domain linear additive model and it presents an alternative to
common frequency-domain approaches.
Results are presented and discussed for all methods, showing that, if
reliably generated, the use of knowledge can significantly improve the quality
of the analysis.
Joint Information Systems Committee (JISC) in the UK; National Science Foundation (N.S.F.) in the United States; Fundacion Gran Mariscal Ayacucho in Venezuela.
Super-resolution, Extremal Functions and the Condition Number of Vandermonde Matrices
Super-resolution is a fundamental task in imaging, where the goal is to
extract fine-grained structure from coarse-grained measurements. Here we are
interested in a popular mathematical abstraction of this problem that has been
widely studied in the statistics, signal processing and machine learning
communities. We exactly resolve the threshold at which noisy super-resolution
is possible. In particular, we establish a sharp phase transition for the
relationship between the cutoff frequency ($m$) and the separation ($\Delta$).
If $m > 1/\Delta + 1$, our estimator converges to the true values at an inverse
polynomial rate in terms of the magnitude of the noise. And when
$m < (1-\epsilon)/\Delta$, no estimator can distinguish between a particular pair of
$\Delta$-separated signals even if the magnitude of the noise is exponentially
small.
Our results involve making novel connections between extremal functions
and the spectral properties of Vandermonde matrices. We establish a sharp phase
transition for their condition number which in turn allows us to give the first
noise tolerance bounds for the matrix pencil method. Moreover we show that our
methods can be interpreted as giving preconditioners for Vandermonde matrices,
and we use this observation to design faster algorithms for super-resolution.
We believe that these ideas may have other applications in designing faster
algorithms for other basic tasks in signal processing.
Comment: 19 pages
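To make the link between separation and conditioning concrete, here is a minimal sketch for the simplest possible case: an m-row Vandermonde matrix with just two nodes on the unit circle, separated by delta in normalised frequency. In that special case the Gram matrix is 2x2 with eigenvalues m ± |D|, where D is the inner product of the two columns, so the condition number has a closed form. The function name and the parameters `m` and `delta` are assumptions of this example, and it does not reproduce the paper's bounds or its estimator.

```python
import cmath
import math


def two_node_condition_number(m, delta):
    """Condition number of the m x 2 Vandermonde matrix with nodes
    exp(2*pi*i*f) and exp(2*pi*i*(f + delta)).

    The 2 x 2 Gram matrix has m on the diagonal and the column inner
    product D off the diagonal, so its eigenvalues are m + |D| and
    m - |D|. The singular values of the Vandermonde matrix are their
    square roots.
    """
    if m < 2:
        raise ValueError("need at least two rows")
    # D = sum_k exp(2*pi*i*k*delta): inner product of the two columns.
    d = sum(cmath.exp(2j * math.pi * k * delta) for k in range(m))
    overlap = abs(d)
    if overlap >= m:
        return math.inf  # columns are parallel: the nodes coincide
    return math.sqrt((m + overlap) / (m - overlap))


if __name__ == "__main__":
    delta = 0.01
    for m in (10, 50, 100, 150, 200):
        print(m, two_node_condition_number(m, delta))
```

With `delta = 0.01`, the matrix is poorly conditioned while `m` is well below `1/delta` and becomes well conditioned once `m` is comparable to or larger than `1/delta`, which is the qualitative behaviour the abstract describes.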