Search CORE

1,779 research outputs found

Evaluation and combination of pitch estimation methods for melody extraction in symphonic classical music

Author: Bosch J.J.
Gómez E.
Marxer R.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2016
Field of study

The extraction of pitch information is arguably one of the most important tasks in automatic music description systems. However, previous research and evaluation datasets dealing with pitch estimation focused on relatively limited kinds of musical data. This work aims to broaden this scope by addressing symphonic western classical music recordings, focusing on pitch estimation for melody extraction. This material is characterised by a high number of overlapping sources, and by the fact that the melody may be played by different instrumental sections, often alternating within an excerpt. We evaluate the performance of eleven state-of-the-art pitch salience functions, multipitch estimation and melody extraction algorithms when determining the sequence of pitches corresponding to the main melody in a varied set of pieces. An important contribution of the present study is the proposed evaluation framework, including the annotation methodology, generated dataset and evaluation metrics. The results show that the assumptions made by certain methods hold better than others when dealing with this type of music signals, leading to a better performance. Additionally, we propose a simple method for combining the output of several algorithms, with promising results

UPF Digital Repository

White Rose Research Online

Pitch modification techniques for sampled voice

Author: Brooks Michael
Publication venue
Publication date: 27/06/2018
Field of study

The Australian National University

Recommended from our members

Modelling and extraction of fundamental frequency in speech signals

Author: Pawi Alipah
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2014
Field of study

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.One of the most important parameters of speech is the fundamental frequency of vibration of voiced sounds. The audio sensation of the fundamental frequency is known as the pitch. Depending on the tonal/non-tonal category of language, the fundamental frequency conveys intonation, pragmatics and meaning. In addition the fundamental frequency and intonation carry speaker gender, age, identity, speaking style and emotional state. Accurate estimation of the fundamental frequency is critically important for functioning of speech processing applications such as speech coding, speech recognition, speech synthesis and voice morphing. This thesis makes contributions to the development of accurate pitch estimation research in three distinct ways: (1) an investigation of the impact of the window length on pitch estimation error, (2) an investigation of the use of the higher order moments and (3) an investigation of an analysis-synthesis method for selection of the best pitch value among N proposed candidates. Experimental evaluations show that the length of the speech window has a major impact on the accuracy of pitch estimation. Depending on the similarity criteria and the order of the statistical moment a window length of 37 to 80 ms gives the least error. In order to avoid excessive delay as a consequence of using a longer window, a method is proposed ii where the current short window is concatenated with the previous frames to form a longer signal window for pitch extraction. The use of second order and higher order moments, and the magnitude difference function, as the similarity criteria were explored and compared. A novel method of calculation of moments is introduced where the signal is split, i.e. rectified, into positive and negative valued samples. The moments for the positive and negative parts of the signal are computed separately and combined. The new method of calculation of moments from positive and negative parts and the higher order criteria provide competitive results. A challenging issue in pitch estimation is the determination of the best candidate from N extrema of the similarity criteria. The analysis-synthesis method proposed in this thesis selects the pitch candidate that provides the best reproduction (synthesis) of the harmonic spectrum of the original speech. The synthesis method must be such that the distortion increases with the increasing error in the estimate of the fundamental frequency. To this end a new method of spectral synthesis is proposed using an estimate of the spectral envelop and harmonically spaced asymmetric Gaussian pulses as excitation. The N-best method provides consistent reduction in pitch estimation error. The methods described in this thesis result in a significant improvement in the pitch accuracy and outperform the benchmark YIN method

Brunel University Research Archive

Audio Indexing Including Frequency Tracking of Simultaneous Multiple Sources in Speech and Music

Author: André-Obrecht Régine
Le Coz Maxime
Mauclair Julie
Pinquier Julien
Publication venue: HAL CCSD
Publication date: 01/01/2013
Field of study

National audienceIn this paper, we present a complete system for audio indexing. This system is based state-of-the-art methods of Speech-Music-Noise segmentation and Monophonic/Polyphonic estimation. After those methods we propose an original system of superposed sources detection. This approach is based on the analysis of the evolution of the predominant frequencies. In order to validate the whole system we used different corpora : Radio broadcasts, studio music and degraded field records. The first results are encouraging and show the potential of our approach which is generic and can be used on both music and speech contents

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL Descartes

Dublin City University video track experiments for TREC 2002

Author: Browne Paul
Czirjék Csaba
Gurrin Cathal
Jarina Roman
Lee Hyowon
Marlow Seán
McDonald Kieran
Murphy Noel
O'Connor Noel E.
Smeaton Alan F.
Ye Jiamin
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/11/2002
Field of study

Dublin City University participated in the Feature Extraction task and the Search task of the TREC-2002 Video Track. In the Feature Extraction task, we submitted 3 features: Face, Speech, and Music. In the Search task, we developed an interactive video retrieval system, which incorporated the 40 hours of the video search test collection and supported user searching using our own feature extraction data along with the donated feature data and ASR transcript from other Video Track groups. This video retrieval system allows a user to specify a query based on the 10 features and ASR transcript, and the query result is a ranked list of videos that can be further browsed at the shot level. To evaluate the usefulness of the feature-based query, we have developed a second system interface that provides only ASR transcript-based querying, and we conducted an experiment with 12 test users to compare these 2 systems. Results were submitted to NIST and we are currently conducting further analysis of user performance with these 2 systems

DCU Online Research Access Service

From heuristics-based to data-driven audio melody extraction

Author: Bosch Juan J.
Publication venue
Publication date: 01/01/2017
Field of study

The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advancements on melody extraction and shows a promising path for future research and applications

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

ZENODO

Tesis Doctorals en Xarxa

Towards the automated analysis of simple polyphonic music : a knowledge-based approach

Author: Bello Correa Juan Pablo
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/2003
Field of study

PhDMusic understanding is a process closely related to the knowledge and experience of the listener. The amount of knowledge required is relative to the complexity of the task in hand. This dissertation is concerned with the problem of automatically decomposing musical signals into a score-like representation. It proposes that, as with humans, an automatic system requires knowledge about the signal and its expected behaviour to correctly analyse music. The proposed system uses the blackboard architecture to combine the use of knowledge with data provided by the bottom-up processing of the signal's information. Methods are proposed for the estimation of pitches, onset times and durations of notes in simple polyphonic music. A method for onset detection is presented. It provides an alternative to conventional energy-based algorithms by using phase information. Statistical analysis is used to create a detection function that evaluates the expected behaviour of the signal regarding onsets. Two methods for multi-pitch estimation are introduced. The first concentrates on the grouping of harmonic information in the frequency-domain. Its performance and limitations emphasise the case for the use of high-level knowledge. This knowledge, in the form of the individual waveforms of a single instrument, is used in the second proposed approach. The method is based on a time-domain linear additive model and it presents an alternative to common frequency-domain approaches. Results are presented and discussed for all methods, showing that, if reliably generated, the use of knowledge can significantly improve the quality of the analysis.Joint Information Systems Committee (JISC) in the UK National Science Foundation (N.S.F.) in the United states. Fundacion Gran Mariscal Ayacucho in Venezuela

Queen Mary Research Online

Automated measures of dysphonias and the phonatory effects of asymmetries in the posterior larynx

Author: Vieira Maurilio Nunes
Publication venue: The University of Edinburgh
Publication date: 01/01/1997
Field of study

Edinburgh Research Archive

Super-resolution, Extremal Functions and the Condition Number of Vandermonde Matrices

Author: Beurling A.
Chandrasekaran V.
Cheney E.
Do Ba K.
Fernandez-Granda C.
Horn R.
Hua Y.
Montgomery H.
Park S.
Stewart G.
Timan A.
Publication venue
Publication date: 01/04/2015
Field of study

Super-resolution is a fundamental task in imaging, where the goal is to extract fine-grained structure from coarse-grained measurements. Here we are interested in a popular mathematical abstraction of this problem that has been widely studied in the statistics, signal processing and machine learning communities. We exactly resolve the threshold at which noisy super-resolution is possible. In particular, we establish a sharp phase transition for the relationship between the cutoff frequency (

m

) and the separation (

\Delta

). If

m > 1/\Delta + 1

, our estimator converges to the true values at an inverse polynomial rate in terms of the magnitude of the noise. And when

m < (1-\epsilon) /\Delta

no estimator can distinguish between a particular pair of

\Delta

-separated signals even if the magnitude of the noise is exponentially small. Our results involve making novel connections between {\em extremal functions} and the spectral properties of Vandermonde matrices. We establish a sharp phase transition for their condition number which in turn allows us to give the first noise tolerance bounds for the matrix pencil method. Moreover we show that our methods can be interpreted as giving preconditioners for Vandermonde matrices, and we use this observation to design faster algorithms for super-resolution. We believe that these ideas may have other applications in designing faster algorithms for other basic tasks in signal processing.Comment: 19 page

arXiv.org e-Print Archive

DSpace@MIT

Crossref