6,773 research outputs found

From heuristics-based to data-driven audio melody extraction

Author: Bosch Juan J.
Publication date: 01/01/2017

The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation, and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advances in melody extraction and shows a promising path for future research and applications.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

ZENODO

Tesis Doctorals en Xarxa

Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation

Author: Beigi Homayoon
Hamid Reza Sharifzadeh
Ian V. Mcloughlin
Jingjie Li
Joliveau Elodie
McLoughlin Ian Vince
Netsell Ronald
Rothenberg Martin
Sharifzadeh Hamid Reza
Su Lim Tan
Sundberg Johan
Toda Tomoki
Yan Song
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/05/2015

Whispering is a natural, unphonated, secondary aspect of speech communication for most people. However, it is the primary mechanism of communication for some speakers who have impaired voice production mechanisms, such as partial laryngectomees, as well as for those prescribed voice rest, which often follows surgery or damage to the larynx. Unlike most people, who choose when to whisper and when not to, these speakers may have little choice but to rely on whispers for much of their daily vocal interaction. Even though most speakers will whisper at times, and some speakers can only whisper, the majority of today’s computational speech technology systems assume or require phonated speech. This article considers conversion of whispers into natural-sounding phonated speech as a noninvasive prosthetic aid for people with voice impairments who can only whisper. As a by-product, the technique is also useful for unimpaired speakers who choose to whisper. Speech reconstruction systems can be classified into those requiring training and those that do not. Among the latter, a recent parametric reconstruction framework is explored and then enhanced through a refined estimation of plausible pitch from weighted formant differences. The improved reconstruction framework, with proposed formant-derived artificial pitch modulation, is validated through subjective and objective comparison tests alongside state-of-the-art alternatives.

Crossref

Kent Academic Repository

Agreement among human and annotated transcriptions of global songs

Author: Benetos E
Fujii S
Fukatsu H
Kondo H
McBride J
Ozaki Y
Pfordresher PQ
Proutskova P
Sakai E
Savage PE
Six J
T. Tierney A
Publication venue: International Society for Music Information Retrieval
Publication date: 09/11/2021

Cross-cultural musical analysis requires standardized symbolic representation of sounds, such as score notation. However, transcription into notation is usually conducted manually by ear, which is time-consuming and subjective. Our aim is to evaluate the reliability of existing methods for transcribing songs from diverse societies. We had 3 experts independently transcribe a sample of 32 excerpts of traditional monophonic songs from around the world (half a cappella, half with instrumental accompaniment). 16 songs also had pre-existing transcriptions created by 3 different experts. We compared these human transcriptions against one another and against 10 automatic music transcription algorithms. We found that human transcriptions can be sufficiently reliable (~90% agreement, κ ~.7), but current automated methods are not (<60% agreement, κ <.4). No automated method clearly outperformed others, in contrast to our predictions. These results suggest that improving automated methods for cross-cultural music transcription is critical for diversifying MIR.
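
The agreement figures quoted in this abstract (percent agreement and κ) can be illustrated with a small sketch. It assumes κ means Cohen's kappa between two transcribers over aligned note labels; the abstract does not say which kappa variant or alignment procedure the authors used, and the note sequences and function names below are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of aligned positions where two label sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label
    # if each labelled independently with their own label frequencies.
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical aligned pitch-class labels from two transcribers.
transcriber_1 = ["C", "D", "E", "E", "G", "C", "D", "E", "G", "G"]
transcriber_2 = ["C", "D", "E", "F", "G", "C", "D", "E", "G", "A"]

agreement = percent_agreement(transcriber_1, transcriber_2)
kappa = cohens_kappa(transcriber_1, transcriber_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than raw agreement here because some matches would be expected by chance alone, which is why the abstract reports both numbers.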

Queen Mary Research Online

Aerial Manipulators for Contact-based Interaction

Author: Hamaza Salua
Publication date: 28/11/2019

Explore Bristol Research

Unifying Amplitude and Phase Analysis: A Compositional Data Approach to Functional Multivariate Mixed-Effects Modeling of Mandarin Chinese

Author: Aston John A. D.
Evans Jonathan P.
Hadjipantelis Pantelis Z.
Müller Hans-Georg
Publication date: 28/12/2014

Mandarin Chinese is characterized by being a tonal language; the pitch (or $F_0$) of its utterances carries considerable linguistic information. However, speech samples from different individuals are subject to changes in amplitude and phase which must be accounted for in any analysis which attempts to provide a linguistically meaningful description of the language. A joint model for amplitude, phase and duration is presented which combines elements from Functional Data Analysis, Compositional Data Analysis and Linear Mixed Effects Models. By decomposing functions via a functional principal component analysis, and connecting registration functions to compositional data analysis, a joint multivariate mixed effect model can be formulated which gives insights into the relationship between the different modes of variation as well as their dependence on linguistic and non-linguistic covariates. The model is applied to the COSPRO-1 data set, a comprehensive database of spoken Taiwanese Mandarin, containing approximately 50 thousand phonetically diverse sample $F_0$ contours (syllables), and reveals that phonetic information is jointly carried by both amplitude and phase variation.

Comment: 49 pages, 13 figures, small changes to discussio

arXiv.org e-Print Archive

FigShare

Autoregressive neural F0 model for statistical parametric speech synthesis

Author: Takaki Shinji
Wang Xin
Yamagishi Junichi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/04/2018

Crossref

Edinburgh Research Explorer

The interaction of helical tip and root vortices in a wind turbine wake

Author: Blackburn Hugh M.
Lo Jacono David
Nemes Andras
Sheridan John
Sherry Michael
Publication venue: 'AIP Publishing'
Publication date: 01/01/2013

Analysis of the helical vortices measured behind a model wind turbine in a water channel is reported. Phase-locked measurements using planar particle image velocimetry are taken behind a Glauert rotor to investigate the evolution and breakdown of the helical vortex structures. Existing linear stability theory predicts helical vortex filaments to be susceptible to three unstable modes. The current work presents tip and root vortex evolution in the wake for varying tip speed ratio and shows a breaking of the helical symmetry and merging of the vortices due to mutual inductance between the vortical filaments. The merging of the vortices is shown to be steady with rotor phase; however, small-scale non-periodic meander of the vortex positions is also observed. The generation of the helical wake is demonstrated to be closely coupled with the blade aerodynamics, strongly influencing the vortex properties, which are shown to agree with theoretical predictions of the circulation shed into the wake by the blades. The mutual inductance of the helices is shown to occur at the same non-dimensional wake distance.

Crossref

Open Archive Toulouse Archive Ouverte