    From heuristics-based to data-driven audio melody extraction

    The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advances in melody extraction and shows a promising path for future research and applications.

    Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation

    Whispering is a natural, unphonated, secondary aspect of speech communication for most people. However, it is the primary mechanism of communication for some speakers who have impaired voice production mechanisms, such as partial laryngectomees, as well as for those prescribed voice rest, which often follows surgery or damage to the larynx. Unlike most people, who choose when to whisper and when not to, these speakers may have little choice but to rely on whispers for much of their daily vocal interaction. Even though most speakers will whisper at times, and some speakers can only whisper, the majority of today’s computational speech technology systems assume or require phonated speech. This article considers conversion of whispers into natural-sounding phonated speech as a noninvasive prosthetic aid for people with voice impairments who can only whisper. As a by-product, the technique is also useful for unimpaired speakers who choose to whisper. Speech reconstruction systems can be classified into those requiring training and those that do not. Among the latter, a recent parametric reconstruction framework is explored and then enhanced through a refined estimation of plausible pitch from weighted formant differences. The improved reconstruction framework, with proposed formant-derived artificial pitch modulation, is validated through subjective and objective comparison tests alongside state-of-the-art alternatives.

    Agreement among human and annotated transcriptions of global songs

    Cross-cultural musical analysis requires standardized symbolic representation of sounds such as score notation. However, transcription into notation is usually conducted manually by ear, which is time-consuming and subjective. Our aim is to evaluate the reliability of existing methods for transcribing songs from diverse societies. We had 3 experts independently transcribe a sample of 32 excerpts of traditional monophonic songs from around the world (half a cappella, half with instrumental accompaniment). Sixteen songs also had pre-existing transcriptions created by 3 different experts. We compared these human transcriptions against one another and against 10 automatic music transcription algorithms. We found that human transcriptions can be sufficiently reliable (~90% agreement, κ ~.7), but current automated methods are not (<60% agreement, κ <.4). Contrary to our predictions, no automated method clearly outperformed the others. These results suggest that improving automated methods for cross-cultural music transcription is critical for diversifying MIR.
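
    The agreement figures above pair a raw percentage with a chance-corrected κ. As a rough illustration of how the two differ, the sketch below computes both for two annotators. It assumes Cohen's kappa over per-note categorical labels; the abstract does not say which kappa variant or which unit of comparison the authors used, and the note sequences here are invented for the example.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of positions where two equal-length label sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label
    # if each labels independently according to their own marginals.
    labels = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical pitch-class labels from two transcribers (not data from the study).
rater_1 = ["C", "D", "E", "C", "G", "E", "D", "C"]
rater_2 = ["C", "D", "E", "D", "G", "E", "D", "C"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```

    Kappa comes out lower than raw agreement because some matches would be expected by chance alone, which is why the abstract reports both numbers.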

    Aerial Manipulators for Contact-based Interaction

    Unifying Amplitude and Phase Analysis: A Compositional Data Approach to Functional Multivariate Mixed-Effects Modeling of Mandarin Chinese

    Mandarin Chinese is characterized by being a tonal language; the pitch (or F0) of its utterances carries considerable linguistic information. However, speech samples from different individuals are subject to changes in amplitude and phase which must be accounted for in any analysis which attempts to provide a linguistically meaningful description of the language. A joint model for amplitude, phase and duration is presented which combines elements from Functional Data Analysis, Compositional Data Analysis and Linear Mixed Effects Models. By decomposing functions via a functional principal component analysis, and connecting registration functions to compositional data analysis, a joint multivariate mixed effect model can be formulated which gives insights into the relationship between the different modes of variation as well as their dependence on linguistic and non-linguistic covariates. The model is applied to the COSPRO-1 data set, a comprehensive database of spoken Taiwanese Mandarin, containing approximately 50 thousand phonetically diverse sample F0 contours (syllables), and reveals that phonetic information is jointly carried by both amplitude and phase variation. Comment: 49 pages, 13 figures, small changes to discussion

    Autoregressive neural F0 model for statistical parametric speech synthesis

    The interaction of helical tip and root vortices in a wind turbine wake

    Analysis of the helical vortices measured behind a model wind turbine in a water channel is reported. Phase-locked measurements using planar particle image velocimetry are taken behind a Glauert rotor to investigate the evolution and breakdown of the helical vortex structures. Existing linear stability theory predicts helical vortex filaments to be susceptible to three unstable modes. The current work presents tip and root vortex evolution in the wake for varying tip speed ratio and shows a breaking of the helical symmetry and merging of the vortices due to mutual inductance between the vortical filaments. The merging of the vortices is shown to be steady with rotor phase; however, small-scale non-periodic meander of the vortex positions is also observed. The generation of the helical wake is demonstrated to be closely coupled with the blade aerodynamics, strongly influencing the vortex properties, which are shown to agree with theoretical predictions of the circulation shed into the wake by the blades. The mutual inductance of the helices is shown to occur at the same non-dimensional wake distance.
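
    For readers unfamiliar with the term, the tip speed ratio varied in this study is conventionally the ratio of blade-tip speed to free-stream speed. The symbols below are the usual textbook notation and are an assumption here; the abstract itself does not give a definition.

```latex
\lambda = \frac{\Omega R}{U_\infty}
% \Omega   : rotor angular velocity (rad/s)
% R        : rotor radius (m)
% U_\infty : free-stream flow speed (m/s)
```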