
    Singing voice resynthesis using concatenative-based techniques

    Doctoral thesis. Informatics Engineering. Faculdade de Engenharia, Universidade do Porto. 201

    Making music through real-time voice timbre analysis: machine learning and timbral control

    PhD thesis. People can achieve rich musical expression through vocal sound: see, for example, human beatboxing, which achieves a wide timbral variety through a range of extended techniques. Yet the vocal modality is under-exploited as a controller for music systems. If we can analyse a vocal performance suitably in real time, then this information could be used to create voice-based interfaces with the potential for intuitive and fulfilling levels of expressive control. Conversely, many modern techniques for music synthesis do not imply any particular interface. Should a given parameter be controlled via a MIDI keyboard, or a slider/fader, or a rotary dial? Automatic vocal analysis could provide a fruitful basis for expressive interfaces to such electronic musical instruments. The principal questions in applying vocal-based control are how to extract musically meaningful information from the voice signal in real time, and how to convert that information suitably into control data. In this thesis we address these questions, with a focus on timbral control, and in particular we develop approaches that can be used with a wide variety of musical instruments by applying machine learning techniques to automatically derive the mappings between expressive audio input and control output. The vocal audio signal is construed to include a broad range of expression, in particular encompassing the extended techniques used in human beatboxing. The central contribution of this work is the application of supervised and unsupervised machine learning techniques to automatically map vocal timbre to synthesiser timbre and controls. Component contributions include a delayed decision-making strategy for low-latency sound classification, a regression-tree method to learn associations between regions of two unlabelled datasets, a fast estimator of multidimensional differential entropy, and a qualitative method for evaluating musical interfaces based on discourse analysis.
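
The supervised timbre-to-control mapping described in the abstract can be sketched in miniature: pairs of (vocal timbre features, synthesiser parameters) are collected during training, then an incoming vocal frame is mapped to parameters of the nearest training timbre. The feature names, dimensions, and nearest-neighbour method here are illustrative assumptions, not the thesis's actual algorithms.

```python
import math

def nearest_neighbour_map(training_pairs, query):
    """Return the synth parameters paired with the closest training timbre."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    best = min(training_pairs, key=lambda pair: dist(pair[0], query))
    return best[1]

# Hypothetical data: vocal features [centroid, loudness] mapped to
# synth parameters [filter cutoff in Hz, resonance].
training = [
    ([0.2, 0.5], [200.0, 0.1]),   # dark, quiet vocal -> closed filter
    ([0.8, 0.9], [4000.0, 0.7]),  # bright, loud vocal -> open filter
]
print(nearest_neighbour_map(training, [0.75, 0.85]))
```

In a real-time system this lookup would run per analysis frame; the thesis's regression-tree and delayed-decision methods replace this naive matching with learned, latency-aware mappings.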

    The Musical Semiotics of Timbre in the Human Voice and Static Takes Love's Body

    In exploring the semiotics of vocal timbre as a general phenomenon within music, theoretical engagement with the history of timbre and of musical meaning bolsters my illustrative analyses of Laurie Anderson and Louis Armstrong. I first outline its reliance on subtractive filtering imparted physically by the performer's vocal tract, demonstrating that its signification is itself a subtractive process where meaning lies in the silent space between spectral formants. Citing Merleau-Ponty's phenomenology and placing the body's perceptual experience as the basis of existential reality, I then argue that the human voice offers self-actualization in a way that other sensory categories cannot, because the voice gives us control over what and how we hear in a way that we cannot control, through our own bodies alone, our sight, touch, taste, and smell. This idea combines with a listener's imagined performance of vocal music, in which I propose that, because of our familiarity with the articulations of human sound, as we hear a voice we are able to imagine and mimic the choreography of the vocal tract, engaging a physical and bodily listening, thereby making not only performance but also listening a self-affirming bodily reflection on being. Finally I consider vocal timbre as internally lexical and externally bound by a linguistic context. Citing Peirce and Derrida, and incorporating previous points, I show vocal timbre as a canvas on which a linguistic and musical foreground is painted, all interpreted by the body. Accompanying the theoretical discussions is a concerto addressing relevant compositional issues.

    Vocal imitation for query by vocalisation

    PhD thesis. The human voice presents a rich and powerful medium for expressing sonic ideas such as musical sounds. This capability extends beyond the sounds used in speech, evidenced for example in the art form of beatboxing, and in recent studies highlighting the utility of vocal imitation for communicating sonic concepts. Meanwhile, the advance of digital audio has resulted in huge libraries of sounds at the disposal of music producers and sound designers. This presents a compelling search problem: with larger search spaces, the task of navigating sound libraries has become increasingly difficult. The versatility and expressive nature of the voice provides a seemingly ideal medium for querying sound libraries, raising the question of how well humans are able to vocally imitate musical sounds, and how we might use the voice as a tool for search. In this thesis we address these questions by investigating the ability of musicians to vocalise synthesised and percussive sounds, and evaluate the suitability of different audio features for predicting the perceptual similarity between vocal imitations and imitated sounds. In the first experiment, musicians were tasked with imitating synthesised sounds with one or two time-varying feature envelopes applied. The results show that participants were able to imitate pitch, loudness, and spectral centroid features accurately, and that imitation accuracy was generally preserved when the imitated stimuli combined two, not necessarily congruent, features. This demonstrates the viability of using the voice as a natural means of expressing time series of two features simultaneously. The second experiment consisted of two parts. In a vocal production task, musicians were asked to imitate drum sounds. Listeners were then asked to rate the similarity between the imitations and sounds from the same category (e.g. kick, snare, etc.). The results show that drum sounds received the highest similarity ratings when rated against their own imitations (as opposed to imitations of another sound), and overall more than half the imitated sounds were correctly identified from the imitations with above-chance accuracy, although this varied considerably between drum categories. The findings from the vocal imitation experiments highlight the capacity of musicians to vocally imitate musical sounds, and some limitations of non-verbal vocal expression. Finally, we investigated the performance of different audio features as predictors of perceptual similarity between the imitations and imitated sounds from the second experiment. We show that features learned using convolutional auto-encoders outperform a number of popular heuristic features for this task, and that preservation of temporal information is more important than spectral resolution for differentiating between the vocal imitations and same-category drum sounds.
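
Of the imitated features named above, the spectral centroid is the least familiar: it is the amplitude-weighted mean frequency of a magnitude spectrum, often described as the "brightness" of a sound. A minimal sketch, using a deliberately naive pure-Python DFT rather than the feature extractors used in the thesis:

```python
import math

def spectral_centroid(frame, sample_rate):
    """Amplitude-weighted mean frequency of one audio frame, in Hz."""
    n = len(frame)
    mags = []
    for k in range(n // 2):  # magnitude of each DFT bin up to Nyquist
        re = sum(frame[t] * math.cos(-2 * math.pi * k * t / n) for t in range(n))
        im = sum(frame[t] * math.sin(-2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    freqs = [k * sample_rate / n for k in range(n // 2)]
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0

# A pure sinusoid should place the centroid at (roughly) its own frequency.
sr = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(256)]
print(spectral_centroid(frame, sr))
```

Tracking this value frame by frame yields the kind of time-varying feature envelope the imitation experiment asked participants to reproduce with their voices.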

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Two unit selection singing synthesisers

    Two speech synthesisers were adapted for singing synthesis using unit selection techniques provided by the Festival speech synthesis system. A limited domain approach was used by focussing on the pitch, duration and word of each note. The first synthesiser used the cluster unit technique on a database covering an octave range, where each note had a specific word assigned to it. Some of the automatic techniques used (e.g. for segmentation) were designed for speech and should ideally be adapted to take account of the differences between singing and speaking. Better quality was achieved with a multisyn engine and improved database design. This database used a smaller pitch range and only three syllables, 'la', 'ti' and 'so', but each syllable could be synthesised on any available note, and in any combination of notes and syllables. This was achieved by weighting the target cost of selecting units from the database in favour of choosing units with the correct pitch and duration. Finally, prosodic modification was applied to units in the multisyn engine, but this degraded quality as a result of how the units were modified. Although the quality of synthesis was appropriate for the intended applications, the database was small and the linguistic structure simple. To build a larger-scale singing synthesiser, either some aspect of the database should be kept simple, such as the vocabulary, or prosodic modification of units should be improved through further analysis of the characteristics of singing.
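
The target-cost weighting described above can be sketched as follows: each candidate unit is scored against the target note's pitch and duration, with the pitch term weighted heavily so that units at the correct pitch are strongly preferred. The weights, field names, and cost form are illustrative assumptions, not Festival's actual multisyn cost function.

```python
PITCH_WEIGHT = 10.0     # bias selection heavily toward the correct pitch
DURATION_WEIGHT = 1.0

def target_cost(unit, target):
    """Weighted distance between a candidate unit and the target note."""
    return (PITCH_WEIGHT * abs(unit["pitch"] - target["pitch"])
            + DURATION_WEIGHT * abs(unit["duration"] - target["duration"]))

def select_unit(candidates, target):
    """Pick the candidate unit with the lowest target cost."""
    return min(candidates, key=lambda u: target_cost(u, target))

# Hypothetical mini-database: three recordings of the syllable 'la'
# (pitch as a MIDI note number, duration in seconds).
database = [
    {"syllable": "la", "pitch": 60, "duration": 0.4},
    {"syllable": "la", "pitch": 62, "duration": 0.5},
    {"syllable": "la", "pitch": 64, "duration": 0.4},
]
print(select_unit(database, {"pitch": 62, "duration": 0.4}))
```

With the pitch term dominating, a unit at the right pitch but slightly wrong duration beats a unit at the right duration but wrong pitch, which is the behaviour the abstract describes.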

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The MAVEBA Workshop proceedings, published every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are: the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and the classification of vocal pathologies.

    The Race of Sound

    In The Race of Sound Nina Sun Eidsheim traces the ways in which sonic attributes that might seem natural, such as the voice and its qualities, are socially produced. Eidsheim illustrates how listeners measure race through sound and locate racial subjectivities in vocal timbre—the color or tone of a voice. Eidsheim examines singers Marian Anderson, Billie Holiday, and Jimmy Scott as well as the vocal synthesis technology Vocaloid to show how listeners carry a series of assumptions about the nature of the voice and to whom it belongs. Outlining how the voice is linked to ideas of racial essentialism and authenticity, Eidsheim untangles the relationship between race, gender, vocal technique, and timbre while addressing an undertheorized space of racial and ethnic performance. In so doing, she advances our knowledge of the cultural-historical formation of the timbral politics of difference and the ways that comprehending voice remains central to understanding human experience, all the while advocating for a form of listening that would allow us to hear singers in a self-reflexive, denaturalized way

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the keenly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the newborn to the adult and elderly. Over the years the initial issues have grown and spread into other fields of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.