Singing voice resynthesis using concatenative-based techniques
Doctoral Thesis. Informatics Engineering, Faculdade de Engenharia, Universidade do Porto. 201
Making music through real-time voice timbre analysis: machine learning and timbral control
PhD Thesis. People can achieve rich musical expression through vocal sound; see, for example,
human beatboxing, which achieves a wide timbral variety through a range of
extended techniques. Yet the vocal modality is under-exploited as a controller
for music systems. If we can analyse a vocal performance suitably in real time,
then this information could be used to create voice-based interfaces with the
potential for intuitive and fulfilling levels of expressive control.
Conversely, many modern techniques for music synthesis do not imply any
particular interface. Should a given parameter be controlled via a MIDI keyboard,
or a slider/fader, or a rotary dial? Automatic vocal analysis could provide
a fruitful basis for expressive interfaces to such electronic musical instruments.
The principal questions in applying vocal-based control are how to extract
musically meaningful information from the voice signal in real time, and how
to convert that information suitably into control data. In this thesis we address
these questions, with a focus on timbral control, and in particular we
develop approaches that can be used with a wide variety of musical instruments
by applying machine learning techniques to automatically derive the mappings
between expressive audio input and control output. The vocal audio signal is
construed to include a broad range of expression, in particular encompassing
the extended techniques used in human beatboxing.
The central contribution of this work is the application of supervised and
unsupervised machine learning techniques to automatically map vocal timbre
to synthesiser timbre and controls. Component contributions include a delayed
decision-making strategy for low-latency sound classification, a regression-tree
method to learn associations between regions of two unlabelled datasets, a fast
estimator of multidimensional differential entropy, and a qualitative method for
evaluating musical interfaces based on discourse analysis.
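As a rough illustration of the supervised mapping idea (not the thesis's actual implementation), per-frame vocal timbre features can be paired with target synthesiser parameters and mapped with a simple nearest-neighbour regressor. The feature choice (spectral centroid) and all names below are hypothetical stand-ins:

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Spectral centroid of one audio frame, a common timbre feature."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))

class NearestNeighbourMapper:
    """Supervised timbre-to-control mapping via 1-nearest-neighbour lookup."""

    def fit(self, features, controls):
        # features: (n, d) vocal timbre vectors; controls: (n, k) synth params
        self.X = np.asarray(features, dtype=float)
        self.Y = np.asarray(controls, dtype=float)
        return self

    def predict(self, x):
        # return the synth parameters of the closest training example
        dists = np.linalg.norm(self.X - np.asarray(x, dtype=float), axis=1)
        return self.Y[np.argmin(dists)]
```

For example, fitting on two training frames (centroids `[[200.0], [4000.0]]` mapped to filter-cutoff controls `[[0.1], [0.9]]`) and querying with a centroid of 300.0 returns 0.1, the control value of the nearer neighbour. A regression tree, as used in the thesis, generalises this lookup by partitioning the feature space instead of memorising examples.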
The Musical Semiotics of Timbre in the Human Voice and Static Takes Love's Body
In exploring the semiotics of vocal timbre as a general phenomenon within music, theoretical engagement with the history of timbre and of musical meaning bolsters my illustrative analyses of Laurie Anderson and Louis Armstrong. I first outline vocal timbre's reliance on subtractive filtering imparted physically by the performer's vocal tract, demonstrating that its signification is itself a subtractive process where meaning lies in the silent space between spectral formants. Citing Merleau-Ponty's phenomenology and placing the body's perceptual experience as the basis of existential reality, I then argue that the human voice offers self-actualization in a way that other sensory categories cannot, because the voice gives us control over what and how we hear in a way that we cannot control, through our own bodies alone, our sight, touch, taste, and smell. This idea combines with a listener's imagined performance of vocal music, in which I propose that, because of our familiarity with the articulations of human sound, as we hear a voice we are able to imagine and mimic the choreography of the vocal tract, engaging a physical and bodily listening, thereby making not only performance but also listening a self-affirming bodily reflection on being. Finally, I consider vocal timbre as internally lexical and externally bound by a linguistic context. Citing Peirce and Derrida, and incorporating previous points, I show vocal timbre as a canvas on which a linguistic and musical foreground is painted, all interpreted by the body. Accompanying the theoretical discussions is a concerto addressing relevant compositional issues.
Vocal imitation for query by vocalisation
PhD Thesis. The human voice presents a rich and powerful medium for expressing sonic ideas such as musical sounds. This capability extends beyond the sounds used in speech, evidenced for example in the art form of beatboxing, and recent studies highlighting the utility of vocal imitation for communicating sonic concepts. Meanwhile, the advance of digital audio has resulted in huge libraries of sounds at the disposal of music producers and sound designers. This presents a compelling search problem: with larger search spaces, the task of navigating sound libraries has become increasingly difficult. The versatility and expressive nature of the voice provides a seemingly ideal medium for querying sound libraries, raising the question of how well humans are able to vocally imitate
musical sounds, and how we might use the voice as a tool for search. In this thesis we address these questions by investigating the ability of musicians to
vocalise synthesised and percussive sounds, and evaluate the suitability of different audio features for predicting the perceptual similarity between vocal
imitations and imitated sounds.
In the first experiment, musicians were tasked with imitating synthesised sounds with one or two time-varying feature envelopes applied. The results
show that participants were able to imitate pitch, loudness, and spectral centroid features accurately, and that imitation accuracy was generally preserved
when the imitated stimuli combined two not-necessarily congruent features. This demonstrates the viability of using the voice as a natural means of
expressing time series of two features simultaneously. The second experiment consisted of two parts. In a vocal production task,
musicians were asked to imitate drum sounds. Listeners were then asked to rate the similarity between the imitations and sounds from the same category
(e.g. kick, snare etc.). The results show that drum sounds received the highest similarity ratings when rated against their imitations (as opposed to imitations of another sound), and overall more than half the imitated sounds were correctly identified with above chance accuracy from the imitations, although
this varied considerably between drum categories.
The findings from the vocal imitation experiments highlight the capacity of musicians to vocally imitate musical sounds, and some limitations of non-verbal vocal expression. Finally, we investigated the performance of different audio features as predictors of perceptual similarity between the imitations and
imitated sounds from the second experiment. We show that features learned using convolutional auto-encoders outperform a number of popular heuristic
features for this task, and that preservation of temporal information is more important than spectral resolution for differentiating between the vocal imitations and same-category drum sounds.
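The point about temporal information can be made concrete with a toy comparison (an illustrative sketch, not the thesis's actual feature set): two spectrogram-like feature matrices that are time-reversed copies of each other are indistinguishable once averaged over time, but a frame-wise distance that preserves temporal order separates them:

```python
import numpy as np

def time_preserving_distance(A, B):
    """Mean frame-by-frame Euclidean distance; keeps temporal structure."""
    return float(np.mean(np.linalg.norm(A - B, axis=1)))

def time_averaged_distance(A, B):
    """Distance between time-averaged spectra; discards temporal order."""
    return float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))

# Two toy "spectrograms" (frames x bins) with identical content in reverse
# order, e.g. an attack-then-decay sound versus its time reversal.
A = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
B = A[::-1]
```

Here `time_averaged_distance(A, B)` is exactly 0.0 while `time_preserving_distance(A, B)` is positive, so any feature that pools away the time axis cannot tell such a pair apart, however fine its spectral resolution.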
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.
Two unit selection singing synthesisers
Two speech synthesisers were adapted for singing synthesis using unit selection
techniques provided by the Festival speech synthesis system. A limited domain
approach was used by focussing on the pitch, duration and word of each note.
The first synthesiser used the cluster unit technique on a database of an octave
range, where each note had a specific word assigned to it. Some of the automatic
techniques used (e.g. for segmentation) were designed for speech and should
ideally be adapted to take account of the differences between singing and speaking.
Better quality was achieved with a multisyn engine and improved database design.
This database used a smaller pitch range and only three syllables, 'la', 'ti'
and 'so', but each syllable could be synthesised on any available note, and in any
combination of notes and syllables. This was achieved by weighting the target
cost of selecting units from the database in favour of choosing units with the correct
pitch and duration. Finally, prosodic modification was applied to units in
the multisyn engine, but this degraded quality as a result of how the units were
modified.
Although the quality of synthesis was appropriate for the intended applications,
the database was small and linguistic structure simple. To build a larger scale
singing synthesiser, either some aspect of the database should be kept simple,
such as vocabulary, or prosodic modification of units should be improved through
further analysis of the characteristics of singing.
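The weighted target cost described above can be sketched as follows; the unit representation, field names, and weight values are hypothetical stand-ins for illustration, not Festival's actual multisyn implementation (which also adds join costs between consecutive units):

```python
def target_cost(unit, target, w_pitch=10.0, w_dur=5.0, w_syl=1.0):
    """Weighted mismatch between a candidate unit and the target note.

    Heavy pitch and duration weights steer selection toward units with
    the correct pitch and duration, as the abstract describes.
    """
    cost = w_pitch * abs(unit["pitch"] - target["pitch"])
    cost += w_dur * abs(unit["dur"] - target["dur"])
    cost += w_syl * (unit["syl"] != target["syl"])
    return cost

def select_unit(database, target):
    """Pick the database unit with the lowest target cost."""
    return min(database, key=lambda u: target_cost(u, target))
```

With a database of units such as `{"pitch": 60, "dur": 0.5, "syl": "la"}`, a target note on pitch 60 with syllable "la" selects the matching unit; because the pitch weight dominates, a unit with the right pitch but the wrong syllable still beats one with the right syllable on the wrong pitch.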
Models and Analysis of Vocal Emissions for Biomedical Applications
The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are: development of theoretical and mechanical models as an aid to the study of main phonatory dysfunctions, as well as the biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and classification of vocal pathologies.
The Race of Sound
In The Race of Sound Nina Sun Eidsheim traces the ways in which sonic attributes that might seem natural, such as the voice and its qualities, are socially produced. Eidsheim illustrates how listeners measure race through sound and locate racial subjectivities in vocal timbre—the color or tone of a voice. Eidsheim examines singers Marian Anderson, Billie Holiday, and Jimmy Scott as well as the vocal synthesis technology Vocaloid to show how listeners carry a series of assumptions about the nature of the voice and to whom it belongs. Outlining how the voice is linked to ideas of racial essentialism and authenticity, Eidsheim untangles the relationship between race, gender, vocal technique, and timbre while addressing an undertheorized space of racial and ethnic performance. In so doing, she advances our knowledge of the cultural-historical formation of the timbral politics of difference and the ways that comprehending voice remains central to understanding human experience, all the while advocating for a form of listening that would allow us to hear singers in a self-reflexive, denaturalized way.
Models and Analysis of Vocal Emissions for Biomedical Applications
The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the particularly felt need of sharing know-how, objectives and results between areas that until then seemed quite distinct such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice with applications ranging from the newborn to the adult and elderly. Over the years the initial issues have grown and spread also in other fields of research such as occupational voice disorders, neurology, rehabilitation, image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.
- …