Observations on the dynamic control of an articulatory synthesizer using speech production data
This dissertation explores the automatic generation of gestural-score-based control structures for a three-dimensional articulatory speech synthesizer. The gestural scores are optimized in an articulatory resynthesis paradigm using a dynamic programming algorithm and a cost function that measures the deviation from a gold standard in the form of natural speech production data. This data had been recorded using electromagnetic articulography from the same speaker to whom the synthesizer's vocal tract model had previously been adapted. Future work to create an English voice for the synthesizer and integrate it into a text-to-speech platform is outlined.
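The resynthesis idea can be sketched in miniature. The snippet below is a hypothetical illustration, not the thesis's actual algorithm: a dynamic-programming search chooses, frame by frame, one value of a discretized gestural parameter so that the squared deviation from a recorded articulator trajectory plus a smoothness (transition) penalty is minimized.

```python
import numpy as np

def optimize_gestures(target, candidates, trans_weight=1.0):
    """Toy DP over frames: pick one candidate gestural value per frame,
    minimizing squared deviation from the recorded (EMA-style) target
    trajectory plus a quadratic transition penalty between frames.

    target: (T,) recorded articulator trajectory
    candidates: (K,) discrete gestural parameter values
    """
    T, K = len(target), len(candidates)
    cost = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    cost[0] = (candidates - target[0]) ** 2
    for t in range(1, T):
        # penalty for moving from candidate i to candidate j between frames
        trans = trans_weight * (candidates[None, :] - candidates[:, None]) ** 2
        total = cost[t - 1][:, None] + trans          # (K, K)
        back[t] = total.argmin(axis=0)
        cost[t] = total.min(axis=0) + (candidates - target[t]) ** 2
    # backtrack the minimum-cost path
    path = [int(cost[-1].argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return candidates[np.array(path[::-1])]
```

For a constant target, the search simply locks onto the nearest candidate value at every frame; the transition weight trades trajectory fidelity against articulatory smoothness.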
Expression of gender in the human voice: investigating the "gender code"
We can easily and reliably identify the gender of an unfamiliar interlocutor over
the telephone. This is because our voice is "sexually dimorphic": men typically speak
with a lower fundamental frequency (F0 - lower pitch) and lower vocal tract resonances
(ΔF – "deeper" timbre) than women. While the biological bases of these differences are
well understood, and mostly down to size differences between men and women, very
little is known about the extent to which we can play with these differences to
accentuate or de-emphasise our perceived gender, masculinity and femininity in a range
of social roles and contexts.
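For readers unfamiliar with ΔF: under the standard uniform-tube approximation (a textbook acoustic model, assumed here rather than taken from the thesis), the resonances of a tube closed at one end fall at F_i = (2i − 1)·ΔF/2, and the apparent vocal tract length follows as VTL = c/(2·ΔF). A minimal sketch of estimating ΔF from measured formants:

```python
def delta_f(formants):
    """Least-squares fit of measured formants F_i (Hz) to the
    uniform-tube pattern F_i = (2i - 1)/2 * deltaF; returns deltaF in Hz."""
    ks = [(2 * i - 1) / 2 for i in range(1, len(formants) + 1)]
    num = sum(k * f for k, f in zip(ks, formants))
    den = sum(k * k for k in ks)
    return num / den

def vocal_tract_length(df, c=35000.0):
    """Apparent vocal tract length (cm) from deltaF, with the speed of
    sound c given in cm/s."""
    return c / (2.0 * df)
```

For an ideal 17.5 cm tube the formants sit at 500, 1500, 2500, 3500 Hz, giving ΔF = 1000 Hz and a recovered length of 17.5 cm; real vocal tracts deviate from the uniform tube, which is why a fit over several formants is used.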
The general aim of this thesis is to investigate the behavioural basis of gender
expression in the human voice in both children and adults. More specifically, I
hypothesise that, on top of the biologically determined sexual dimorphism, humans use
a "gender code" consisting of vocal gestures (global F0 and ΔF adjustments) aimed at
altering the gender attributes conveyed by their voice. In order to test this hypothesis, I
first explore how variation in sexually dimorphic acoustic cues (F0 and ΔF)
relates to physiological differences in pre-pubertal speakers (vocal tract length) and
adult speakers (body height and salivary testosterone levels), and show that voice
gender variation cannot be solely explained by static, biologically determined
differences in vocal apparatus and body size of speakers. Subsequently, I show that both
children and adult speakers can spontaneously modify their voice gender by lowering
(raising) F0 and ΔF to masculinise (feminise) their voice, a key ability for the
hypothesised control of voice gender. Finally, I investigate the interplay between voice
gender expression and social context in relation to cultural stereotypes. I report that
listeners spontaneously integrate stereotypical information in the auditory and visual
domain to make stereotypical judgments about children's gender and that adult actors
manipulate their gender expression in line with stereotypical gendered notions of
homosexuality. Overall, this corpus of data supports the existence of a "gender code" in
human nonverbal vocal communication. This "gender code" provides not only a
methodological framework with which to empirically investigate variation in voice
gender and its role in expressing gender identity, but also a unifying theoretical
structure to understand the origins of such variation from both evolutionary and social
perspectives.
Warp-Guided GANs for Single-Photo Facial Animation
This paper introduces a novel method for real-time portrait animation from a single photo. Our method requires only a single portrait photo and a set of facial landmarks derived from a driving source (e.g., a photo or a video sequence), and generates an animated image with rich facial details. The core of our method is a warp-guided generative model that instantly fuses various fine facial details (e.g., creases and wrinkles), which are necessary to generate a high-fidelity facial expression, onto a pre-warped image. Our method factors out the nonlinear geometric transformations exhibited in facial expressions with lightweight 2D warps and leaves the appearance detail synthesis to conditional generative neural networks for high-fidelity facial animation generation. We show that such a factorization of geometric transformation and appearance synthesis helps the network better learn the high nonlinearity of the facial expression functions and also facilitates the design of the network architecture. Through extensive experiments on various portrait photos from the Internet, we show the significant efficacy of our method compared with prior art.
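The warp stage of such a pipeline can be illustrated with a deliberately simple, hypothetical stand-in (not the paper's actual model): an inverse-distance-weighted landmark warp applied by backward mapping. In the paper's design, a conditional generative network would then synthesize the fine appearance details on top of the pre-warped image.

```python
import numpy as np

def idw_warp(image, src_pts, dst_pts, eps=1e-6):
    """Lightweight 2D landmark-driven warp (illustrative only):
    each pixel of the output samples the input at an offset that is the
    inverse-distance-weighted average of the landmark displacements
    (src_pts - dst_pts), using backward mapping with nearest sampling.
    src_pts, dst_pts: (L, 2) arrays of (x, y) landmark positions.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (N, 2)
    d = grid[:, None, :] - dst_pts[None, :, :]             # (N, L, 2)
    wts = 1.0 / (np.sum(d * d, axis=2) + eps)              # (N, L)
    disp = src_pts - dst_pts                               # (L, 2)
    offset = (wts @ disp) / wts.sum(axis=1, keepdims=True)  # (N, 2)
    sample = np.rint(grid + offset).astype(int)
    sample[:, 0] = np.clip(sample[:, 0], 0, w - 1)
    sample[:, 1] = np.clip(sample[:, 1], 0, h - 1)
    out = image[sample[:, 1], sample[:, 0]]
    return out.reshape((h, w) + image.shape[2:])
```

When source and destination landmarks coincide, the displacements are zero and the warp is the identity; moving a landmark drags nearby pixels with it, which is the geometric half of the factorization described above.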
Example Based Caricature Synthesis
The likeness of a caricature to the original face image is an essential and often overlooked part of caricature
production. In this paper we present an example based caricature synthesis technique, consisting of shape
exaggeration, relationship exaggeration, and optimization for likeness. Rather than relying on a large training set
of caricature face pairs, our shape exaggeration step is based on only one or a small number of examples of facial
features. The relationship exaggeration step introduces two definitions which facilitate global facial feature
synthesis. The first is the T-Shape rule, which describes the relative relationship between the facial elements in an
intuitive manner. The second is the so-called proportions rule, which characterizes the facial
features in proportional form. Finally, we introduce a likeness metric based on the Modified
Hausdorff Distance (MHD), which allows us to optimize the configuration of facial elements,
maximizing likeness while satisfying a number of constraints. The effectiveness of our
algorithm is demonstrated with experimental results.
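The Modified Hausdorff Distance underlying the likeness metric has a standard formulation (Dubuisson and Jain's): the maximum of the two directed mean nearest-neighbour distances between point sets. A short sketch follows; the paper's exact variant may differ in detail.

```python
import numpy as np

def modified_hausdorff(a, b):
    """Modified Hausdorff Distance between point sets a (M, 2) and b (N, 2):
    max of the mean nearest-neighbour distance from a to b and from b to a.
    Unlike the classical Hausdorff distance (which takes the max of the
    per-point minima), the mean makes the measure robust to outlier points."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # (M, N) pairwise
    return max(d.min(axis=1).mean(), d.min(axis=0).mean())
```

Because it returns a single scalar per pair of landmark configurations, such a distance can serve directly as the objective in the constrained optimization over facial-element placement described above.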
Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy
The MAVEBA Workshop, held biennially, collects in its proceedings the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images in support of clinical diagnosis and the classification of vocal pathologies. The Workshop has the sponsorship of: Ente Cassa Risparmio di Firenze, COST Action 2103, Biomedical Signal Processing and Control Journal (Elsevier Eds.), and the IEEE Biomedical Engineering Soc. Special issues of international journals have been, and will be, published, collecting selected papers from the conference.
- …