On combining the facial movements of a talking head
We present work on Obie, an embodied conversational
agent framework. An embodied conversational agent, or
talking head, consists of three main components. The
graphical part consists of a face model and a facial muscle
model. Besides the graphical part, we have implemented
an emotion model and a mapping from emotions to facial
expressions. The animation part of the framework focuses on combining
different facial movements over time. In this paper we propose a scheme
for combining facial movements on a 3D talking head.
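As a toy illustration of temporal combination (the paper's own scheme is more involved), the sketch below blends two facial-movement channels with per-frame dominance weights. All names and the weighted-average rule here are assumptions for illustration, not the Obie framework's API.

```python
import numpy as np

def combine_movements(movements, weights):
    """Blend several facial-movement channels into one activation track.

    movements: list of (T, M) arrays of muscle activations over T frames.
    weights:   list of (T,) arrays giving each channel's dominance per frame.
    Returns a (T, M) array of combined activations (dominance-weighted mean).
    """
    movements = [np.asarray(m, dtype=float) for m in movements]
    weights = [np.asarray(w, dtype=float) for w in weights]
    num = sum(w[:, None] * m for w, m in zip(weights, movements))
    den = sum(weights)[:, None]
    return num / np.clip(den, 1e-8, None)  # avoid division by zero

# Example: a smile channel fading out while a jaw-open channel fades in.
T, M = 4, 2
smile = np.tile([1.0, 0.0], (T, 1))
jaw = np.tile([0.0, 1.0], (T, 1))
fade_out = np.linspace(1.0, 0.0, T)
fade_in = np.linspace(0.0, 1.0, T)
combined = combine_movements([smile, jaw], [fade_out, fade_in])
```

With a dominance-weighted average, a movement fades smoothly in or out as its weight changes over time, which is the kind of temporal combination the abstract alludes to.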
A High Quality Text-To-Speech System Composed of Multiple Neural Networks
While neural networks have been employed to handle several different
text-to-speech tasks, ours is the first system to use neural networks
throughout, for both linguistic and acoustic processing. We divide the
text-to-speech task into three subtasks: a linguistic module mapping from text
to a linguistic representation, an acoustic module mapping from the linguistic
representation to speech, and a video module mapping from the linguistic
representation to animated images. The linguistic module employs a
letter-to-sound neural network and a postlexical neural network. The acoustic
module employs a duration neural network and a phonetic neural network. The
visual neural network is employed in parallel to the acoustic module to drive a
talking head. The use of neural networks that can be retrained on the
characteristics of different voices and languages affords our system a degree
of adaptability and naturalness heretofore unavailable.
Comment: Source link (9812006.tar.gz) contains: 1 PostScript file (4 pages) and 3 WAV audio files. If your system does not support Windows WAV files, try a tool like "sox" to translate the audio into a format of your choice.
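The three-subtask split described above can be sketched as a pipeline of placeholder callables. The function names and the fake phoneme/viseme representations below are hypothetical stand-ins for the paper's trained neural networks.

```python
def linguistic_module(text):
    # Letter-to-sound and postlexical networks would map text to a
    # phonemic representation; here we just fake phoneme-like tokens.
    return [c for c in text.lower() if c.isalpha()]

def acoustic_module(ling):
    # Duration and phonetic networks would map phonemes to acoustic
    # frames; here each phoneme gets a (phoneme, duration) pair.
    return [(ph, 1) for ph in ling]

def video_module(ling):
    # The visual network runs in parallel with the acoustic module,
    # producing one mouth shape (viseme) per phoneme.
    return ["viseme:" + ph for ph in ling]

def synthesize(text):
    ling = linguistic_module(text)
    # The acoustic and video branches both consume the same linguistic
    # representation, mirroring the parallel structure in the abstract.
    return acoustic_module(ling), video_module(ling)

audio, video = synthesize("Hi")
```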
Artimate: an articulatory animation framework for audiovisual speech synthesis
We present a modular framework for articulatory animation synthesis using
speech motion capture data obtained with electromagnetic articulography (EMA).
Adapting a skeletal animation approach, we apply the articulatory motion data
to a three-dimensional (3D) model of the vocal tract, creating a portable
resource that can be integrated in an audiovisual (AV) speech synthesis
platform to provide realistic animation of the tongue and teeth for a virtual
character. The framework also provides an interface to articulatory animation
synthesis, as well as an example application to illustrate its use with a 3D
game engine. We rely on cross-platform, open-source software and open standards
to provide a lightweight, accessible, and portable workflow.
Comment: Workshop on Innovation and Applications in Speech Technology (2012).
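One simplified way to picture the retargeting of EMA data onto a rig is a per-coil rigid offset estimated from a neutral pose. This is an assumption for illustration, not Artimate's actual skeletal-animation pipeline.

```python
import numpy as np

def retarget(coil_traj, neutral_coil, neutral_bone):
    """Shift an EMA coil trajectory into the rig's bone space.

    coil_traj:    (T, 3) coil positions from motion capture.
    neutral_coil: (3,) coil position in the speaker's neutral pose.
    neutral_bone: (3,) matching bone position in the rig's neutral pose.
    """
    # A rigid per-coil offset aligns the capture space with the rig space.
    offset = np.asarray(neutral_bone) - np.asarray(neutral_coil)
    return np.asarray(coil_traj) + offset

# A tongue-tip coil moving 1 unit forward, mapped onto its rig bone.
traj = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
bone_traj = retarget(traj, neutral_coil=[0, 0, 0], neutral_bone=[5, 0, 0])
```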
Speech-driven Animation with Meaningful Behaviors
Conversational agents (CAs) play an important role in human-computer
interaction. Creating believable movements for CAs is challenging, since the
movements have to be meaningful and natural, reflecting the coupling between
gestures and speech. Studies in the past have mainly relied on rule-based or
data-driven approaches. Rule-based methods focus on creating meaningful
behaviors conveying the underlying message, but the gestures cannot be easily
synchronized with speech. Data-driven approaches, especially speech-driven
models, can capture the relationship between speech and gestures. However, they
create behaviors that disregard the meaning of the message. This study proposes
to bridge the gap between these two approaches, overcoming their limitations.
The approach builds a dynamic Bayesian network (DBN) in which a discrete variable
is added to condition the behaviors on an underlying constraint. The study
implements and evaluates the approach with two constraints: discourse functions
and prototypical behaviors. By constraining on discourse functions (e.g.,
questions), the model learns the characteristic behaviors associated with a
given discourse class, deriving the rules from the data. By constraining on
prototypical behaviors (e.g., head nods), the approach can be embedded in a
rule-based system as a behavior realizer, creating trajectories that are
tightly synchronized with speech. The study proposes a DBN structure and a training
approach that (1) models the cause-effect relationship between the constraint
and the gestures, (2) initializes the state configuration models, increasing the
range of the generated behaviors, and (3) captures the differences in the
behaviors across constraints by enforcing sparse transitions between shared and
exclusive states per constraint. Objective and subjective evaluations
demonstrate the benefits of the proposed approach over an unconstrained model.
Comment: 13 pages, 12 figures, 5 tables.
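The core constrained-DBN idea, a discrete constraint selecting which dynamics govern the gesture states, can be illustrated with a toy Markov chain. States, transition probabilities, and constraint labels below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
STATES = ["rest", "nod", "tilt"]

# The discrete constraint (here a discourse function) picks the transition
# matrix, so sampled behaviors differ across constraints while the state
# inventory is shared. Rows = current state, columns = next state.
TRANSITIONS = {
    "question": np.array([[0.5, 0.4, 0.1],
                          [0.3, 0.6, 0.1],
                          [0.4, 0.3, 0.3]]),
    "statement": np.array([[0.8, 0.1, 0.1],
                           [0.5, 0.4, 0.1],
                           [0.5, 0.1, 0.4]]),
}

def sample_states(constraint, length, start=0):
    """Sample a gesture-state sequence under a given discourse constraint."""
    A = TRANSITIONS[constraint]
    seq, s = [start], start
    for _ in range(length - 1):
        s = rng.choice(len(STATES), p=A[s])
        seq.append(s)
    return [STATES[i] for i in seq]

seq = sample_states("question", 10)
```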
A note on the robust stability of uncertain stochastic fuzzy systems with time-delays
Copyright [2004] IEEE. This material is posted here with permission of the IEEE.

Takagi-Sugeno (T-S) fuzzy models are now often used to describe complex nonlinear systems in terms of fuzzy sets and fuzzy reasoning applied to a set of linear submodels. In this note, the T-S fuzzy model approach is exploited to establish stability criteria for a class of nonlinear stochastic systems with time delay. Sufficient conditions are derived in the form of linear matrix inequalities (LMIs) such that, for all admissible parameter uncertainties, the overall fuzzy system is stochastically exponentially stable in the mean square, independent of the time delay. Therefore, with the numerically attractive Matlab LMI toolbox, the robust stability of uncertain stochastic fuzzy systems with time delays can be easily checked.
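As a toy Python analogue of the LMI test (the note itself targets stochastic fuzzy systems and uses the Matlab LMI toolbox), the sketch below verifies a candidate Lyapunov matrix P for a deterministic linear system x' = Ax; exponential stability is certified by P > 0 with A^T P + P A < 0.

```python
import numpy as np

def is_pos_def(M):
    # Symmetrize before the eigenvalue test to tolerate round-off.
    return bool(np.all(np.linalg.eigvalsh((M + M.T) / 2) > 0))

def certifies_stability(A, P):
    """Check the Lyapunov LMI pair: P > 0 and A^T P + P A < 0."""
    return is_pos_def(P) and is_pos_def(-(A.T @ P + P @ A))

A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])  # stable: eigenvalues -1 and -2
P = np.eye(2)
ok = certifies_stability(A, P)
```

An LMI solver would search for a feasible P automatically; here a candidate is simply verified.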
Analysis of a Modern Voice Morphing Approach using Gaussian Mixture Models for Laryngectomees
This paper proposes a voice morphing system for people who have undergone
laryngectomy, the surgical removal of all or part of the larynx (the voice
box), typically performed in cases of laryngeal cancer. A primitive
method of achieving voice morphing is by extracting the source's vocal
coefficients and then converting them into the target speaker's vocal
parameters. In this paper, we deploy Gaussian Mixture Models (GMM) for mapping
the coefficients from source to target. However, the conventional GMM-based
mapping approach suffers from over-smoothing of the converted voice. We
therefore propose a GMM-based method for efficient voice morphing and
conversion that overcomes this over-smoothing: it uses glottal waveform
separation and prediction of excitations, and the results show that not only
is over-smoothing eliminated, but the transformed vocal tract parameters also
match the target. Moreover, the synthesized speech thus obtained is of
sufficiently high quality.
Thus, voice morphing based on a unique GMM approach has been proposed and also
critically evaluated based on various subjective and objective evaluation
parameters. Further, an application of voice morphing for laryngectomees that deploys this unique approach is recommended in this paper.
Comment: 6 pages, 4 figures, 4 tables; International Journal of Computer Applications, Volume 49, Number 21, July 201
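The conventional GMM mapping that the paper improves on can be sketched as conditional-expectation regression under a joint source-target GMM (here with 1-D features and toy parameters); the paper's glottal-waveform extension is not shown.

```python
import numpy as np

def gmm_convert(x, weights, means, covs):
    """Convert one source frame with a joint-density GMM.

    weights: (K,) mixture weights.
    means:   (K, 2) joint means [mu_x, mu_y] (1-D features for brevity).
    covs:    (K, 2, 2) joint covariances over [x; y].
    Returns E[y | x], the component-weighted regression output.
    """
    # Posterior p(k | x) under each component's Gaussian marginal over x.
    px = np.array([
        np.exp(-0.5 * (x - m[0]) ** 2 / c[0, 0]) / np.sqrt(2 * np.pi * c[0, 0])
        for m, c in zip(means, covs)
    ])
    post = weights * px
    post /= post.sum()
    # Per-component regression: E[y | x, k] = mu_y + (c_yx / c_xx)(x - mu_x).
    cond = np.array([
        m[1] + c[1, 0] / c[0, 0] * (x - m[0]) for m, c in zip(means, covs)
    ])
    return float(post @ cond)

# Toy model: two components shifting the feature by +1 around x=0 and x=4.
w = np.array([0.5, 0.5])
mu = np.array([[0.0, 1.0], [4.0, 5.0]])
cov = np.tile(np.eye(2), (2, 1, 1))  # identity: no x-y correlation
y = gmm_convert(0.0, w, mu, cov)
```

Averaging the per-component regressions by their posteriors is exactly the smoothing behavior that the paper identifies as a weakness of the conventional mapping.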