A High Quality Text-To-Speech System Composed of Multiple Neural Networks
While neural networks have been employed to handle several different
text-to-speech tasks, ours is the first system to use neural networks
throughout, for both linguistic and acoustic processing. We divide the
text-to-speech task into three subtasks: a linguistic module mapping from text
to a linguistic representation, an acoustic module mapping from the linguistic
representation to speech, and a video module mapping from the linguistic
representation to animated images. The linguistic module employs a
letter-to-sound neural network and a postlexical neural network. The acoustic
module employs a duration neural network and a phonetic neural network. The
visual neural network is employed in parallel to the acoustic module to drive a
talking head. The use of neural networks that can be retrained on the
characteristics of different voices and languages affords our system a degree
of adaptability and naturalness heretofore unavailable.
Comment: Source link (9812006.tar.gz) contains: 1 PostScript file (4 pages)
and 3 WAV audio files. If your system does not support Windows WAV files, try
a tool like "sox" to translate the audio into a format of your choice.
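The module layout described in this abstract can be sketched as a simple composition of stages. This is only an illustrative sketch: every name, type, and stub below is hypothetical, and feeding the duration network's output into the phonetic network is an assumption, since the abstract does not specify how the networks are wired or what they emit.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in types; the abstract does not define the real representations.
Phones = List[str]
Samples = List[float]


@dataclass
class TextToSpeechPipeline:
    letter_to_sound: Callable[[str], Phones]            # linguistic module, first network
    postlexical: Callable[[Phones], Phones]             # linguistic module, second network
    duration: Callable[[Phones], List[float]]           # acoustic module, first network
    phonetic: Callable[[Phones, List[float]], Samples]  # acoustic module, second network
    visual: Callable[[Phones], List[str]]               # drives the talking head

    def run(self, text: str) -> Tuple[Samples, List[str]]:
        # Text -> linguistic representation.
        linguistic = self.postlexical(self.letter_to_sound(text))
        # Linguistic representation -> speech (assumed wiring: durations feed the phonetic stage).
        durations = self.duration(linguistic)
        speech = self.phonetic(linguistic, durations)
        # The visual stage consumes the same linguistic representation as the acoustic module.
        frames = self.visual(linguistic)
        return speech, frames


# Toy stubs in place of trained networks, purely to show the data flow.
pipeline = TextToSpeechPipeline(
    letter_to_sound=lambda text: list(text.lower()),
    postlexical=lambda phones: [p for p in phones if p.isalpha()],
    duration=lambda phones: [0.1] * len(phones),
    phonetic=lambda phones, durs: [d * 2 for d in durs],
    visual=lambda phones: [f"viseme:{p}" for p in phones],
)
speech, frames = pipeline.run("Hi!")
```

Keeping each network behind a plain callable mirrors the abstract's point that individual networks can be retrained for a different voice or language without touching the rest of the pipeline.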
Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets
In this work, we propose a novel approach for generating videos of the six
basic facial expressions given a neutral face image. We propose to exploit the
face geometry by modeling the facial landmarks motion as curves encoded as
points on a hypersphere. By proposing a conditional version of manifold-valued
Wasserstein generative adversarial network (GAN) for motion generation on the
hypersphere, we learn the distribution of facial expression dynamics of
different classes, from which we synthesize new facial expression motions. The
resulting motions can be transformed to sequences of landmarks and then to
image sequences by editing the texture information using another conditional
Generative Adversarial Network. To the best of our knowledge, this is the first
work that explores manifold-valued representations with GAN to address the
problem of dynamic facial expression generation. We evaluate our proposed
approach both quantitatively and qualitatively on two public datasets:
Oulu-CASIA and MUG Facial Expression. Our experimental results demonstrate the
effectiveness of our approach in generating realistic videos with continuous
motion, realistic appearance and identity preservation. We also show the
efficiency of our framework for dynamic facial expression generation, dynamic
facial expression transfer and data augmentation for training improved emotion
recognition models.
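To make "curves encoded as points on a hypersphere" concrete, here is a minimal sketch of one common encoding, the square-root velocity function (SRVF) rescaled to unit norm. The abstract does not state which encoding is used, so the choice of SRVF, the function name, and the curve layout are all assumptions made for illustration.

```python
import math
from typing import List

# Hypothetical layout: T time steps, each a flattened vector of landmark coordinates.
Curve = List[List[float]]


def srvf_on_sphere(curve: Curve) -> Curve:
    """Map a sampled curve to its unit-norm SRVF, i.e. a point on a hypersphere."""
    q = []
    for prev, curr in zip(curve, curve[1:]):
        velocity = [c - p for p, c in zip(prev, curr)]  # finite-difference derivative
        speed = math.sqrt(sum(v * v for v in velocity))
        scale = math.sqrt(speed) if speed > 0 else 1.0   # static frames stay at zero
        q.append([v / scale for v in velocity])
    norm = math.sqrt(sum(x * x for row in q for x in row))
    if norm == 0:
        raise ValueError("a constant curve has no well-defined point on the sphere")
    # Rescaling to unit norm places the representation on the unit hypersphere.
    return [[x / norm for x in row] for row in q]


# Toy two-dimensional curve standing in for a landmark trajectory.
point = srvf_on_sphere([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]])
squared_norm = sum(x * x for row in point for x in row)
```

Because every encoded motion has unit norm, a generative model working in this representation has to produce outputs that stay on the sphere, which is the constraint a manifold-valued GAN is designed to respect.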
Introduction: The Third International Conference on Epigenetic Robotics
This paper summarizes the paper and poster contributions
to the Third International Workshop on
Epigenetic Robotics. The focus of this workshop is
on the cross-disciplinary interaction of developmental
psychology and robotics. Namely, the general
goal in this area is to create robotic models of the
psychological development of various behaviors. The
term "epigenetic" is used in much the same sense as
the term "developmental", and while we could call
our topic "developmental robotics", developmental
robotics can be seen as having a broader interdisciplinary
emphasis. Our focus in this workshop is
on the interaction of developmental psychology and
robotics and we use the phrase "epigenetic robotics"
to capture this focus.