25,061 research outputs found
Capture, Learning, and Synthesis of 3D Speaking Styles
Audio-driven 3D facial animation has been widely explored, but achieving
realistic, human-like performance is still unsolved. This is due to the lack of
available 3D datasets, models, and standard evaluation metrics. To address
this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans
captured at 60 fps and synchronized audio from 12 speakers. We then train a
neural network on our dataset that factors identity from facial motion. The
learned model, VOCA (Voice Operated Character Animation) takes any speech
signal as input - even speech in languages other than English - and
realistically animates a wide range of adult faces. Conditioning on subject
labels during training allows the model to learn a variety of realistic
speaking styles. VOCA also provides animator controls to alter speaking style,
identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball
rotations) during animation. To our knowledge, VOCA is the only realistic 3D
facial animation model that is readily applicable to unseen subjects without
retargeting. This makes VOCA suitable for tasks like in-game video, virtual
reality avatars, or any scenario in which the speaker, speech, or language is
not known in advance. We make the dataset and model available for research
purposes at http://voca.is.tue.mpg.de.Comment: To appear in CVPR 201
Nonparallel Emotional Speech Conversion
We propose a nonparallel data-driven emotional speech conversion method. It
enables the transfer of emotion-related characteristics of a speech signal
while preserving the speaker's identity and linguistic content. Most existing
approaches require parallel data and time alignment, which is not available in
most real applications. We achieve nonparallel training based on an
unsupervised style transfer technique, which learns a translation model between
two distributions instead of a deterministic one-to-one mapping between paired
examples. The conversion model consists of an encoder and a decoder for each
emotion domain. We assume that the speech signal can be decomposed into an
emotion-invariant content code and an emotion-related style code in latent
space. Emotion conversion is performed by extracting and recombining the
content code of the source speech and the style code of the target emotion. We
tested our method on a nonparallel corpora with four emotions. Both subjective
and objective evaluations show the effectiveness of our approach.Comment: Published in INTERSPEECH 2019, 5 pages, 6 figures. Simulation
available at http://www.jian-gao.org/emoga
Accurate emotion strength assessment for seen and unseen speech based on data-driven deep learning
Emotion classification of speech and assessment of the emotion strength are required in applications such as emotional text-to-speech and voice conversion. The emotion attribute ranking function based on Support Vector Machine (SVM) was proposed to predict emotion strength for emotional speech corpus. However, the trained ranking function doesn't generalize to new domains, which limits the scope of applications, especially for out-of-domain or unseen speech. In this paper, we propose a data-driven deep learning model, i.e. StrengthNet, to improve the generalization of emotion strength assessment for seen and unseen speech. This is achieved by the fusion of emotional data from various domains. We follow a multi-task learning network architecture that includes an acoustic encoder, a strength predictor, and an auxiliary emotion predictor. Experiments show that the predicted emotion strength of the proposed StrengthNet is highly correlated with ground truth scores for both seen and unseen speech. We release the source codes at: https://github.com/ttslr/StrengthNet
PROMOTING OUTCOME BASED LEARNING (OBL) IN A LINGUISTICS COURSE
Teaching and learning linguistics in higher education is very important especially for English students
because of learning language aspects. Linguistics is a course learnt by English students in Department of
English Education. In the process of teaching and learning linguistics, the lecturers should focus on the
outcome of the learning. They do not only demonstrate how to understand the branches of linguistics such
as morphology, semantics, discourse but they also should be able to make a successful teaching and
learning. One of the ways is by applying Outcome Based Learning (OBL) which is rarely applied. This
approach covers three basic elements: designing the course intended learning outcomes, designing teaching
and learning activities, and designing assessment. That is why the literature study is used to know whether
OBL can be a potential approach in teaching and learning a linguistics course in Department of English
Education. This article focuses on how OBL contributes in the teaching and learning a linguistics course
- …