1,100 research outputs found

Mage - Reactive articulatory feature control of HMM-based parametric speech synthesis

Author: Astrinaki Maria
Dutoit Thierry
King Simon
Ling Zhen-Hua
Moinet Alexis
Richmond Korin
Yamagishi Junichi
Publication date: 01/01/2013

In this paper, we present the integration of articulatory control into MAGE, a framework for realtime and interactive (reactive) parametric speech synthesis using hidden Markov models (HMMs). MAGE is based on the speech synthesis engine from HTS and uses acoustic features (spectrum and f0) to model and synthesize speech. In this work, we replace the standard acoustic models with models combining acoustic and articulatory features, such as tongue, lips and jaw positions. We then use feature-space-switched articulatory-to-acoustic regression matrices to enable us to control the spectral acoustic features by manipulating the articulatory features. Combining this synthesis model with MAGE allows us to interactively and intuitively modify phones synthesized in real time, for example transforming one phone into another, by controlling the configuration of the articulators in a visual display. Index Terms: speech synthesis, reactive, articulators
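The idea of switching between articulatory-to-acoustic regression matrices can be sketched roughly as follows. This is a simplified illustration, not the MAGE or HTS implementation: it assumes the switch is a nearest-centroid choice in articulatory space, and the names `switched_regression`, `centroids`, `A` and `b` are hypothetical.

```python
import numpy as np

def switched_regression(x, centroids, A, b):
    """Map an articulatory vector x to acoustic features.

    centroids: (K, Dx) region centres in articulatory space (assumed)
    A:         (K, Dy, Dx) one regression matrix per region
    b:         (K, Dy) one bias vector per region
    Returns the predicted acoustic vector and the selected region index.
    """
    # Pick the region whose centre is closest to x (the "switch").
    k = int(np.argmin(np.sum((centroids - x) ** 2, axis=1)))
    # Apply that region's linear regression.
    return A[k] @ x + b[k], k

# Toy example with two regions and 2-D features (values are made up).
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
A = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[2.0, 0.0], [0.0, 2.0]]])
b = np.array([[0.0, 0.0], [1.0, 1.0]])

y, k = switched_regression(np.array([9.0, 11.0]), centroids, A, b)
```

Moving `x` across a region boundary changes which matrix is applied, which is one way a change in articulator configuration could alter the predicted spectrum.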

CiteSeerX

Articulatory features for speech-driven head motion synthesis

Author: Ben Youssef Atef
Braude David A.
Shimodaira Hiroshi
Publication date: 01/08/2013

This study investigates the use of articulatory features for speech-driven head motion synthesis, as opposed to the prosody features, such as F0 and energy, that have mainly been used in the literature. In the proposed approach, multi-stream HMMs are trained jointly on the synchronous streams of speech and head motion data. Articulatory features can be regarded as an intermediate parametrisation of speech that is expected to have a close link with head movement. Measured head and articulatory movements acquired by EMA were synchronously recorded with speech. Measured articulatory data were compared to those predicted from speech using an HMM-based inversion mapping system trained in a semi-supervised fashion. Canonical correlation analysis (CCA) on a data set of free speech of 12 people shows that the articulatory features are more correlated with head rotation than prosodic and/or cepstral speech features. It is also shown that the synthesised head motion using articulatory features gave higher correlations with the original head motion than when only prosodic features are used. Index Terms: head motion synthesis, articulatory features, canonical correlation analysis, acoustic-to-articulatory mapping
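For readers unfamiliar with CCA, the canonical correlations between two feature sets can be computed with a QR decomposition followed by an SVD. The sketch below is a generic NumPy illustration on synthetic data, not the authors' analysis pipeline; it assumes both matrices have full column rank, and `canonical_correlations` is a hypothetical helper name.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Return the canonical correlations between X (N, Dx) and Y (N, Dy).

    Assumes both centred matrices have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centred data.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles,
    # which equal the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Synthetic stand-ins: Y is an exact linear function of two columns of X,
# so both canonical correlations should be numerically close to 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X[:, :2] @ np.array([[1.0, 2.0], [0.5, -1.0]])
rho = canonical_correlations(X, Y)
```

In a study like the one above, `X` would hold one feature set (for example articulatory features) and `Y` the head rotation parameters, with the resulting correlations compared across feature sets.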

CiteSeerX

Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data

Author: Astrinaki Maria
Babacan Onur
Barbulescu Adela
Cakmak Huseyin
Dall Rasmus
d’Alessandro Nicolas
Hu Qiong
Hueber Thomas
Huguenin Victor
Kalaycı Emine Sümeyye
Moinet Alexis
Parfait Valentin
Ravet Thierry
Tilmanne Joëlle
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/07/2013

Part 1: Fundamental Issues

This paper presents the results of our participation in the ninth eNTERFACE workshop on multimodal user interfaces. Our target for this workshop was to bring some technologies currently used in speech recognition and synthesis to a new level, i.e. being the core of a new HMM-based mapping system. We investigated statistical mapping, specifically how to use Gaussian Mixture Models and Hidden Markov Models for realtime and reactive generation of new trajectories from input labels and for realtime regression in a continuous-to-continuous use case. As a result, we have developed several proofs of concept, including an incremental speech synthesiser, software for exploring stylistic spaces for gait and facial motion in realtime, a reactive audiovisual laughter and a prototype demonstrating the realtime reconstruction of lower body gait motion strictly from upper body motion, with conservation of the stylistic properties. This project has been the opportunity to formalise HMM-based mapping, integrate several of these innovations into the Mage library and explore the development of a realtime gesture recognition tool.
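Continuous-to-continuous regression with a Gaussian mixture model is commonly done by mixing per-component conditional means, weighted by each component's posterior given the input. The following is a minimal sketch of that standard formulation, assuming a joint GMM has already been trained; it does not use the Mage API, and all parameter names and values are hypothetical.

```python
import numpy as np

def gmr_predict(x, weights, mu_x, mu_y, S_xx, S_yx):
    """Estimate E[y | x] under a joint GMM over (x, y).

    weights: (K,) mixture weights
    mu_x:    (K, Dx) input means      mu_y: (K, Dy) output means
    S_xx:    (K, Dx, Dx) input covariances
    S_yx:    (K, Dy, Dx) output-input cross-covariances
    """
    K = len(weights)
    log_resp = np.empty(K)
    cond_means = []
    for k in range(K):
        d = x - mu_x[k]
        sol = np.linalg.solve(S_xx[k], d)           # S_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(S_xx[k])
        # Log posterior up to a constant shared by all components.
        log_resp[k] = np.log(weights[k]) - 0.5 * (d @ sol + logdet)
        cond_means.append(mu_y[k] + S_yx[k] @ sol)  # per-component E[y | x]
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    return resp @ np.array(cond_means)

# Toy 1-D example with two well-separated components (made-up values).
weights = np.array([0.5, 0.5])
mu_x = np.array([[0.0], [10.0]])
mu_y = np.array([[1.0], [5.0]])
S_xx = np.array([[[1.0]], [[1.0]]])
S_yx = np.array([[[0.5]], [[2.0]]])

y_hat = gmr_predict(np.array([0.0]), weights, mu_x, mu_y, S_xx, S_yx)
```

Because each prediction needs only the current input frame, this form of mapping can be evaluated frame by frame, which suits realtime use.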

CiteSeerX

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

An Analysis of HMM-based Prediction of Articulatory Movements

Author: Ling Zhen-Hua
Richmond Korin
Yamagishi Junichi
Publication venue: 'Elsevier BV'
Publication date: 01/10/2010

Integrating Articulatory Features into HMM-based Parametric Speech Synthesis

Author: Ling Zhenhua
Richmond Korin
Wang Ren-Hua
Yamagishi Junichi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

This paper presents an investigation of ways to integrate articulatory features into Hidden Markov Model (HMM)-based parametric speech synthesis, primarily with the aim of improving the performance of acoustic parameter generation. The joint distribution of acoustic and articulatory features is estimated during training and is then used for parameter generation at synthesis time in conjunction with a maximum-likelihood criterion. Different model structures are explored to allow the articulatory features to influence acoustic modeling: model clustering, state synchrony and cross-stream feature dependency. The results of objective evaluation show that the accuracy of acoustic parameter prediction can be improved when shared clustering and asynchronous-state model structures are adopted for combined acoustic and articulatory features. More significantly, our experiments demonstrate that modeling the dependency between these two feature streams can make speech synthesis more flexible. The characteristics of synthetic speech can be easily controlled by modifying generated articulatory features as part of the process of acoustic parameter generation.
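Maximum-likelihood parameter generation in HMM-based synthesis is usually posed as solving a linear system that balances static and dynamic (delta) feature statistics. The sketch below shows that general idea for a single one-dimensional stream with diagonal variances. It is a simplification under stated assumptions, not the joint acoustic-articulatory generation described in the paper: the delta window, the clamped boundary handling and the name `mlpg` are illustrative choices.

```python
import numpy as np

def mlpg(mu, var):
    """Generate a smooth static trajectory from per-frame Gaussian statistics.

    mu, var: (T, 2) means and variances for [static, delta] features.
    Solves (W^T P W) c = W^T P mu, where P is the diagonal precision matrix
    and W maps the static trajectory c to stacked [static, delta] features.
    """
    T = mu.shape[0]
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                         # static feature row
        # Delta row: 0.5 * (c[t+1] - c[t-1]), indices clamped at the edges.
        W[2 * t + 1, min(t + 1, T - 1)] += 0.5
        W[2 * t + 1, max(t - 1, 0)] -= 0.5
    prec = 1.0 / var.reshape(-1)                  # diagonal of P
    m = mu.reshape(-1)                            # interleaved [static, delta]
    lhs = W.T @ (prec[:, None] * W)
    rhs = W.T @ (prec * m)
    return np.linalg.solve(lhs, rhs)

# A step in the static means with zero delta means; the low delta variance
# pulls the generated trajectory towards a smoother transition.
mu = np.column_stack([np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0]), np.zeros(6)])
var = np.column_stack([np.ones(6), np.full(6, 0.1)])
trajectory = mlpg(mu, var)
```

In a joint model, the means fed into such a solver would themselves depend on the articulatory stream, which is how editing the articulatory features could change the generated acoustic trajectory.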

CiteSeerX