1,055 research outputs found

Speech Synthesis Based on Hidden Markov Models

Author: Nankaku Y.
Oura K.
Toda T.
Tokuda K.
Yamagishi J.
Zen H.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2013

Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion

Author: Berry Jeffrey J.
Ji An
Johnson Michael T.
Publication venue: e-Publications@Marquette
Publication date: 01/10/2016

Acoustic-to-articulatory inversion, the estimation of articulatory kinematics from an acoustic waveform, is a challenging but important problem. Accurate estimation of articulatory movements has the potential for significant impact on our understanding of speech production, on our capacity to assess and treat pathologies in a clinical setting, and on speech technologies such as computer-aided pronunciation assessment and audio-video synthesis. However, because of the complex and speaker-specific relationship between articulation and acoustics, existing approaches for inversion do not generalize well across speakers. As acquiring speaker-specific kinematic data for training is not feasible in many practical applications, this remains an important and open problem. This paper proposes a novel approach to acoustic-to-articulatory inversion, Parallel Reference Speaker Weighting (PRSW), which requires no kinematic data for the target speaker and a small amount of acoustic adaptation data. PRSW hypothesizes that acoustic and kinematic similarities are correlated and uses speaker-adapted articulatory models derived from acoustically derived weights. The system was assessed using a 20-speaker data set of synchronous acoustic and Electromagnetic Articulography (EMA) kinematic data. Results demonstrate that by restricting the reference group to a subset consisting of speakers with strong individual speaker-dependent inversion performance, the PRSW method is able to attain kinematic-independent acoustic-to-articulatory inversion performance nearly matching that of the speaker-dependent model, with an average correlation of 0.62 versus 0.63. This indicates that given a sufficiently complete and appropriately selected reference speaker set for adaptation, it is possible to create effective articulatory models without kinematic training data.

epublications@Marquette

Recent development of the HMM-based speech synthesis system (HTS)

Author: Black Alan W
Masuko Takashi
Nose Takashi
Oura Keiichiro
Sako Shinji
Toda Tomoki
Tokuda Keiichi
Yamagishi Junichi
Zen Heiga
Publication date: 01/01/2009

A statistical parametric approach to speech synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of speech are simultaneously modeled by context-dependent HMMs, and speech waveforms are generated from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named “HMM-based speech synthesis system (HTS)” to provide a research and development toolkit for statistical parametric speech synthesis. This paper describes recent developments of HTS in detail, as well as future release plans.

CiteSeerX

NAIST Academic Repository

Edinburgh Research Archive

Edinburgh Research Explorer

Hokkaido University Collection of Scholarly and Academic Papers

Parametric Human Movements: Learning, Synthesis, Recognition, and Tracking

Author: Herzog Dennis
Publication venue: Aalborg Universitet
Publication date: 01/01/2011

VBN