87 research outputs found

Glottal Source and Prosodic Prominence Modelling in HMM-based Speech Synthesis for the Blizzard Challenge 2009

Author: Andersson J. Sebastian
Badino Leonardo
Cabral Joao P
Clark Robert A J
Yamagishi Junichi
Publication date: 01/01/2009

This paper describes the CSTR entry for the Blizzard Challenge 2009. The work focused on modifying two parts of the Nitech 2005 HTS speech synthesis system to improve naturalness and contextual appropriateness. The first part incorporated an implementation of the Liljencrants-Fant (LF) glottal source model. The second part focused on improving synthesis of prosodic prominence, including emphasis, through context-dependent phonemes. Emphasis was assigned to the synthesised test sentences based on a handful of theory-based rules. The two parts (LF-model and prosodic prominence) were not combined and were hence evaluated separately. The results on naturalness for the LF-model showed that it is not yet perceived to be as natural as the Benchmark HTS system for neutral speech. The results for the prosodic prominence modelling showed that it was perceived to be as contextually appropriate as the Benchmark HTS system, despite a low naturalness score. The Blizzard Challenge evaluation has provided valuable information on the status of our work, and continued work will begin with analysing why our modifications resulted in reduced naturalness compared to the Benchmark HTS system.

CiteSeerX

Edinburgh Research Archive

Edinburgh Research Explorer

Robust Speaker-Adaptive HMM-based Text-to-Speech Synthesis

Author: Heiga Zen
Junichi Yamagishi
Keiichi Tokuda
Simon King
Steve Renals
Takashi Nose
Tomoki Toda
Zhen-hua Ling
Publication date: 01/01/2009

This paper describes a speaker-adaptive HMM-based speech synthesis system. The new system, called “HTS-2007,” employs speaker adaptation (CSMAPLR+MAP), feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in our previous systems. Subjective evaluation results show that the new system generates significantly better quality synthetic speech than speaker-dependent approaches with realistic amounts of speech data, and that it bears comparison with speaker-dependent approaches even when large amounts of speech data are available. In addition, a comparison study with several speech synthesis techniques shows the new system is very robust: it is able to build voices from less-than-ideal speech data and synthesize good-quality speech even for out-of-domain sentences.

CiteSeerX

Edinburgh Research Archive

Edinburgh Research Explorer

Speaker-Independent HMM-based Speech Synthesis System

Author: Toda Tomoki
Tokuda Keiichi
Yamagishi Junichi
Zen Heiga
Publication date: 01/01/2007

This paper describes an HMM-based speech synthesis system developed by the HTS working group for the Blizzard Challenge 2007. To further explore the potential of HMM-based speech synthesis, we incorporate new features into our conventional system that underpin a speaker-independent approach: speaker adaptation techniques, adaptive training for HSMMs, and full-covariance modeling using the CSMAPLR transforms.

Edinburgh Research Archive

Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis

Author: Yamagishi J.
Nose T.
Zen H.
Ling Z. H.
Toda T.
Tokuda K.
King S.
Renals S.
Publication date: 01/08/2009

We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics, which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components, which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations, with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin and the optically thick regimes, and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the early stages of protostar and disc formation.

Elsevier - Publisher Connector

Edinburgh Research Explorer

MPG.PuRe

Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis

Author: Heiga Zen
Junichi Yamagishi
Keiichi Tokuda
Simon King
Steve Renals
Takashi Nose
Tomoki Toda
Zhen-Hua Ling
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Improved Average-Voice-based Speech Synthesis Using Gender-Mixed Modeling and a Parameter Generation Algorithm Considering GV

Author: King Simon
Kobayashi Takao
Renals Steve
Toda Tomoki
Tokuda Keiichi
Yamagishi Junichi
Zen Heiga
Publication date: 01/01/2007

To construct a speech synthesis system that can achieve diverse voices, we have been developing a speaker-independent approach to HMM-based speech synthesis in which statistical average voice models are adapted to a target speaker using a small amount of speech data. In this paper, we incorporate a high-quality speech vocoding method, STRAIGHT, and a parameter generation algorithm with global variance into the system to improve the quality of synthetic speech. Furthermore, we introduce a feature-space speaker adaptive training algorithm and a gender-mixed modeling technique for further normalization of the average voice model. We build an English text-to-speech system using these techniques and show the performance of the system.

NAIST Academic Repository

CiteSeerX

Edinburgh Research Archive

Edinburgh Research Explorer

Recent development of the HMM-based speech synthesis system (HTS)

Author: Black Alan W
Masuko Takashi
Nose Takashi
Oura Keiichiro
Sako Shinji
Toda Tomoki
Tokuda Keiichi
Yamagishi Junichi
Zen Heiga
Publication date: 01/01/2009

A statistical parametric approach to speech synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of speech are simultaneously modeled by context-dependent HMMs, and speech waveforms are generated from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named “HMM-based speech synthesis system (HTS)” to provide a research and development toolkit for statistical parametric speech synthesis. This paper describes recent developments of HTS in detail, as well as future release plans.

CiteSeerX

NAIST Academic Repository

Edinburgh Research Archive

Edinburgh Research Explorer

Hokkaido University Collection of Scholarly and Academic Papers

Simple4All proposals for the Albayzin Evaluations in Speech Synthesis

Author: Barra-Chicote Roberto
King Simon
Lorenzo-Trueba Jaime
Montero Juan M
Watts Oliver
Yamagishi Junichi
Publication date: 01/01/2012

Edinburgh Research Explorer

Robustness of HMM-based Speech Synthesis

Author: King Simon
Ling Zhenhua
Yamagishi Junichi
Publication date: 01/01/2008

As speech synthesis techniques become more advanced, we are able to consider building high-quality voices from data collected outside the usual highly-controlled recording studio environment. This presents new challenges that are not present in conventional text-to-speech synthesis: the available speech data are not perfectly clean, the recording conditions are not consistent, and/or the phonetic balance of the material is not ideal. Although a clear picture of the performance of various speech synthesis techniques (e.g., concatenative, HMM-based or hybrid) under good conditions is provided by the Blizzard Challenge, it is not well understood how robust these algorithms are to less favourable conditions. In this paper, we analyse the performance of several speech synthesis methods under such conditions. This is, as far as we know, a new research topic: “Robust speech synthesis.” As a consequence of our investigations, we propose a new robust training method for HMM-based speech synthesis for use with speech data collected in unfavourable conditions.

CiteSeerX

Edinburgh Research Archive

Edinburgh Research Explorer

Speech Synthesis Based on Hidden Markov Models

Author: Nankaku Y.
Oura K.
Toda T.
Tokuda K.
Yamagishi J.
Zen H.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2013

Edinburgh Research Explorer