29,918 research outputs found

    An introduction to statistical parametric speech synthesis

    Get PDF

    Recent development of the HMM-based speech synthesis system (HTS)

    Get PDF
    A statistical parametric approach to speech synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, spectrum, excitation, and duration of speech are simultaneously modeled by context-dependent HMMs, and speech waveforms are generate from the HMMs themselves. Since December 2002, we have publicly released an open-source software toolkit named “HMM-based speech synthesis system (HTS)” to provide a research and development toolkit for statistical parametric speech synthesis. This paper describes recent developments of HTS in detail, as well as future release plans

    Combining vocal tract length normalization with hierarchial linear transformations

    Get PDF
    Recent research has demonstrated the effectiveness of vocal tract length normalization (VTLN) as a rapid adaptation technique for statistical parametric speech synthesis. VTLN produces speech with naturalness preferable to that of MLLR-based adaptation techniques, being much closer in quality to that generated by the original av-erage voice model. However with only a single parameter, VTLN captures very few speaker specific characteristics when compared to linear transform based adaptation techniques. This paper pro-poses that the merits of VTLN can be combined with those of linear transform based adaptation in a hierarchial Bayesian frame-work, where VTLN is used as the prior information. A novel tech-nique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regres-sion (CSMAPLR) adaptation is presented. Experiments show that the resulting transformation has improved speech quality with better naturalness, intelligibility and improved speaker similarity. Index Terms — Statistical parametric speech synthesis, hidden Markov models, speaker adaptation, vocal tract length normaliza-tion, constrained structural maximum a posteriori linear regression 1

    Statistical parametric speech synthesis for Ibibio

    Get PDF
    Ibibio is a Nigerian tone language, spoken in the south-east coastal region of Nigeria. Like most African languages, it is resource-limited. This presents a major challenge to conventional approaches to speech synthesis, which typically require the training of numerous predictive models of linguistic features such as the phoneme sequence (i.e., a pronunciation dictionary plus a letter-to-sound model) and prosodic structure (e.g., a phrase break predictor). This training is invariably supervised, requiring a corpus of training data labelled with the linguistic feature to be predicted. In this paper, we investigate what can be achieved in the absence of many of these expensive resources, and also with a limited amount of speech recordings. We employ a statistical parametric method, because this has been found to offer good performance even on small corpora, and because it is able to directly learn the relationship between acoustics and whatever linguistic features are available, potentially mitigating the absence of explicit representations of intermediate linguistic layers such as prosody. We present an evaluation that compares systems that have access to varying degrees of linguistic structure. The simplest system only uses phonetic context (quinphones), and this is compared to systems with access to a richer set of context features, with or without tone marking. It is found that the use of tone marking contributes significantly to the quality of synthetic speech. Future work should therefore address the problem of tone assignment using a dictionary and the building of a prediction module for out-of-vocabulary words. Key words: speech synthesis, Ibibio, low-resource languages, HT
    corecore