Combining vocal tract length normalization with hierarchical linear transformations

Dines, J.; Garner, P.N.; Saheer, L.; Yamagishi, J.

Publication date: 1 January 2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Abstract

Recent research has demonstrated the effectiveness of vocal tract length normalization (VTLN) as a rapid adaptation technique for statistical parametric speech synthesis. VTLN produces speech with naturalness preferable to that of MLLR-based adaptation techniques, being much closer in quality to that generated by the original average voice model. However, with only a single parameter, VTLN captures very few speaker-specific characteristics when compared to linear-transform-based adaptation techniques. This paper proposes that the merits of VTLN can be combined with those of linear-transform-based adaptation in a hierarchical Bayesian framework, where VTLN is used as the prior information. A novel technique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation is presented. Experiments show that the resulting transformation has improved speech quality with better naturalness, intelligibility and improved speaker similarity.

Index Terms: Statistical parametric speech synthesis, hidden Markov models, speaker adaptation, vocal tract length normalization, constrained structural maximum a posteriori linear regression
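
To illustrate what a single-parameter VTLN warp looks like, the sketch below uses the first-order all-pass (bilinear) transform, a common choice in the VTLN literature. The abstract does not state which warping function the paper uses, so the formula, the function names `bilinear_warp` and `warp_axis`, and the parameter name `alpha` are all assumptions made for illustration only.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency omega (radians, 0..pi).

    Assumption: a first-order all-pass (bilinear) warp controlled by one
    parameter alpha with |alpha| < 1. alpha = 0 leaves the axis unchanged.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_axis(n_bins, alpha):
    """Return the warped positions of n_bins evenly spaced frequencies on [0, pi]."""
    step = math.pi / (n_bins - 1)
    return [bilinear_warp(k * step, alpha) for k in range(n_bins)]


if __name__ == "__main__":
    # Hypothetical warping factor; real values would be estimated per speaker.
    for original, warped in zip(warp_axis(5, 0.0), warp_axis(5, 0.1)):
        print(f"{original:.3f} -> {warped:.3f}")
```

Because the whole warp is governed by one scalar, it can be estimated from very little adaptation data, which is also why it carries far less speaker-specific detail than a full linear transform.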