Perceptual optimization of unit-selection text-to-speech synthesis systems by means of active interactive genetic algorithms

Abstract

The tuning of a unit-selection TTS (US-TTS) system is usually performed by an expert, who typically weights the cost function by hand. However, hand tuning is costly in terms of the required training time, and inaccurate and ambiguous in terms of methodology. To ease the task of properly tuning the weights of the cost function, this thesis takes a perceptual approach based on active interactive genetic algorithms (aiGAs). The thesis pursues four major guidelines: i) accuracy when tuning the weights, ii) robustness of the obtained weights, iii) real-world applicability of the methodology to any cost function design, and iv) finding consensus among the different users when tuning the weights. The experiments are carried out on a small and medium-sized corpus (1.9 h) applied to different configurations (types of features) of the US-TTS cost function. The thesis concludes that aiGAs are highly competitive in comparison with other state-of-the-art weight tuning techniques.

Peer Reviewed. Postprint (published version)
