Analysing Shortcomings of Statistical Parametric Speech Synthesis
Output from statistical parametric speech synthesis (SPSS) remains noticeably
worse than natural speech recordings in terms of quality, naturalness, speaker
similarity, and intelligibility in noise. There are many hypotheses regarding
the origins of these shortcomings, but these hypotheses are often kept vague
and presented without empirical evidence that could confirm and quantify how a
specific shortcoming contributes to imperfections in the synthesised speech.
Throughout the speech synthesis literature, surprisingly little work is dedicated
to identifying the perceptually most important problems in speech
synthesis, even though such knowledge would be of great value for creating
better SPSS systems.
In this book chapter, we analyse some of the shortcomings of SPSS. In
particular, we discuss issues with vocoding and present a general methodology
for quantifying the effect of any of the many assumptions and design choices
that hold SPSS back. The methodology is accompanied by an example that
carefully measures and compares the severity of perceptual limitations imposed
by vocoding as well as other factors such as the statistical model and its use.
Comment: 34 pages with 4 figures; draft book chapter