Adaptive speech synthesis module with emotional expression

Abstract

Computer-generated speech replaces conventional text-based interaction methods. Initially, speech synthesis generated a human voice that lacked emotional expression. This kind of speech does not encourage users to interact with computers. Emotional speech synthesis is one of the challenges of speech synthesis research. The quality of emotional speech synthesis is judged by its intelligibility and its similarity to natural speech. High-quality speech is achievable using unit selection technology, which has a high computational cost. This technology relies on huge sets of recorded speech segments to achieve optimum quality. Diphone synthesis technology, on the other hand, uses fewer computational resources and less storage space. Its quality is lower than that of unit selection; however, with the introduction of digital signal processing algorithms such as the PSOLA algorithm, more natural results were achievable.
