Computer-generated speech replaces conventional text-based interaction methods. Initially, speech
synthesis produced a human voice that lacked emotional expression. This kind of speech does not
encourage users to interact with computers. Emotional speech synthesis is one of the challenges of speech
synthesis research. The quality of emotional speech synthesis is judged by its intelligibility and
similarity to natural speech.
High-quality speech is achievable using unit selection technology, at a high computational cost. This
technology relies on huge sets of recorded speech segments to achieve optimum quality. On the other
hand, diphone synthesis technology uses fewer computational resources and less storage space. Its quality is lower
than that of unit selection; however, with the introduction of digital signal processing algorithms such as
the PSOLA algorithm, more natural results have become achievable