End to End Text to Speech Synthesis for Malay Language using Tacotron and Tacotron 2

Abstract

Text-to-speech (TTS) technology is becoming increasingly popular in various fields such as education and business. However, the advancement of TTS technology for Malay language is slower compared to other language especially English language. The rise of artificial intelligence (AI) technology has sparked TTS technology into a new dimension. An end-to-end (E2E) TTS system that generates speech directly from text input is one of the latest AI technologies for TTS and implementing this E2E method into Malay language will help to expand the TTS technology for Malay language. This study involves the development and comparison of two end-to-end TTS models for the Malay language, namely Tacotron and Tacotron 2. Both models were trained using a Malay corpus consisting of text and speech and evaluated the synthesized speech using Mean Opinion Scores (MOS) for naturalness and intelligibility. The results show that Tacotron outperformed Tacotron 2 in terms of naturalness and intelligibility, with both models falling short of human speech quality. Improving TTS technology for Malay can encourage its use in a wider range of contexts

    Similar works

    Full text

    thumbnail-image

    Available Versions