    Deep learning text-to-speech synthesis with Flowtron and WaveGlow

    Innovation in artificial speech synthesis using deep learning has accelerated in recent years. Current interest lies in synthesizing speech that models the complex prosody and stylistic features of natural spoken language using a minimal amount of data. Not only are such models remarkable from a technological perspective, they also have immense potential as custom-voice assistive technology (AT) for people living with speech impairments. However, more research is needed on evaluating the applicability of deep learning text-to-speech (TTS) systems in a real-world context. This thesis furthers that research by employing two well-known TTS frameworks, Flowtron and WaveGlow, to train a voice clone model on limited personal speech data from a person living with locked-in syndrome (LIS). The resulting artificial voice is assessed through human perception. In addition, the model's results are showcased in a user-friendly TTS application that also serves as a prototype for custom-voice AT. Through the work in this thesis we explore the fascinating world of deep learning based artificial speech synthesis and aim to inspire further research on its relevance to the development of inclusive technology.