
2 research outputs found

Implementation and evaluation of a Spanish TTS based on FastPitch

Author: Manero Alvarez Anne
Publication date: 10/09/2022

Text-to-speech (TTS) generates speech from text, a technology that helps improve people’s quality of life. However, extending these models to languages such as Spanish is hampered by scarce databases, data processing tools, and model training resources. In this thesis, I implemented and evaluated a Spanish TTS model based on FastPitch with a 10-hour database. FastPitch is a neural network-based end-to-end TTS system that allows for prosody transformations. I first researched state-of-the-art TTS and preprocessed the dataset, then implemented and evaluated the model. As a result, several resources are provided: tools for raw database processing, methods for linguistic module adaptation, a clean dataset, and a quality TTS system in Spanish. The model’s quality is compared using two vocoders (WaveGlow/HiFiGan) and against two other state-of-the-art acoustic models (FastSpeech2/Tacotron2). The FastPitch model synthesized with the HiFiGan vocoder obtained the highest quality results. Finally, prosody transformation experiments at inference were successful with this FastPitch Spanish TTS.

Archivo Digital para la Docencia y la Investigación

Rapid Generation of Pronunciation Dictionaries for new Domains and Languages

Author: Schlippe Tim
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014

This dissertation presents innovative strategies and methods for the rapid generation of pronunciation dictionaries for new domains and languages. Solutions are proposed and developed for a range of conditions, from the straightforward scenario, in which the target language is present in written form on the Internet and the mapping between speech and written language is close, to the difficult scenario, in which no written form of the target language exists.

KITopen