2 research outputs found

    Implementation and evaluation of a Spanish TTS based on FastPitch

    Text-to-speech (TTS) systems generate speech from text, and this technology helps improve people's quality of life. However, extending these models to languages such as Spanish runs into scarce databases, data processing tools, and model training resources. In this thesis, I implemented and evaluated a Spanish TTS model based on FastPitch, using a 10-hour database. FastPitch is a neural-network-based end-to-end TTS system that allows prosody transformations. I first surveyed state-of-the-art TTS and preprocessed the dataset, then implemented and evaluated the model. As a result, several resources are provided: tools for processing the raw database, methods for adapting the linguistic module, a clean dataset, and a high-quality Spanish TTS system. The model's quality is evaluated with two vocoders (WaveGlow and HiFi-GAN) and compared with two other state-of-the-art acoustic models (FastSpeech 2 and Tacotron 2). FastPitch combined with the HiFi-GAN vocoder obtained the highest-quality results. Finally, prosody transformation experiments at inference time were successful with this FastPitch Spanish TTS.
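    FastPitch predicts an explicit pitch value and a duration for each input symbol, which is what makes prosody transformations at inference possible: the predicted values can be modified before the spectrogram is decoded. The sketch below illustrates three such transformations on a plain list of per-symbol F0 values in Hz. It is not the thesis's code or the FastPitch repository's API; the function names (shift_semitones, amplify, flatten, scale_durations) are hypothetical, and treating 0.0 as an unvoiced symbol is an assumption of this sketch.

```python
import math
from typing import List


# Hypothetical helpers; assumption: 0.0 marks an unvoiced symbol and is left untouched.
def shift_semitones(f0_hz: List[float], semitones: float) -> List[float]:
    """Transpose every voiced value by a fixed number of semitones."""
    factor = 2.0 ** (semitones / 12.0)
    return [f * factor if f > 0 else 0.0 for f in f0_hz]


def amplify(f0_hz: List[float], factor: float) -> List[float]:
    """Expand (factor > 1) or compress (factor < 1) the pitch range.

    Works in the log domain around the mean log-F0 of voiced symbols,
    so the result always stays positive.
    """
    voiced_logs = [math.log(f) for f in f0_hz if f > 0]
    if not voiced_logs:
        return list(f0_hz)
    mean_log = sum(voiced_logs) / len(voiced_logs)
    return [
        math.exp(mean_log + factor * (math.log(f) - mean_log)) if f > 0 else 0.0
        for f in f0_hz
    ]


def flatten(f0_hz: List[float]) -> List[float]:
    """Monotone speech: every voiced symbol gets the geometric-mean pitch."""
    return amplify(f0_hz, 0.0)


def scale_durations(durations: List[int], pace: float) -> List[int]:
    """Change speaking rate; pace > 1 speeds speech up (shorter durations, in frames)."""
    return [max(0, round(d / pace)) for d in durations]


if __name__ == "__main__":
    contour = [200.0, 0.0, 220.0, 180.0]
    print(shift_semitones(contour, 12))  # one octave up
    print(flatten(contour))
    print(scale_durations([4, 2, 6], 2.0))
```

    Working in the log domain for amplify matches the perceptual (semitone) scale of pitch and avoids negative frequencies when the range is expanded.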

    Rapid Generation of Pronunciation Dictionaries for new Domains and Languages

    This dissertation presents innovative strategies and methods for the rapid generation of pronunciation dictionaries for new domains and languages. Solutions are proposed and developed for a range of conditions, from the straightforward scenario, in which the target language is present in written form on the Internet and the mapping between speech and written language is close, to the difficult scenario, in which no written form for the target language exists.
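    In the straightforward scenario, where the mapping between speech and written language is close, even a small set of hand-written grapheme-to-phoneme rules can yield a first-pass pronunciation dictionary. The sketch below is not a method from the dissertation; it assumes a toy, hypothetical Spanish-like rule table (RULES, g2p and build_lexicon are illustrative names) and applies it by greedy longest match. It deliberately ignores context-dependent rules such as "c" or "g" before "e"/"i".

```python
from typing import Dict, List

# Toy, hypothetical rule table: grapheme sequence -> phone ("" = silent letter).
# Not a complete Spanish G2P; context-dependent rules are deliberately omitted.
RULES: Dict[str, str] = {
    "ch": "tʃ", "qu": "k", "rr": "r",
    "a": "a", "e": "e", "i": "i", "o": "o", "u": "u",
    "c": "k", "d": "d", "h": "", "l": "l", "m": "m", "n": "n",
    "ñ": "ɲ", "p": "p", "r": "ɾ", "s": "s", "t": "t",
}
MAX_LEN = max(len(k) for k in RULES)


def g2p(word: str) -> List[str]:
    """Convert a word to phones by greedy longest-match rule lookup."""
    phones: List[str] = []
    i = 0
    while i < len(word):
        for size in range(MAX_LEN, 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                if RULES[chunk]:  # skip silent letters such as "h"
                    phones.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones


def build_lexicon(words: List[str]) -> Dict[str, str]:
    """Map each (lower-cased) word to a space-separated phone string."""
    return {w.lower(): " ".join(g2p(w.lower())) for w in words}


if __name__ == "__main__":
    for word, pron in build_lexicon(["casa", "chico", "queso", "perro", "hola"]).items():
        print(f"{word}\t{pron}")
```

    Words that hit a missing rule raise an error instead of receiving a guessed pronunciation, so gaps in the rule set surface during dictionary generation rather than silently producing wrong entries.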