6 research outputs found

    Improving Neural Vocoder Stability Using Artificial Training Data

    A text-to-speech (TTS) system typically comprises a prosodic model that generates acoustic parameters from linguistic features, paired with a neural vocoder. In such a configuration, some feature values can be difficult for the neural vocoder to process, resulting in audio artifacts. This disclosure describes techniques to improve neural vocoder performance: reducing audio artifacts, making the vocoder more robust to unusual acoustic feature variations, and making it generally more forgiving of errors made by the feature generator. The techniques entail the use of an auxiliary training path driven by synthetic training examples generated by CHiVE inference, with some random sampling far enough from the mean (zero)
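    The sampling idea above can be sketched in a few lines. This is a minimal illustration, not the disclosure's implementation: the function name, the per-dimension sigma threshold, and the rejection-sampling loop are all assumptions; the disclosure only states that latent samples are drawn "far enough from the mean (zero)" and passed through CHiVE inference to produce synthetic vocoder training examples.

    ```python
    import numpy as np

    def sample_far_from_mean(latent_dim, n_samples, min_sigma=2.0, seed=0):
        """Draw standard-normal latent vectors, keeping only draws that lie
        at least `min_sigma` standard deviations from the mean (zero) in at
        least one dimension, to yield atypical synthetic examples.

        Hypothetical helper; threshold and rejection criterion are assumptions.
        """
        rng = np.random.default_rng(seed)
        kept = []
        while len(kept) < n_samples:
            z = rng.standard_normal(latent_dim)
            # Reject "typical" draws; keep only those far from the mean.
            if np.abs(z).max() >= min_sigma:
                kept.append(z)
        return np.stack(kept)

    # Hypothetical usage: each latent would be decoded by the prosody model
    # (CHiVE in the disclosure) into acoustic features, which then drive the
    # auxiliary vocoder training path.
    z_batch = sample_far_from_mean(latent_dim=8, n_samples=4)
    ```

    Rejection sampling is used here only for clarity; any scheme that biases draws toward the tails of the latent prior would serve the same purpose of exposing the vocoder to unusual feature values during training.
    
    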

    Wasserstein GAN and Waveform Loss-Based Acoustic Model Training for Multi-Speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder
