
110 research outputs found

Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system

Author: Alku Paavo
Bollepalli Bajibabu
Juvela Lauri
Yamagishi Junichi
Publication venue: 'International Speech Communication Association'
Publication date: 01/08/2017

Neural network-based models that generate glottal excitation waveforms from acoustic features have been found to give improved quality in statistical parametric speech synthesis. Until now, however, these models have been trained separately from the acoustic model. This creates a mismatch between training and synthesis, as the synthesized acoustic features used as the excitation model input differ from the original inputs on which the model was trained. Furthermore, due to errors in predicting the vocal tract filter, the original excitation waveforms do not provide perfect reconstruction of the speech waveform even if predicted without error. To address these issues and to make the excitation model more robust against errors in acoustic modeling, this paper proposes two modifications to the excitation model training scheme. First, the excitation model is trained in a connected manner, with inputs generated by the acoustic model. Second, the target glottal waveforms are re-estimated by performing glottal inverse filtering with the predicted vocal tract filters. The results show that both of these modifications improve performance measured in MSE and MFCC distortion, and slightly improve the subjective quality of the synthetic speech. Peer reviewed

Crossref

Edinburgh Research Explorer

Aaltodoc Publication Archive

Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks

Author: Airaksinen Manu
Alku Paavo
Juvela Lauri
Takaki Shinji
Wang Xin
Yamagishi Junichi
Publication venue: 'International Speech Communication Association'
Publication date: 08/09/2016

Edinburgh Research Explorer

The NII speech synthesis entry for Blizzard Challenge 2016

Author: Airaksinen Manu
Juvela Lauri
Kim SangJin
Takaki Shinji
Wang Xin
Yamagishi Junichi
Publication date: 16/09/2016

Edinburgh Research Explorer

Speech Waveform Synthesis From MFCC Sequences With Generative Adversarial Networks

Author: Airaksinen Manu
Alku Paavo
Bollepalli Bajibabu
Juvela Lauri
Kameoka Hirokazu
Wang Xin
Yamagishi Junichi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/04/2018

This paper proposes a method for generating speech from filterbank mel frequency cepstral coefficients (MFCCs), which are widely used in speech applications such as ASR but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from MFCCs with an autoregressive recurrent neural net. Second, the spectral envelope information contained in MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to these filters is trained. Finally, we introduce a generative adversarial network-based noise model to add a realistic high-frequency stochastic component to the modeled excitation signal. The results show that high-quality speech reconstruction can be obtained, given only MFCC information at test time.

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Aaltodoc Publication Archive

Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0

Author: Al-Radhi Mohammed Salah
Csapó Tamás Gábor
Németh Géza
Zainkó Csaba
Publication date: 01/01/2022

Repository of the Academy's Library