27 research outputs found

    LSTM Deep Neural Networks Postfiltering for Improving the Quality of Synthetic Voices

    Recent developments in speech synthesis have produced systems capable of generating intelligible speech, and researchers now strive to create models that more accurately mimic human voices. One such development is the incorporation of multiple linguistic styles in various languages and accents. HMM-based speech synthesis is of great interest to many researchers because it can produce sophisticated features with a small footprint. Despite such progress, its quality has not yet reached the level of the predominant unit-selection approaches, which choose and concatenate recordings of real speech. Recent efforts have been made to improve these systems. In this paper we present the application of Long Short-Term Memory (LSTM) deep neural networks as a postfiltering step of HMM-based speech synthesis, in order to obtain spectral characteristics closer to those of natural speech. The results show how HMM voices could be improved using this approach. Comment: 5 pages, 5 figures

    Discriminative multi-stream postfilters based on deep learning for enhancing statistical parametric speech synthesis

    Statistical parametric speech synthesis based on Hidden Markov Models (HMM) has been an important technique for the production of artificial voices, due to its ability to produce results with high intelligibility and sophisticated features such as voice conversion and accent modification with a small footprint, particularly for low-resource languages where deep learning-based techniques remain unexplored. Despite this progress, the quality of HMM-based results does not reach that of the predominant approaches, which are based on unit selection of speech segments or on deep learning. One proposal to improve the quality of HMM-based speech has been to incorporate postfiltering stages, which aim to increase quality while preserving the advantages of the process. In this paper, we present a new approach to postfiltering synthesized voices: discriminative postfilters built from several long short-term memory (LSTM) deep neural networks. Our motivation is to model a specific mapping from synthesized to natural speech for segments corresponding to voiced or unvoiced sounds, because these sounds differ in quality and HMM-based voices can show distinct degradation on each. The paper analyses the discriminative postfilters obtained using five voices, evaluated using three objective measures, Mel cepstral distance and subjective tests. The results indicate the advantages of the discriminative postfilters in comparison with the HTS voice and the non-discriminative postfilters. Universidad de Costa Rica/[322-B9-105]/UCR/Costa Rica. UCR::Vicerrectoría de Docencia::Ingeniería::Facultad de Ingeniería::Escuela de Ingeniería Eléctric
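The abstract above names Mel cepstral distance as an evaluation measure but does not say which variant was used. The sketch below shows one widely used definition, averaged over time-aligned frames, as an illustration only. Excluding the zeroth coefficient (frame energy) is a common convention that is assumed here, and the function name and the toy coefficient values are hypothetical.

```python
import math


def mel_cepstral_distance(ref_frames, syn_frames):
    """Mean Mel cepstral distance in dB over time-aligned frames.

    Each frame is a list of mel-cepstral coefficients. Index 0 (c0) is
    skipped, which is an assumption of this sketch, not of the paper.
    """
    if len(ref_frames) != len(syn_frames):
        raise ValueError("sequences must be time-aligned and equal in length")
    if not ref_frames:
        raise ValueError("need at least one frame")

    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D.
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)


# Toy data (made up): two frames of three coefficients each.
natural = [[1.0, 0.50, -0.20], [1.1, 0.40, -0.10]]
synthetic = [[0.9, 0.45, -0.25], [1.0, 0.30, -0.05]]
mcd = mel_cepstral_distance(natural, synthetic)
print(f"MCD: {mcd:.3f} dB")
```

A lower value means the synthetic frames are spectrally closer to the natural ones, so a postfilter that works as intended would be expected to reduce this number relative to the unfiltered voice.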

    Wavenet based low rate speech coding

    Traditional parametric speech coding achieves low bit rates but provides poor reconstruction quality because the underlying model is inadequate. We describe how a WaveNet generative speech model can be used to generate high-quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet-based coder and show that the system additionally performs implicit bandwidth extension and does not significantly impair recognition of the original speaker by a human listener, even when that speaker was not used during training of the generative model. Comment: 5 pages, 2 figures
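To give a sense of scale for the 2.4 kb/s figure quoted above, the following sketch works out the bit budget per frame and the compression relative to uncompressed PCM. Only the coder rate comes from the abstract. The 20 ms frame length, the 16 kHz sampling rate and the 16-bit sample depth are assumptions chosen for illustration, and the helper names are hypothetical.

```python
CODER_RATE_BPS = 2400      # 2.4 kb/s, as stated in the abstract
FRAME_MS = 20              # assumption: typical frame length, not from the source
SAMPLE_RATE_HZ = 16000     # assumption: wideband PCM reference
BITS_PER_SAMPLE = 16       # assumption: linear PCM sample depth


def bits_per_frame(rate_bps, frame_ms):
    """Number of bits available to describe one frame of speech."""
    return rate_bps * frame_ms / 1000


def compression_ratio(sample_rate_hz, bits_per_sample, rate_bps):
    """Ratio of the uncompressed PCM bit rate to the coder bit rate."""
    return sample_rate_hz * bits_per_sample / rate_bps


frame_bits = bits_per_frame(CODER_RATE_BPS, FRAME_MS)
ratio = compression_ratio(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CODER_RATE_BPS)
print(f"{frame_bits:.0f} bits per {FRAME_MS} ms frame, about {ratio:.0f}x below PCM")
```

Under these assumptions each frame is described by 48 bits, which illustrates why a strong generative model is needed at the decoder to recover natural-sounding speech.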

    Modulation spectrum-based postfiltering of synthesized speech in the wavelet domain

    Thesis (Master's), Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. Nam Soo Kim. This thesis presents a wavelet-domain measure used in postfiltering applications. The quality of HMM-based (hidden Markov model-based) parametric speech synthesis is degraded by the over-smoothing effect, in which the trajectory of generated speech parameters is smoothed out and lacks dynamics. The conventional method uses the modulation spectrum (MS) to quantify the effect of over-smoothing by measuring the spectral tilt in the MS. To enhance performance, a modified version of the MS called the scaled modulation spectrum (SMS), which essentially separates the MS into different bands, is proposed and used in postfiltering. The performance of two types of wavelet transform, the discrete wavelet transform (DWT) and the dual-tree complex wavelet transform (DTCWT), is evaluated. We also extend the SMS into a hidden Markov tree (HMT) model, which represents the interdependencies of the coefficients. Experimental results show that the proposed method performs better. Contents: 1 Introduction; 2 Modulation Spectrum-based Postfiltering (2.1 Modulation Spectrum; 2.2 Conventional Postfiltering); 3 Discrete Wavelet-based Postfiltering (3.1 Discrete Wavelet Transform; 3.2 Postfiltering in the Wavelet Domain); 4 Postfiltering Using Dual-tree Complex Wavelet Transforms (4.1 Dual-tree Complex Wavelet Transform; 4.2 Postfiltering Using the DTCWT); 5 Postfiltering Using Hidden Markov Tree Models (5.1 Statistical Signal Processing Using Hidden Markov Trees; 5.2 Modeling SMS with HMT); 6 Experimental Results (6.1 Experimental Setup; 6.2 Results); 7 Conclusion and Future Work (7.1 Conclusion; 7.2 Future Work); Bibliography
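The over-smoothing effect described above can be made concrete with a small sketch. It assumes one common definition of the modulation spectrum, the log power spectrum of a single parameter trajectory taken over time. The thesis's exact definition and its scaled variant (SMS) are not given in the abstract, so the function names, the toy trajectory and the moving-average stand-in for over-smoothing are all hypothetical.

```python
import cmath
import math


def modulation_spectrum(trajectory, floor=1e-12):
    """Log power spectrum of one parameter trajectory over time (plain DFT)."""
    n = len(trajectory)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(trajectory)
        )
        # The floor avoids log(0) for empty bins.
        spectrum.append(math.log(abs(coeff) ** 2 + floor))
    return spectrum


def moving_average(trajectory, width=5):
    """Crude stand-in for over-smoothing: centred moving average."""
    half = width // 2
    smoothed = []
    for t in range(len(trajectory)):
        window = trajectory[max(0, t - half): t + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Toy trajectory: a slow component plus a faster fluctuation.
T = 64
natural = [
    math.sin(2 * math.pi * 2 * t / T) + 0.5 * math.sin(2 * math.pi * 12 * t / T)
    for t in range(T)
]
smoothed = moving_average(natural)

ms_natural = modulation_spectrum(natural)
ms_smoothed = modulation_spectrum(smoothed)
print("drop at slow bin 2: ", ms_natural[2] - ms_smoothed[2])
print("drop at fast bin 12:", ms_natural[12] - ms_smoothed[12])
```

In this toy case the smoothed trajectory loses far more energy at the fast modulation bin than at the slow one, which is the kind of tilt a modulation-spectrum postfilter tries to undo.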