
8 research outputs found

Sampling-based speech parameter generation using moment-matching networks

Authors: Koriyama Tomoki, Saruwatari Hiroshi, Takamichi Shinnosuke
Publication date: 12/04/2017

This paper presents sampling-based speech parameter generation using moment-matching networks for Deep Neural Network (DNN)-based speech synthesis. People never produce exactly the same speech twice, even when they try to express the same linguistic and para-linguistic information. Typical statistical speech synthesis, by contrast, produces exactly the same speech every time, i.e., there is no inter-utterance variation in synthetic speech. To give synthetic speech natural inter-utterance variation, this paper builds DNN acoustic models that make it possible to randomly sample speech parameters. The DNNs are trained so that the moments of the generated speech parameters are close to those of natural speech parameters. Since the variation of speech parameters is compressed into a simple, low-dimensional prior noise vector, our algorithm has a lower computation cost than direct sampling of speech parameters. As a first step towards generating synthetic speech that has natural inter-utterance variation, this paper investigates whether the proposed sampling-based generation degrades synthetic speech quality. In the evaluation, we compare the speech quality of conventional maximum likelihood-based generation with that of the proposed sampling-based generation. The results demonstrate that the proposed generation causes no degradation in speech quality.

Comment: Submitted to INTERSPEECH 201
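
The abstract says the networks are trained to bring the moments of generated parameters close to those of natural ones. One common way to express such a criterion is the kernel maximum mean discrepancy (MMD). The sketch below is an assumption about the general technique, not the paper's actual training objective: the Gaussian kernel, the `sigma` value, the function names, and the random stand-in data are all illustrative.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    return (
        gaussian_kernel(x, x, sigma).mean()
        + gaussian_kernel(y, y, sigma).mean()
        - 2.0 * gaussian_kernel(x, y, sigma).mean()
    )

# Hypothetical stand-ins for speech parameter vectors (not real data).
rng = np.random.default_rng(0)
natural = rng.normal(0.0, 1.0, size=(200, 3))
matched = rng.normal(0.0, 1.0, size=(200, 3))   # same distribution
shifted = rng.normal(2.0, 1.0, size=(200, 3))   # different mean

print("MMD^2, same distribution:   ", mmd2(natural, matched))
print("MMD^2, shifted distribution:", mmd2(natural, shifted))
```

In a training setting, a quantity like `mmd2` would be minimized with respect to the generator's weights, with the generated samples driven by the prior noise vector; that part is omitted here.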

arXiv.org e-Print Archive

Crossref

Modulation spectrum-based postfiltering for speech synthesis in the wavelet domain

Author: 장세영
Publication venue: Graduate School, Seoul National University
Publication date: 01/08/2017

Thesis (Master's) -- Graduate School of Seoul National University, College of Engineering, Department of Electrical and Computer Engineering, 2017. 8. 김남수.

This thesis presents a wavelet-domain measure used in postfiltering applications. The quality of HMM-based (hidden Markov model-based) parametric speech synthesis is degraded by the over-smoothing effect, where the trajectory of the generated speech parameters is smoothed out and lacks dynamics. The conventional method uses the modulation spectrum (MS) to quantify the effect of over-smoothing by measuring the spectral tilt in the MS. To enhance performance, a modified version of the MS called the scaled modulation spectrum (SMS), which essentially separates the MS into different bands, is proposed and used in postfiltering. The performance of two types of wavelet transforms, the discrete wavelet transform (DWT) and the dual-tree complex wavelet transform (DTCWT), is evaluated. We also extend the SMS into a hidden Markov tree (HMT) model, which represents the interdependencies of the coefficients. Experimental results show that the proposed method performs better.

Contents:
1 Introduction
2 Modulation Spectrum-based Postfiltering
2.1 Modulation Spectrum
2.2 Conventional Postfiltering
3 Discrete Wavelet-based Postfiltering
3.1 Discrete Wavelet Transform
3.2 Postfiltering in the Wavelet Domain
4 Postfiltering Using Dual-tree Complex Wavelet Transforms
4.1 Dual-tree Complex Wavelet Transform
4.2 Postfiltering Using the DTCWT
5 Postfiltering Using Hidden Markov Tree Models
5.1 Statistical Signal Processing Using Hidden Markov Trees
5.2 Modeling SMS with HMT
6 Experimental Results
6.1 Experimental Setup
6.2 Results
7 Conclusion and Future Work
7.1 Conclusion
7.2 Future Work
Bibliography
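
The over-smoothing effect described in the abstract can be illustrated with a simple modulation spectrum computation. The sketch below assumes the MS is the log power spectrum of a parameter trajectory taken along the time axis, which is a common definition but may differ from the one used in the thesis. The function name, FFT length, moving-average smoother, and random stand-in trajectory are all illustrative assumptions.

```python
import numpy as np

def modulation_spectrum(trajectory, n_fft=256):
    """Log power spectrum of a 1-D parameter trajectory over time."""
    centered = trajectory - trajectory.mean()
    spectrum = np.fft.rfft(centered, n=n_fft)
    # Small floor avoids log(0) at spectral nulls.
    return np.log(np.abs(spectrum) ** 2 + 1e-10)

# Hypothetical stand-in for one natural speech parameter trajectory.
rng = np.random.default_rng(0)
natural = rng.normal(size=200)

# Emulate over-smoothing with a 9-frame moving average (an assumption).
kernel = np.ones(9) / 9.0
smoothed = np.convolve(natural, kernel, mode="same")

ms_natural = modulation_spectrum(natural)
ms_smoothed = modulation_spectrum(smoothed)

# Upper half of the modulation frequency axis.
high = slice(len(ms_natural) // 2, None)
print("mean high-band MS, natural: ", ms_natural[high].mean())
print("mean high-band MS, smoothed:", ms_smoothed[high].mean())
```

The smoothed trajectory loses energy at high modulation frequencies, which is the kind of loss an MS-based postfilter aims to compensate for.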

SNU Open Repository and Archive