Sampling-based speech parameter generation using moment-matching networks
This paper presents sampling-based speech parameter generation using
moment-matching networks for Deep Neural Network (DNN)-based speech synthesis.
Although people never produce exactly the same speech twice, even when they try
to express the same linguistic and para-linguistic information, typical
statistical speech synthesis produces exactly the same speech every time, i.e.,
there is no inter-utterance variation in synthetic speech. To give synthetic speech natural
inter-utterance variation, this paper builds DNN acoustic models that make it
possible to randomly sample speech parameters. The DNNs are trained so that
they make the moments of generated speech parameters close to those of natural
speech parameters. Since the variation of speech parameters is compressed into
a low-dimensional simple prior noise vector, our algorithm has lower
computational cost than direct sampling of speech parameters. As the first step
towards generating synthetic speech that has natural inter-utterance variation,
this paper investigates whether or not the proposed sampling-based generation
deteriorates synthetic speech quality. In the evaluation, we compare the speech
quality of conventional maximum likelihood-based generation and the proposed
sampling-based generation. The results demonstrate that the proposed generation
causes no degradation in speech quality.
Comment: Submitted to INTERSPEECH 201
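The idea of training a generator so that the moments of generated parameters approach those of natural parameters can be illustrated with a kernel maximum mean discrepancy (MMD), a criterion commonly used in moment-matching networks. The abstract does not state the exact criterion, so the sketch below is an assumption: the Gaussian kernel, the bandwidth `sigma`, and the toy 3-dimensional "speech parameters" are all hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Toy data (hypothetical): rows stand in for speech parameter vectors.
rng = np.random.default_rng(0)
natural = rng.normal(0.0, 1.0, size=(200, 3))
matched = rng.normal(0.0, 1.0, size=(200, 3))   # sampled, same distribution
collapsed = np.zeros((200, 3))                  # no variation between samples

mmd_matched = mmd2(natural, matched)
mmd_collapsed = mmd2(natural, collapsed)
print(f"matched: {mmd_matched:.4f}  collapsed: {mmd_collapsed:.4f}")
```

In this toy setting the sample set with no variation scores a much larger discrepancy than the set drawn from the same distribution as the natural data, which is the kind of signal such a training loss would push a generator to reduce.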
Modulation spectrum-based post-processing for speech synthesis in the wavelet domain
Master's thesis, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. Nam Soo Kim (김남수).
This thesis presents a wavelet-domain measure used in postfiltering applications. The quality of hidden Markov model (HMM)-based parametric speech synthesis is degraded by the over-smoothing effect, in which the trajectory of generated speech parameters is smoothed out and lacks dynamics. The conventional method uses the modulation spectrum (MS) to quantify over-smoothing by measuring the spectral tilt in the MS. To enhance performance, a modified version of the MS called the scaled modulation spectrum (SMS), which essentially separates the MS into different bands, is proposed and used in postfiltering. The performance of two types of wavelet transform, the discrete wavelet transform (DWT) and the dual-tree complex wavelet transform (DTCWT), is evaluated. We also extend the SMS into a hidden Markov tree (HMT) model, which represents the interdependencies of the coefficients. Experimental results show that the proposed method performs better than the conventional method.
1 Introduction
2 Modulation Spectrum-based Postfiltering
2.1 Modulation Spectrum
2.2 Conventional Postfiltering
3 Discrete Wavelet-based Postfiltering
3.1 Discrete Wavelet Transform
3.2 Postfiltering in the Wavelet Domain
4 Postfiltering Using Dual-tree Complex Wavelet Transforms
4.1 Dual-tree Complex Wavelet Transform
4.2 Postfiltering Using the DTCWT
5 Postfiltering Using Hidden Markov Tree Models
5.1 Statistical Signal Processing Using Hidden Markov Trees
5.2 Modeling SMS with HMT
6 Experimental Results
6.1 Experimental Setup
6.2 Results
7 Conclusion and Future Work
7.1 Conclusion
7.2 Future Work
Bibliography
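The over-smoothing effect described in the thesis abstract can be illustrated with a small sketch. It treats the modulation spectrum as the log power spectrum of a parameter trajectory over time, and uses a single-level Haar DWT as a stand-in for separating a trajectory into bands. This is not the thesis's SMS or postfilter: the random-walk trajectory, the moving-average smoother, the Hann window, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def modulation_spectrum(trajectory):
    """Log power spectrum of a 1-D parameter trajectory (one value per frame)."""
    centered = trajectory - trajectory.mean()
    windowed = centered * np.hanning(len(centered))  # reduce spectral leakage
    spectrum = np.fft.rfft(windowed)
    return np.log(np.abs(spectrum) ** 2 + 1e-10)

def moving_average(trajectory, width=9):
    """Smooth a trajectory to mimic over-smoothing (illustrative only)."""
    padded = np.pad(trajectory, width // 2, mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")

def haar_dwt(signal):
    """One level of the Haar DWT: (approximation, detail) coefficients."""
    even, odd = signal[0::2], signal[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

# Toy trajectory (hypothetical): a random walk standing in for one
# dimension of a natural speech parameter sequence.
rng = np.random.default_rng(0)
natural = np.cumsum(rng.normal(size=200))
smoothed = moving_average(natural)

ms_natural = modulation_spectrum(natural)
ms_smoothed = modulation_spectrum(smoothed)

# Compare the upper half of the modulation-frequency axis.
high = slice(len(ms_natural) // 2, None)
high_band_gap = ms_natural[high].mean() - ms_smoothed[high].mean()

# Detail coefficients carry the fine-scale (high-frequency) band.
_, detail_natural = haar_dwt(natural)
_, detail_smoothed = haar_dwt(smoothed)
print(f"high-band log-power gap: {high_band_gap:.2f}")
```

In this toy case the smoothed trajectory loses energy at high modulation frequencies and in the Haar detail band. A postfilter of the kind the abstract describes aims to compensate for this kind of loss.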