Significance of Maximum Spectral Amplitude in Sub-bands for Spectral Envelope Estimation and Its Application to Statistical Parametric Speech Synthesis
In this paper we propose a technique for spectral envelope estimation using
maximum values in the sub-bands of the Fourier magnitude spectrum (MSASB). Most
other methods in the literature parametrize the spectral envelope in the cepstral
domain, for example as Mel-generalized cepstra. Such cepstral-domain
representations, although compact, are not readily interpretable. This
difficulty is overcome by our method, which parametrizes in the spectral domain
itself. In our experiments, the spectral envelope estimated using the MSASB method was
incorporated in the STRAIGHT vocoder. Both objective and subjective results of
analysis-by-synthesis indicate that the proposed method is comparable to
STRAIGHT. We also evaluate the effectiveness of the proposed parametrization in
a statistical parametric speech synthesis framework using deep neural networks
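
As a rough illustration of the idea described in the abstract above (taking the maximum of the Fourier magnitude spectrum within each sub-band), the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the Hann window, the uniform band layout, the FFT size, the number of bands, and the linear interpolation between band maxima are all assumptions made for this example, and the function names `msasb_envelope` and `interpolate_envelope` are hypothetical.

```python
import numpy as np

def msasb_envelope(frame, n_fft=1024, n_bands=32):
    """Return (band_centers, band_maxima) for one speech frame.

    Assumption: uniform sub-bands over the FFT bins; the paper may use
    a different band layout.
    """
    windowed = frame * np.hanning(len(frame))
    # Magnitude spectrum of the zero-padded, windowed frame.
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))
    # Bin indices that delimit n_bands contiguous sub-bands.
    edges = np.linspace(0, len(magnitude), n_bands + 1).astype(int)
    # Maximum spectral amplitude inside each sub-band.
    maxima = np.array([magnitude[lo:hi].max()
                       for lo, hi in zip(edges[:-1], edges[1:])])
    # Centre bin of each sub-band, used as the interpolation abscissa.
    centers = (edges[:-1] + edges[1:] - 1) / 2.0
    return centers, maxima

def interpolate_envelope(centers, maxima, n_bins):
    """Linearly interpolate the band maxima to a full-resolution envelope
    (assumed interpolation scheme, chosen only for simplicity)."""
    return np.interp(np.arange(n_bins), centers, maxima)
```

With the default settings each frame is represented by 32 spectral-domain values, each of which can be read directly as the peak amplitude in one frequency band.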
Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices
Acoustic models based on long short-term memory recurrent neural networks
(LSTM-RNNs) were applied to statistical parametric speech synthesis (SPSS) and
showed significant improvements in naturalness and latency over those based on
hidden Markov models (HMMs). This paper describes further optimizations of
LSTM-RNN-based SPSS for deployment on mobile devices: weight quantization,
multi-frame inference, and robust inference using an ε-contaminated
Gaussian loss function. Experimental results in subjective listening tests show
that these optimizations can make LSTM-RNN-based SPSS comparable to HMM-based
SPSS in runtime speed while maintaining naturalness. Evaluations between
LSTM-RNN-based SPSS and HMM-driven unit selection speech synthesis are also
presented.
Comment: 13 pages, 3 figures, Interspeech 2016 (accepted
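
To make the weight quantization mentioned in the abstract above more concrete, here is a minimal sketch of 8-bit linear quantization of a weight matrix in Python with NumPy. The abstract does not specify the quantization scheme, so the per-matrix min/max range and the uniform 256-level grid are assumptions for illustration only, and the names `quantize_weights` and `dequantize_weights` are hypothetical.

```python
import numpy as np

def quantize_weights(w):
    """Map float weights to uint8 codes on a uniform grid.

    Assumption: one (scale, offset) pair per matrix, derived from its
    minimum and maximum value.
    """
    w_min, w_max = float(w.min()), float(w.max())
    # Step between adjacent quantization levels (255 steps for 8 bits).
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    codes = np.round((w - w_min) / scale)
    codes = np.clip(codes, 0, 255).astype(np.uint8)
    return codes, scale, w_min

def dequantize_weights(codes, scale, w_min):
    """Recover approximate float32 weights from the uint8 codes."""
    return codes.astype(np.float32) * scale + w_min
```

Storing `codes` instead of 32-bit floats cuts the memory needed for a matrix to roughly a quarter, at the cost of a reconstruction error of about half a quantization step per weight.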