
An Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis

Author: Takaki Shinji
Wang Xin
Yamagishi Junichi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 19/06/2017

Crossref

Edinburgh Research Explorer

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Author: Agiomyrgiannakis Yannis
Chen Zhifeng
Jaitly Navdeep
Pang Ruoming
Saurous Rif A.
Schuster Mike
Shen Jonathan
Skerry-Ryan RJ
Wang Yuxuan
Weiss Ron J.
Wu Yonghui
Yang Zongheng
Zhang Yu
Publication date: 15/02/2018

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinion score (MOS) of 4.53, comparable to a MOS of 4.58 for professionally recorded speech. To validate our design choices, we present ablation studies of key components of our system and evaluate the impact of using mel spectrograms as the input to WaveNet instead of linguistic, duration, and F0 features. We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of the WaveNet architecture.

Comment: Accepted to ICASSP 201

arXiv.org e-Print Archive

Crossref
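The Tacotron 2 abstract centers on mel-scale spectrograms as the intermediate representation between the feature prediction network and the WaveNet vocoder. A minimal sketch of how such features are commonly derived from a linear-frequency power spectrum, assuming the HTK-style mel formula and triangular filters (the abstract does not specify which mel variant, filter count, frequency range, or log compression Tacotron 2 uses; every parameter value below is an illustrative assumption):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale (assumed variant): m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_mels: int, f_min: float, f_max: float) -> list[float]:
    """Return n_mels + 2 frequencies (Hz) equally spaced on the mel scale.

    Each consecutive triple (edges[i], edges[i+1], edges[i+2]) gives the
    lower edge, center, and upper edge of the i-th triangular filter.
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_mels + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_mels + 2)]

def triangular_filterbank(n_mels, n_fft, sample_rate, f_min, f_max):
    """Build an n_mels x (n_fft // 2 + 1) matrix of triangular mel filters."""
    n_bins = n_fft // 2 + 1
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    edges = mel_band_edges(n_mels, f_min, f_max)
    bank = []
    for i in range(n_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        row = []
        for f in bin_freqs:
            if lo <= f <= center:
                w = (f - lo) / (center - lo)   # rising slope
            elif center < f <= hi:
                w = (hi - f) / (hi - center)   # falling slope
            else:
                w = 0.0                        # outside this filter
            row.append(w)
        bank.append(row)
    return bank

def log_mel_frame(power_spectrum, bank, floor=1e-5):
    """Project one spectral frame onto the mel filters, then take a floored log."""
    return [math.log(max(floor, sum(w * p for w, p in zip(row, power_spectrum))))
            for row in bank]

# Illustrative usage with assumed settings: 80 filters, 1024-point FFT, 22.05 kHz.
bank = triangular_filterbank(80, 1024, 22050, 125.0, 7600.0)
frame = log_mel_frame([1.0] * 513, bank)
```

Equal spacing on the mel axis makes the filters narrow at low frequencies and progressively wider at high frequencies, so the resulting 80-value frame is far more compact than the 513-bin linear spectrum it summarizes.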

Autoregressive neural F0 model for statistical parametric speech synthesis

Author: Takaki Shinji
Wang Xin
Yamagishi Junichi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 19/04/2018

Crossref

Edinburgh Research Explorer

Generative Image Modeling Using Spatial LSTMs

Author: Bethge Matthias
Theis Lucas
Publication date: 18/09/2015

Modeling the distribution of natural images is challenging, partly because of strong statistical dependencies that can extend over hundreds of pixels. Recurrent neural networks have been successful in capturing long-range dependencies in a number of problems but have only recently found their way into generative image models. Here we introduce a recurrent image model based on multi-dimensional long short-term memory units, which are particularly suited for image modeling due to their spatial structure. Our model scales to images of arbitrary size, and its likelihood is computationally tractable. We find that it outperforms the state of the art in quantitative comparisons on several image datasets and produces promising results when used for texture synthesis and inpainting.

arXiv.org e-Print Archive

CiteSeerX

MPG.PuRe
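The Spatial LSTM abstract states that the model handles images of arbitrary size with a tractable likelihood. A minimal sketch of why this follows from an autoregressive factorization, log p(x) = Σ_ij log p(x_ij | pixels above and to the left), assuming a raster-scan order. The conditional used here, a scalar tanh recurrence over the top and left neighbors feeding a Gaussian, is a hypothetical stand-in for the paper's multi-dimensional LSTM units and output distribution, and all weights are arbitrary illustrative constants:

```python
import math

def gaussian_logpdf(x, mean, std):
    """log N(x; mean, std^2)."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)

def image_log_likelihood(img, w_up=0.5, w_left=0.5, u_up=0.8, u_left=0.8,
                         v=1.0, bias=0.0, std=0.5):
    """Exact log-likelihood of a 2-D image under a toy spatial recurrence.

    The hidden state h[i][j] depends only on the pixels above and to the
    left of (i, j) (never on x[i][j] itself), so each term below is a
    proper conditional and the sum is an exact log-likelihood.
    """
    height, width = len(img), len(img[0])
    h = [[0.0] * width for _ in range(height)]
    total = 0.0
    for i in range(height):            # raster scan: rows top to bottom
        for j in range(width):         # columns left to right
            h_up = h[i - 1][j] if i > 0 else 0.0
            h_left = h[i][j - 1] if j > 0 else 0.0
            x_up = img[i - 1][j] if i > 0 else 0.0
            x_left = img[i][j - 1] if j > 0 else 0.0
            h[i][j] = math.tanh(w_up * h_up + w_left * h_left
                                + u_up * x_up + u_left * x_left)
            mean = v * h[i][j] + bias  # predicted mean of x[i][j]
            total += gaussian_logpdf(img[i][j], mean, std)
    return total

# The same function scores images of any size, one conditional per pixel.
small = image_log_likelihood([[0.0] * 4 for _ in range(3)])
large = image_log_likelihood([[0.1 * (i + j) for j in range(7)] for i in range(5)])
```

Because the recurrence only passes state from the top and left neighbors, this toy version conditions on the upper-left rectangle rather than on every pixel earlier in raster order; the point it illustrates is that the cost is one conditional evaluation per pixel, so the likelihood stays tractable regardless of image dimensions.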