3,240 research outputs found
Tacotron: Towards End-to-End Speech Synthesis
A text-to-speech synthesis system typically consists of multiple stages, such
as a text analysis frontend, an acoustic model and an audio synthesis module.
Building these components often requires extensive domain expertise and may
contain brittle design choices. In this paper, we present Tacotron, an
end-to-end generative text-to-speech model that synthesizes speech directly
from characters. Given pairs, the model can be trained completely
from scratch with random initialization. We present several key techniques to
make the sequence-to-sequence framework perform well for this challenging task.
Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English,
outperforming a production parametric system in terms of naturalness. In
addition, since Tacotron generates speech at the frame level, it's
substantially faster than sample-level autoregressive methods.Comment: Submitted to Interspeech 2017. v2 changed paper title to be
consistent with our conference submission (no content change other than typo
fixes
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
This paper describes Tacotron 2, a neural network architecture for speech
synthesis directly from text. The system is composed of a recurrent
sequence-to-sequence feature prediction network that maps character embeddings
to mel-scale spectrograms, followed by a modified WaveNet model acting as a
vocoder to synthesize timedomain waveforms from those spectrograms. Our model
achieves a mean opinion score (MOS) of comparable to a MOS of for
professionally recorded speech. To validate our design choices, we present
ablation studies of key components of our system and evaluate the impact of
using mel spectrograms as the input to WaveNet instead of linguistic, duration,
and features. We further demonstrate that using a compact acoustic
intermediate representation enables significant simplification of the WaveNet
architecture.Comment: Accepted to ICASSP 201
- …