Deep Feed-forward Sequential Memory Networks for Speech Synthesis
The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is among the
best parametric Text-to-Speech (TTS) systems in terms of the naturalness of
generated speech, especially the naturalness in prosody. However, the model
complexity and inference cost of BLSTM prevent its use in many runtime
applications. Meanwhile, Deep Feed-forward Sequential Memory Networks (DFSMN)
have consistently outperformed BLSTM in both word error rate (WER) and runtime
computation cost in speech recognition tasks. Since speech synthesis, like
speech recognition, also requires modeling long-term dependencies, in this
paper we investigate the DFSMN in
speech synthesis. Both objective and subjective experiments show that, compared
with the BLSTM TTS method, the DFSMN system can generate synthesized speech of
comparable quality while drastically reducing model complexity and speech
generation time.
Comment: 5 pages, ICASSP 201
A Survey on Neural Speech Synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize
intelligible and natural speech given text, is a hot research topic in the speech,
language, and machine learning communities and has broad applications in
industry. With the development of deep learning and artificial intelligence,
neural network-based TTS has significantly improved the quality of synthesized
speech in recent years. In this paper, we conduct a comprehensive survey on
neural TTS, aiming to provide a good understanding of current research and
future trends. We focus on the key components of neural TTS, including text
analysis, acoustic models, and vocoders, and on several advanced topics, such as
fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive TTS.
We further summarize resources related to TTS (e.g., datasets, open-source
implementations) and discuss future research directions. This survey can serve
both academic researchers and industry practitioners working on TTS.
Comment: A comprehensive survey on TTS, 63 pages, 18 tables, 7 figures, 457
reference