Search CORE

2 research outputs found

Deep Feed-forward Sequential Memory Networks for Speech Synthesis

Author: Bi Mengxiao
Lei Ming
Lu Heng
Yan Zhijie
Zhang Shiliang
Publication venue
Publication date: 26/02/2018
Field of study

The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is among the best parametric Text-to-Speech (TTS) systems in terms of the naturalness of generated speech, especially the naturalness in prosody. However, the model complexity and inference cost of BLSTM prevents its usage in many runtime applications. Meanwhile, Deep Feed-forward Sequential Memory Networks (DFSMN) has shown its consistent out-performance over BLSTM in both word error rate (WER) and the runtime computation cost in speech recognition tasks. Since speech synthesis also requires to model long-term dependencies compared to speech recognition, in this paper, we investigate the Deep-FSMN (DFSMN) in speech synthesis. Both objective and subjective experiments show that, compared with BLSTM TTS method, the DFSMN system can generate synthesized speech with comparable speech quality while drastically reduce model complexity and speech generation time.Comment: 5 pages, ICASSP 201

arXiv.org e-Print Archive

A Survey on Neural Speech Synthesis

Author: Liu Tie-Yan
Qin Tao
Soong Frank
Tan Xu
Publication venue
Publication date: 23/07/2021
Field of study

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in speech, language, and machine learning communities and has broad applications in the industry. As the development of deep learning and artificial intelligence, neural network-based TTS has significantly improved the quality of synthesized speech in recent years. In this paper, we conduct a comprehensive survey on neural TTS, aiming to provide a good understanding of current research and future trends. We focus on the key components in neural TTS, including text analysis, acoustic models and vocoders, and several advanced topics, including fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive TTS, etc. We further summarize resources related to TTS (e.g., datasets, opensource implementations) and discuss future research directions. This survey can serve both academic researchers and industry practitioners working on TTS.Comment: A comprehensive survey on TTS, 63 pages, 18 tables, 7 figures, 457 reference

arXiv.org e-Print Archive