Prosody in text-to-speech synthesis using fuzzy logic

Authors: Jonathan Brent Williams
Publication date: 1 December 2005
Publisher: The Research Repository @ WVU

Abstract

For over a thousand years, inventors, scientists and researchers have tried to reproduce human speech. Yet the quality of synthesized speech still does not match that of real speech. Most research on speech synthesis focuses on improving the quality of the speech produced by Text-to-Speech (TTS) systems. The best TTS systems use unit selection-based concatenation to synthesize speech. However, this method is time-consuming and requires a very large speech database. Diphone-concatenated synthesized speech requires less memory, but sounds robotic. This thesis explores the use of fuzzy logic to make diphone-concatenated speech sound more natural. A TTS system is built using both neural networks and fuzzy logic. Text is converted into phonemes using neural networks. Fuzzy logic is used to control the fundamental frequency (f0) for three types of sentences. In conclusion, the fuzzy system produces f0 contours that make the diphone-concatenated speech sound more natural.
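
To make the idea of fuzzy control of f0 concrete, the following is a minimal illustrative sketch and not the method used in the thesis. Everything in it is an assumption: the three sentence types (statement, question, exclamation), the triangular membership functions over relative position in the sentence, the scale factors in RULES, the base frequency of 120 Hz, and the weighted-average (zero-order Sugeno style) defuzzification. The abstract does not specify any of these details.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets over relative position in the sentence (0 = first
# phoneme, 1 = last phoneme). The outer sets extend past [0, 1] so that their
# peaks sit exactly on the sentence boundaries.
POSITION_SETS = {
    "start": (-0.5, 0.0, 0.5),
    "middle": (0.0, 0.5, 1.0),
    "end": (0.5, 1.0, 1.5),
}

# Hypothetical rule consequents: f0 scale factor for each position set.
# The sentence types and all numbers are assumptions for illustration only.
RULES = {
    "statement": {"start": 1.10, "middle": 1.00, "end": 0.80},
    "question": {"start": 1.00, "middle": 1.00, "end": 1.30},
    "exclamation": {"start": 1.30, "middle": 1.10, "end": 0.90},
}


def f0_contour(sentence_type, n_points, base_f0=120.0):
    """Return n_points f0 targets (Hz) spread evenly across the sentence."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    scales = RULES[sentence_type]
    contour = []
    for i in range(n_points):
        x = i / (n_points - 1)
        # Fuzzify: degree to which x belongs to each position set.
        weights = {name: tri(x, *abc) for name, abc in POSITION_SETS.items()}
        # Defuzzify: weighted average of the rule consequents.
        total = sum(weights.values())
        scale = sum(weights[name] * scales[name] for name in weights) / total
        contour.append(base_f0 * scale)
    return contour


if __name__ == "__main__":
    for kind in RULES:
        print(kind, [round(f, 1) for f in f0_contour(kind, 5)])
```

In this sketch a statement falls in pitch toward the end and a question rises, and the overlapping membership functions make the contour change gradually between positions rather than in steps. In a diphone system, targets like these would then be imposed on the concatenated units by a separate pitch-modification step, which is not shown here.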

Available Versions

The Research Repository @ WVU (West Virginia University)

oai:researchrepository.wvu.edu...

Last time updated on 17/10/2019