82 research outputs found

Speech Enhancement Using Speech Synthesis Techniques

Author: Maiti Soumi
Publication venue: CUNY Academic Works
Publication date: 01/02/2021

Traditional speech enhancement systems reduce noise by modifying the noisy signal to make it more like a clean signal, an approach that suffers from two problems: under-suppression of noise and over-suppression of speech. These problems create distortions in the enhanced speech and hurt its quality. We propose to utilize speech synthesis techniques for a higher-quality speech enhancement system. Synthesizing clean speech based on the noisy signal could produce outputs that are both noise-free and high quality. We first show that we can replace the noisy speech with its clean resynthesis from a previously recorded clean speech dictionary from the same speaker (concatenative resynthesis). Next, we show that using a speech synthesizer (vocoder) we can create a clean resynthesis of the noisy speech for more than one speaker. We term this parametric resynthesis (PR). PR can generate better prosody from noisy speech than a TTS system that uses textual information only. Additionally, we can use the high-quality speech generation capability of neural vocoders for better-quality speech enhancement. When trained on data from enough speakers, these vocoders can generate speech from unseen speakers, both male and female, with quality similar to that of speakers seen in training. Finally, we show that using neural vocoders we can achieve better objective signal and overall quality than state-of-the-art speech enhancement systems and better subjective quality than an oracle mask-based system.

City University of New York

Mixed Distance Measures for Optimizing Concatenative Vocabularies for Speech Synthesis: A Thesis Proposal

Author: Polish Nathaniel
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/1987

Synthesized speech from text-to-speech systems is generally produced by concatenating small units of speech. The concatenation process can be complex, involving smoothing and context-dependent adjustments to the speech. The overall quality of the speech produced depends in large part on the quality of the elements used for concatenation. Selection and evaluation of these elements has been done entirely by hand. The proposed work addresses the process by which these concatenative elements are created from a natural voice and optimized. The optimization uses distance measures that exploit detailed information on the structure of the speech signals.

Columbia University Academic Commons

LUKAS - a preliminary report on a new Swedish speech synthesis

Authors: Bruce Gösta; Filipsson Marcus
Publication date: 01/01/1997

Abstract is not available.

Lund University Publications

Portfolio of Original Compositions

Author: Ávila Sausor Julián
Publication date: 31/12/2019

The University of Manchester - Institutional Repository

Speaker Clustering for Multilingual Synthesis

Authors: Black Alan W.; Schultz Tanja
Publication date: 18/06/2008

KITopen

Development of a Yoruba Text-to-Speech System Using Festival

Authors: Iyanda Abimbola Rhoda; Ninan Olufemi Deborah
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 05/08/2017

This paper presents a Text-to-Speech (TTS) synthesis system for the Yorúbà language using the open-source Festival TTS engine. Yorúbà, like most African languages, is a resource-scarce language, which presents a major challenge to conventional speech synthesis approaches, since these typically require large corpora for training. Speech data were recorded in a quiet environment with a noise-cancelling microphone on a typical multimedia computer system using the Speech Filing System (SFS) software, then analysed and annotated using the PRAAT speech processing software. The system was evaluated for intelligibility and naturalness using mean opinion scores. The results show that the system's word-level intelligibility and naturalness are 55.56% and 50% respectively, but the system performs poorly on both the intelligibility and naturalness tests at the sentence level. Hence, further research is needed to improve the quality of the synthesized speech. Keywords: Text-to-Speech, Festival, Yorúbà, Syllable

International Institute for Science, Technology and Education (IISTE): E-Journals