
    Using same-language machine translation to create alternative target sequences for text-to-speech synthesis

    Modern speech synthesis systems attempt to produce speech utterances from an open domain of words. In some situations, the synthesiser will not have the appropriate units to pronounce some words or phrases accurately, but it must still attempt to pronounce them. This paper presents a hybrid machine translation and unit selection speech synthesis system. The machine translation system was trained with English as both the source and target language. Rather than the synthesiser only saying the input text, as would happen in conventional synthesis systems, the synthesiser may say an alternative utterance with the same meaning. This method allows the synthesiser to overcome the problem of insufficient units at runtime.
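The selection step described above can be sketched as follows. This is a minimal illustration, not the paper's actual system: it assumes candidate paraphrases are already phonemised, and picks the one whose diphones are best covered by the synthesiser's unit inventory. All names and data are illustrative.

```python
# Illustrative sketch: among same-meaning candidate utterances (e.g. produced
# by an English-to-English MT system), choose the one whose diphones are best
# covered by the unit inventory, so the synthesiser avoids missing units.

def diphones(phonemes):
    """Return the adjacent phoneme pairs in an utterance."""
    return list(zip(phonemes, phonemes[1:]))

def coverage(phonemes, inventory):
    """Fraction of the utterance's diphones present in the unit inventory."""
    pairs = diphones(phonemes)
    if not pairs:
        return 0.0
    return sum(1 for p in pairs if p in inventory) / len(pairs)

def best_candidate(candidates, inventory):
    """Pick the candidate utterance with the highest diphone coverage."""
    return max(candidates, key=lambda c: coverage(c, inventory))

# Toy inventory and two phonemised paraphrases of the same message.
inventory = {("h", "@"), ("@", "l"), ("l", "oU"), ("h", "aI")}
candidates = [["h", "@", "l", "oU"],        # "hello": fully covered
              ["h", "aI", "D", "e", "r"]]   # "hi there": poorly covered
print(best_candidate(candidates, inventory))
```

Here the first paraphrase wins because every one of its diphones is in the inventory, while the second would force the synthesiser to back off for most of its joins.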

    Close Copy Speech Synthesis for Speech Perception Testing

    The present study is concerned with developing a speech synthesis subcomponent for perception testing in the context of evaluating cochlear implants in children. We provide a detailed requirements analysis and develop a strategy for maximally high quality speech synthesis using Close Copy Speech synthesis techniques with a diphone-based speech synthesiser, MBROLA. The close copy concept used in this work defines close copy as a function from a pair consisting of a speech recording and a phonemic annotation aligned with that recording into the pronunciation specification interface of the speech synthesiser. The design procedure has three phases: Manual Close Copy Speech (MCCS) synthesis as a "best-case gold standard", in which the function is implemented manually as a preliminary step; Automatic Close Copy Speech (ACCS) synthesis, in which the steps taken in manual transformation are emulated by software; and finally, Parametric Close Copy Speech (PCCS) synthesis, in which prosodic parameters are modifiable while retaining the diphones. This contribution reports on the MCCS and ACCS synthesis phases.
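The ACCS step, mapping an aligned phonemic annotation into the synthesiser's pronunciation interface, can be sketched as below. MBROLA's `.pho` input format takes one phoneme per line with a duration in milliseconds followed by optional (position %, pitch Hz) pairs; the segment boundaries, pitch values, and single-midpoint pitch-target convention here are illustrative assumptions, not the paper's procedure.

```python
def to_pho(segments):
    """Convert aligned annotation segments into MBROLA .pho lines.

    Each segment is (phoneme, start_s, end_s, pitch_hz_or_None); MBROLA
    expects 'phoneme duration_ms [position_% pitch_Hz]...' per line.
    """
    lines = []
    for phone, start, end, pitch in segments:
        dur_ms = round((end - start) * 1000)
        if pitch is None:
            lines.append(f"{phone} {dur_ms}")
        else:
            # Place a single pitch target at the segment midpoint (50%).
            lines.append(f"{phone} {dur_ms} 50 {pitch}")
    return "\n".join(lines)

# Toy annotation: a schwa flanked by silences ("_"), times in seconds.
annotation = [("_", 0.00, 0.10, None),
              ("@", 0.10, 0.25, 120),
              ("_", 0.25, 0.35, None)]
print(to_pho(annotation))
```

In the manual (MCCS) phase these lines would be written by hand from the annotation; the automatic phase simply emulates that transformation in software, as the abstract describes.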

    Generating expressive speech for storytelling applications

    Work on expressive speech synthesis has long focused on the expression of basic emotions. In recent years, however, interest in other expressive styles has been increasing. The research presented in this paper aims at the generation of a storytelling speaking style, which is suitable for storytelling applications and, more generally, for applications aimed at children. Based on an analysis of human storytellers' speech, we designed and implemented a set of prosodic rules for converting "neutral" speech, as produced by a text-to-speech system, into storytelling speech. An evaluation of our storytelling speech generation system showed encouraging results.
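A rule set of this kind can be sketched as a transform over the neutral prosody a TTS front end emits. The two rules below (stretch durations, widen pitch excursions around a base f0) are illustrative assumptions, not the rules derived in the paper:

```python
def storytelling_prosody(phones, rate=1.15, pitch_range=1.3, base_f0=110.0):
    """Apply illustrative storytelling rules to neutral prosody.

    phones: list of (phoneme, duration_ms, f0_hz). The rules stretch
    durations by `rate` and expand f0 excursions around `base_f0` by
    `pitch_range` -- a common way to make speech sound livelier.
    """
    out = []
    for phone, dur, f0 in phones:
        new_dur = dur * rate
        new_f0 = base_f0 + (f0 - base_f0) * pitch_range
        out.append((phone, round(new_dur), round(new_f0, 1)))
    return out

# Toy neutral prosody for "hello".
neutral = [("h", 60, 115.0), ("@", 90, 125.0),
           ("l", 70, 118.0), ("oU", 140, 105.0)]
print(storytelling_prosody(neutral))
```

Peaks above the base f0 are pushed higher and valleys lower, so the contour becomes more animated while the phoneme sequence is untouched.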

    Segmental and prosodic improvements to speech generation


    Synchronizing Keyframe Facial Animation to Multiple Text-to-Speech Engines and Natural Voice with Fast Response Time

    This thesis aims to create an automated lip-synchronization system for real-time applications. Specifically, the system is required to be fast, consist of a limited number of keyframes with small memory requirements, and create fluid and believable animations that synchronize with text-to-speech engines as well as raw voice data. The algorithms utilize traditional keyframe animation and a novel method of keyframe selection. Additionally, phoneme-to-keyframe mapping, synchronization, and simple blending rules are employed. The algorithms provide blending between keyframe images, borrow information from neighboring phonemes, accentuate the phonemes b, p and m, differentiate between keyframes for phonemes with allophonic variations, and provide prosodic variation by including emotion while speaking. The lip-sync animation synchronizes with multiple synthesized voices and human speech. A fast and versatile online real-time Java chat interface is created to exhibit vivid facial animation. Results show that the animation algorithms are fast and produce accurate lip-synchronization. Additionally, surveys showed that the animations are visually pleasing and improve speech understandability 96% of the time. Applications for this project include internet chat capabilities, interactive teaching of foreign languages, animated news broadcasting, enhanced game technology, and cell phone messaging.
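The phoneme-to-keyframe mapping plus blending described above can be sketched as follows. The keyframe names and the particular mapping are assumptions for illustration; only the idea that many phonemes share a mouth shape while b/p/m get a dedicated closed-lips frame comes from the abstract.

```python
# Illustrative phoneme-to-keyframe map: many phonemes share a mouth shape,
# while b/p/m get a dedicated closed-lips frame so they can be accentuated.
# Frame names are assumptions, not the thesis's actual keyframe set.
KEYFRAME = {
    "b": "lips_closed", "p": "lips_closed", "m": "lips_closed",
    "a": "open_wide", "i": "spread", "u": "rounded",
    "f": "lip_teeth", "v": "lip_teeth",
}

def frames_for(phonemes):
    """Map a phoneme sequence to keyframes, defaulting to 'neutral'."""
    return [KEYFRAME.get(p, "neutral") for p in phonemes]

def blend_weights(t, frame_a, frame_b):
    """Linear cross-fade between two keyframes at time t in [0, 1]."""
    return {frame_a: 1.0 - t, frame_b: t}

seq = frames_for(["m", "a", "p"])
print(seq)                                  # keyframes for "map"
print(blend_weights(0.25, seq[0], seq[1]))  # 25% into the m->a transition
```

A real-time renderer would evaluate `blend_weights` every display frame, which keeps memory low: only the small keyframe set is stored, never per-frame images.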

    Development of a Yoruba Text-to-Speech System Using Festival

    This paper presents a Text-to-Speech (TTS) synthesis system for the Yorúbà language using the open-source Festival TTS engine. Yorúbà, being a resource-scarce language like most African languages, presents a major challenge to conventional speech synthesis approaches, which typically require large corpora for the training of such systems. Speech data were recorded in a quiet environment with a noise-cancelling microphone on a typical multimedia computer system using the Speech Filing System (SFS) software, then analysed and annotated using the PRAAT speech processing software. Evaluation of the system was done using the intelligibility and naturalness metrics through mean opinion score. The results show that the levels of intelligibility and naturalness of the system at word level are 55.56% and 50% respectively, but the system performs poorly on both the intelligibility and naturalness tests at sentence level. Hence, there is a need for further research to improve the quality of the synthesized speech. Keywords: Text-to-Speech, Festival, Yorúbà, Syllable
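The mean-opinion-score evaluation mentioned above is a simple average of listener ratings on a 1-5 scale; expressing it as a percentage of the scale maximum is one common convention, assumed here. The ratings below are hypothetical, chosen only to show how a 50% figure can arise:

```python
def mean_opinion_score(ratings):
    """Average of listener ratings on a 1-5 opinion scale."""
    return sum(ratings) / len(ratings)

def as_percentage(mos, scale_max=5):
    """Express a MOS as a percentage of the scale maximum."""
    return 100.0 * mos / scale_max

# Hypothetical word-level naturalness ratings from ten listeners.
ratings = [3, 2, 3, 2, 3, 2, 3, 2, 3, 2]
mos = mean_opinion_score(ratings)
print(mos, as_percentage(mos))  # -> 2.5 50.0
```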

    Study on phonetic context of Malay syllables towards the development of Malay speech synthesizer [TK7882.S65 H233 2007 f rb].

    Speech synthesis for the Malay language has evolved from parametric speech synthesizers (articulatory and formant synthesizers) to non-parametric synthesizers (concatenative synthesizers). Recently, the concatenative speech synthesizer approach has been moving towards corpus-based or unit selection techniques.

    Prosody in text-to-speech synthesis using fuzzy logic

    For over a thousand years, inventors, scientists and researchers have tried to reproduce human speech. Today, the quality of synthesized speech is still not equivalent to the quality of real speech. Most research on speech synthesis focuses on improving the quality of the speech produced by Text-to-Speech (TTS) systems. The best TTS systems use unit selection-based concatenation to synthesize speech. However, this method is very time-consuming and the speech database is very large. Diphone concatenated synthesized speech requires less memory, but sounds robotic. This thesis explores the use of fuzzy logic to make diphone concatenated speech sound more natural. A TTS is built using both neural networks and fuzzy logic. Text is converted into phonemes using neural networks. Fuzzy logic is used to control the fundamental frequency for three types of sentences. In conclusion, the fuzzy system produces f0 contours that make the diphone concatenated speech sound more natural.
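Fuzzy control of f0 typically works by fuzzifying an input (here, relative position in the sentence) through membership functions, firing rules, and defuzzifying to a crisp pitch offset. The rule base below (raise f0 near the start, hold in the middle, lower it towards the end, as in declarative declination) is an illustrative assumption, not the thesis's actual rules:

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b on support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def f0_adjustment(position):
    """Fuzzy f0 offset (Hz) from relative sentence position in [0, 1].

    Illustrative rules: near the start -> raise f0, middle -> hold,
    near the end -> lower f0 (declarative declination).
    """
    start = tri(position, -0.5, 0.0, 0.5)
    mid = tri(position, 0.0, 0.5, 1.0)
    end = tri(position, 0.5, 1.0, 1.5)
    # Weighted-average defuzzification over rule outputs (+20, 0, -30 Hz).
    total = start + mid + end
    return (start * 20.0 + mid * 0.0 + end * -30.0) / total

for pos in (0.0, 0.5, 1.0):
    print(round(f0_adjustment(pos), 1))
```

Because the memberships overlap, positions between the anchor points yield smoothly interpolated offsets, which is what lets a fuzzy controller soften the robotic flatness of a fixed diphone contour.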