A High Quality Text-To-Speech System Composed of Multiple Neural Networks

Authors: Corrigan Gerald; Karaali Orhan; Mackie Andrew; Massey Noel; Miller Corey; Schnurr Otto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998

While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The visual neural network is employed in parallel to the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of adaptability and naturalness heretofore unavailable.

Comment: Source link (9812006.tar.gz) contains: 1 PostScript file (4 pages) and 3 WAV audio files. If your system does not support Windows WAV files, try a tool like "sox" to translate the audio into a format of your choice.

arXiv.org e-Print Archive

CiteSeerX

SMaTTS: standard malay text to speech system

Authors: Ahmad Zakiah Hanim; Gunawan Teddy Surya; Khalifa Othman Omran
Publication venue: 'International Research Publication House'
Publication date: 01/01/2007

This paper presents SMaTTS, a rule-based text-to-speech (TTS) synthesis system for Standard Malay (SM). The proposed system uses a sinusoidal method and some pre-recorded wave files to generate speech. The use of a phone database significantly decreases the amount of computer memory used, making the system very light and embeddable. The overall system comprises two phases. The first is Natural Language Processing (NLP), which consists of the high-level processing of text analysis, phonetic analysis, text normalization, and a morphophonemic module. This module was designed specifically for SM to overcome a few problems in defining the rules for the SM orthography system before the result is passed to the DSP module. The second phase is Digital Signal Processing (DSP), which operates on the low-level process of speech waveform generation. An intelligible and adequately natural-sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. An SM phoneme set and an inclusive phone database have been carefully constructed for this phone-based speech synthesizer. By applying generative phonology, comprehensive letter-to-sound (LTS) rules and a pronunciation lexicon have been devised for SMaTTS. For evaluation, a Diagnostic Rhyme Test (DRT) word list was compiled, and several experiments were performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system and the room for improvement are thoroughly discussed.
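As a rough illustration of the MOS analysis mentioned in this abstract: a Mean Opinion Score is the arithmetic mean of listener ratings. The sketch below assumes the common 1-to-5 rating scale; the abstract does not state the scale used, and the function name and ratings are hypothetical.

```python
from statistics import fmean


def mean_opinion_score(ratings, low=1, high=5):
    """Return the mean of listener ratings.

    Assumption: ratings lie on a 1-5 scale (not stated in the abstract).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not low <= r <= high:
            raise ValueError(f"rating {r} is outside the {low}-{high} scale")
    return fmean(ratings)


# Hypothetical ratings from five listeners for one synthesized utterance.
example_ratings = [4, 3, 5, 4, 4]
example_mos = mean_opinion_score(example_ratings)
```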

CiteSeerX

The International Islamic University Malaysia Repository

An articulatory-functional approach to modeling Persian focus prosody

Authors: Taheri-Ardali M; Xu Y
Publication venue: The 18th International Congress of Phonetic Sciences
Publication date: 14/08/2015

This paper is an attempt to test PENTA, an articulatory-functional model, on Persian focus prosody. The test was done on a corpus consisting of utterances with different focus conditions using PENTAtrainer2, a trainable prosody synthesizer that optimizes categorical pitch targets, each corresponding to multiple communicative functions. The evaluation was done by comparing the F0 contours generated from the extracted pitch targets to those of natural utterances through numerical and perceptual evaluations. The numerical results showed that the synthesized F0 was close to the natural contour in terms of RMSE (= 1.94) and Pearson's r (= 0.84). Perceptual evaluation showed that the rates of focus identification and the naturalness judgements by native Persian listeners were highly similar between synthetic and natural F0 contours.
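The two numerical measures named in this abstract can be sketched as follows. This is a minimal illustration of RMSE and Pearson's r between two equally sampled F0 contours, not the PENTAtrainer2 implementation; the function names and sample values are hypothetical, and the abstract does not state the unit of the contours.

```python
import math


def rmse(natural, synthetic):
    """Root-mean-square error between two equal-length contours."""
    if len(natural) != len(synthetic) or not natural:
        raise ValueError("contours must be non-empty and of equal length")
    squared = [(x - y) ** 2 for x, y in zip(natural, synthetic)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(natural, synthetic):
    """Pearson correlation coefficient between two equal-length contours."""
    if len(natural) != len(synthetic) or len(natural) < 2:
        raise ValueError("contours must have equal length of at least 2")
    n = len(natural)
    mean_a = sum(natural) / n
    mean_b = sum(synthetic) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(natural, synthetic))
    var_a = sum((x - mean_a) ** 2 for x in natural)
    var_b = sum((y - mean_b) ** 2 for y in synthetic)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical F0 samples taken at the same time points (unit assumed).
natural_f0 = [10.0, 12.0, 14.0, 13.0, 11.0]
synthetic_f0 = [10.5, 11.5, 14.5, 12.5, 11.5]

contour_rmse = rmse(natural_f0, synthetic_f0)
contour_r = pearson_r(natural_f0, synthetic_f0)
```

A lower RMSE means the synthetic contour is closer to the natural one point by point, while a Pearson's r near 1 means the two contours rise and fall together even if they are offset.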

UCL Discovery