2,250 research outputs found

    A High Quality Text-To-Speech System Composed of Multiple Neural Networks

    Full text link
    While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks, a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The visual neural network is employed in parallel to the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of adaptability and naturalness heretofore unavailable.Comment: Source link (9812006.tar.gz) contains: 1 PostScript file (4 pages) and 3 WAV audio files. If your system does not support Windows WAV files, try a tool like "sox" to translate the audio into a format of your choic

    SMaTTS: standard malay text to speech system

    Get PDF
    This paper presents a rule-based text- to- speech (TTS) Synthesis System for Standard Malay, namely SMaTTS. The proposed system using sinusoidal method and some pre- recorded wave files in generating speech for the system. The use of phone database significantly decreases the amount of computer memory space used, thus making the system very light and embeddable. The overall system was comprised of two phases the Natural Language Processing (NLP) that consisted of the high-level processing of text analysis, phonetic analysis, text normalization and morphophonemic module. The module was designed specially for SM to overcome few problems in defining the rules for SM orthography system before it can be passed to the DSP module. The second phase is the Digital Signal Processing (DSP) which operated on the low-level process of the speech waveform generation. A developed an intelligible and adequately natural sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. A Standard Malay Language (SM) phoneme set and an inclusive set of phone database have been constructed carefully for this phone-based speech synthesizer. By applying the generative phonology, a comprehensive letter-to-sound (LTS) rules and a pronunciation lexicon have been invented for SMaTTS. As for the evaluation tests, a set of Diagnostic Rhyme Test (DRT) word list was compiled and several experiments have been performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system as well as the room for improvements was thoroughly discussed

    An articulatory-functional approach to modeling Persian focus prosody

    Get PDF
    This paper is an attempt to test PENTA, an articulatory-functional model, on Persian focus prosody. The test was done on a corpus consisting of utterances with different focus conditions using PENTAtrainer2, a trainable prosody synthesizer that optimizes categorical pitch targets each corresponding to multiple communicative functions. The evaluation was done by comparing the F0 contours generated by the extracted pitch targets to those of natural utterances through numerical and perceptual evaluations. The numerical results showed that the synthesized F0 was close to the natural contour in terms of RMSE (= 1.94) and Pearsonโ€™s r (= 0.84). Perceptual evaluation showed that the rate of focus identification and naturalness judgement by native Persian listeners were highly similar between synthetic and natural F0 contours
    • โ€ฆ
    corecore