A High Quality Text-To-Speech System Composed of Multiple Neural Networks
While neural networks have been employed to handle several different
text-to-speech tasks, ours is the first system to use neural networks
throughout, for both linguistic and acoustic processing. We divide the
text-to-speech task into three subtasks: a linguistic module mapping from text
to a linguistic representation, an acoustic module mapping from the linguistic
representation to speech, and a video module mapping from the linguistic
representation to animated images. The linguistic module employs a
letter-to-sound neural network and a postlexical neural network. The acoustic
module employs a duration neural network and a phonetic neural network. The
visual neural network is employed in parallel to the acoustic module to drive a
talking head. The use of neural networks that can be retrained on the
characteristics of different voices and languages affords our system a degree
of adaptability and naturalness heretofore unavailable.
Comment: Source link (9812006.tar.gz) contains: 1 PostScript file (4 pages)
and 3 WAV audio files. If your system does not support Windows WAV files, try
a tool like "sox" to translate the audio into a format of your choice.
SMaTTS: standard malay text to speech system
This paper presents a rule-based text-to-speech
(TTS) synthesis system for Standard Malay (SM), named SMaTTS. The
proposed system uses a sinusoidal method and some pre-recorded
wave files to generate speech. The use of a phone
database significantly decreases the amount of computer memory
used, making the system very light and embeddable. The
overall system comprises two phases. The first is Natural Language
Processing (NLP), which consists of the high-level processing of text
analysis, phonetic analysis, text normalization and a morphophonemic
module. The module was designed specifically for SM to overcome
a few problems in defining the rules for the SM orthography system before
the text can be passed to the DSP module. The second phase is Digital
Signal Processing (DSP), which operates on the low-level process of
speech waveform generation. An intelligible and
adequately natural-sounding formant-based speech synthesis system
with a light and user-friendly Graphical User Interface (GUI) is
introduced. An SM phoneme set and an
inclusive phone database have been carefully constructed for
this phone-based speech synthesizer. By applying generative
phonology, comprehensive letter-to-sound (LTS) rules and a
pronunciation lexicon have been devised for SMaTTS. For
evaluation, a Diagnostic Rhyme Test (DRT) word list was
compiled, and several experiments were performed to evaluate
the quality of the synthesized speech by analyzing the Mean Opinion
Score (MOS) obtained. The overall performance of the system and
the room for improvement are discussed.
An articulatory-functional approach to modeling Persian focus prosody
This paper is an attempt to test PENTA, an
articulatory-functional model, on Persian focus
prosody. The test was done on a corpus consisting of
utterances with different focus conditions using
PENTAtrainer2, a trainable prosody synthesizer that
optimizes categorical pitch targets each corresponding
to multiple communicative functions. The
evaluation was done by comparing the F0 contours
generated by the extracted pitch targets to those of
natural utterances through numerical and perceptual
evaluations. The numerical results showed that the
synthesized F0 was close to the natural contour in
terms of RMSE (= 1.94) and Pearson's r (= 0.84).
Perceptual evaluation showed that the rate of focus
identification and naturalness judgement by native
Persian listeners were highly similar between
synthetic and natural F0 contours.
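
The two numerical measures named in the abstract above, RMSE and Pearson's r, can be illustrated with a minimal sketch. It assumes two F0 contours sampled at the same time points; the function names and the sample values are hypothetical and are not taken from the paper or from PENTAtrainer2.

```python
import math


def rmse(natural, synthetic):
    """Root-mean-square error between two equal-length contours."""
    if len(natural) != len(synthetic) or not natural:
        raise ValueError("contours must be non-empty and of equal length")
    squared = [(n - s) ** 2 for n, s in zip(natural, synthetic)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(natural, synthetic):
    """Pearson correlation coefficient between two equal-length contours."""
    if len(natural) != len(synthetic) or len(natural) < 2:
        raise ValueError("contours must have equal length of at least 2")
    mean_n = sum(natural) / len(natural)
    mean_s = sum(synthetic) / len(synthetic)
    cov = sum((n - mean_n) * (s - mean_s) for n, s in zip(natural, synthetic))
    var_n = sum((n - mean_n) ** 2 for n in natural)
    var_s = sum((s - mean_s) ** 2 for s in synthetic)
    if var_n == 0 or var_s == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return cov / math.sqrt(var_n * var_s)


# Hypothetical F0 samples (illustrative values only, not data from the paper).
natural_f0 = [100.0, 110.0, 120.0, 115.0, 105.0]
synthetic_f0 = [102.0, 108.0, 121.0, 117.0, 103.0]

print("RMSE:", rmse(natural_f0, synthetic_f0))
print("Pearson's r:", pearson_r(natural_f0, synthetic_f0))
```

RMSE is expressed in the same unit as the input contours, while Pearson's r is unitless and captures only how similar the two contour shapes are, which is why the two measures are usually reported together.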