Vowel classification based approach for Telugu Text-to-Speech System using symbol concatenation
Telugu is one of the oldest languages in India. This paper describes the development of a Telugu Text-to-Speech (TTS) system using vowel classification. Vowels are the most important class of sounds in most Indian languages. The duration of a vowel is longer than that of a consonant and is the most significant. Here vowels are categorized as starting, middle, or end according to their position in a word. The algorithm we developed analyzes a sentence in terms of words and then symbols, which are combinations of pure consonants and vowels. Wave files are merged as required to generate the modified consonants influenced by deergalu (vowel signs) and yuktaksharas, and so to generate speech from text. A speech unit database consisting of vowels (starting, middle, and end) and consonants was developed. We evaluated our TTS using the Mean Opinion Score (MOS) for intelligibility and voice quality, with and without vowel classification, from sixty-five listeners, and obtained better results with vowel classification.
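The positional vowel classification described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the romanized vowel inventory, the function names `classify_vowels` and `unit_names`, and the wave-file naming scheme are all hypothetical.

```python
# Hypothetical romanized vowel inventory, for illustration only;
# the paper's actual symbol set is not given in the abstract.
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "o"}


def classify_vowels(symbols):
    """Tag each symbol of a word as 'start', 'middle', 'end' (vowels) or 'consonant'."""
    last = len(symbols) - 1
    tagged = []
    for idx, sym in enumerate(symbols):
        if sym not in VOWELS:
            tagged.append((sym, "consonant"))
        elif idx == 0:
            tagged.append((sym, "start"))
        elif idx == last:
            tagged.append((sym, "end"))
        else:
            tagged.append((sym, "middle"))
    return tagged


def unit_names(symbols):
    """Map tagged symbols to assumed wave-file names in a speech unit database."""
    return [f"{sym}_{tag}.wav" for sym, tag in classify_vowels(symbols)]


if __name__ == "__main__":
    # The listed files would then be concatenated in order to form the word.
    print(unit_names(["a", "m", "m", "a"]))
```

A single-vowel word is tagged 'start' in this sketch; how the original system treats that case is not stated in the abstract.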
Generic Indic Text-to-speech Synthesisers with Rapid Adaptation in an End-to-end Framework
Building text-to-speech (TTS) synthesisers for Indian languages is a
difficult task owing to a large number of active languages. Indian languages
can be classified into a finite set of families, prominent among them,
Indo-Aryan and Dravidian. The proposed work exploits this property to build a
generic TTS system using multiple languages from the same family in an
end-to-end framework. Generic systems are quite robust as they are capable of
capturing a variety of phonotactics across languages. These systems are then
adapted to a new language in the same family using small amounts of adaptation
data. Experiments indicate that good quality TTS systems can be built using
only 7 minutes of adaptation data. An average degradation mean opinion score of
3.98 is obtained for the adapted TTSes.
Extensive analysis of systematic interactions between languages in the
generic TTSes is carried out. x-vectors are included as speaker embedding to
synthesise text in a particular speaker's voice. An interesting observation is
that the prosody of the target speaker's voice is preserved. These results are
quite promising as they indicate the capability of generic TTSes to handle
speaker and language switching seamlessly, along with the ease of adaptation to
a new language.
PROSODY PREDICTION FOR TAMIL TEXT-TO-SPEECH SYNTHESIZER USING SENTIMENT ANALYSIS
A speech synthesizer that sounds similar to a human voice is preferred over a robotic one, so an effective prosody model is essential for increasing the naturalness of a speech synthesizer. This paper therefore focuses on developing a prosody prediction model using sentiment analysis for a Tamil speech synthesizer. Two variations of the prosody prediction model using SentiWordNet are tested: one without a stemmer and one with a stemmer. The model with a stemmer performs considerably better than the one without, because it handles the highly agglutinative and inflectional words of Tamil more effectively, as this paper illustrates. The prosody prediction model with a stemmer achieves a classification accuracy of 77% on the test set, compared with 57% for the model without a stemmer.
Marathi Speech Synthesis: A Review
This paper surveys various aspects of Marathi speech synthesis. It reviews research developments in international as well as Indian languages, and then focuses on developments in Marathi relative to other Indian languages. It is anticipated that this work will support further research on the Marathi language.
DOI: 10.17762/ijritcc2321-8169.15064
The CSTR System for Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages
DNN-based Speech Synthesis for Indian Languages from ASCII text
Text-to-Speech synthesis in Indian languages has seen a lot of progress over
the decade partly due to the annual Blizzard challenges. These systems assume
the text to be written in Devanagari or Dravidian scripts which are nearly
phonemic orthography scripts. However, the most common form of computer
interaction among Indians is ASCII written transliterated text. Such text is
generally noisy with many variations in spelling for the same word. In this
paper we evaluate three approaches to synthesize speech from such noisy ASCII
text: a naive Uni-Grapheme approach, a Multi-Grapheme approach, and a
supervised Grapheme-to-Phoneme (G2P) approach. These methods first convert the
ASCII text to a phonetic script, and then learn a Deep Neural Network to
synthesize speech from that. We train and test our models on Blizzard Challenge
datasets that were transliterated to ASCII using crowdsourcing. Our experiments
on Hindi, Tamil and Telugu demonstrate that our models generate speech of
competitive quality from ASCII text compared to the speech synthesized from the
native scripts. All the accompanying transliterated datasets are released for
public access. Comment: 6 pages, 5 figures -- Accepted in 9th ISCA Speech Synthesis Worksho