Multi-Agent Simulation of Emergence of Schwa Deletion Pattern in Hindi
Recently, there has been a revival of interest in multi-agent simulation techniques for exploring the nature of language change. However, a lack of appropriate validation of simulation experiments against real language data often calls into question the general applicability of these methods in modeling realistic language change. We address this issue by attempting to model the phenomenon of schwa deletion in Hindi through a multi-agent simulation framework. The pattern of Hindi schwa deletion and its diachronic nature are well studied, not only out of general linguistic inquiry, but also to facilitate Hindi grapheme-to-phoneme conversion, which is a preprocessing step for text-to-speech synthesis. We show that under certain conditions, the schwa deletion pattern observed in modern Hindi emerges in the system from an initial state of no deletion. The simulation framework described in this work can be extended to model other phonological changes as well.
Keywords: Language Change, Linguistic Agent, Language Game, Multi-Agent Simulation, Schwa Deletion
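The paper's actual simulation framework is not reproduced in the abstract; a minimal sketch of the general idea of a multi-agent language game, in which a deletion variant spreads through a population from an initial state of no deletion, might look as follows. All parameters here (population size, learning rate, the bias term standing in for articulatory economy) are hypothetical illustrations, not the paper's settings:

```python
import random

random.seed(0)

# Toy model: each agent holds a probability of deleting a word-medial schwa.
# In every interaction a randomly chosen speaker produces either the deleted
# or the full form; the listener nudges its own probability toward the form
# it heard, with a small built-in bias favouring deletion.
N_AGENTS = 50
STEP = 0.05   # learning rate for the listener's update
BIAS = 0.01   # small pull toward deletion (hypothetical economy pressure)

agents = [0.0] * N_AGENTS  # initial state: no deletion anywhere

for _ in range(20000):
    speaker, listener = random.sample(range(N_AGENTS), 2)
    deleted = random.random() < agents[speaker]
    target = 1.0 if deleted else 0.0
    agents[listener] += STEP * (target - agents[listener]) + BIAS
    agents[listener] = min(1.0, max(0.0, agents[listener]))  # keep in [0, 1]

mean_rate = sum(agents) / N_AGENTS
print(f"mean deletion rate after convergence: {mean_rate:.2f}")
```

Under these assumed dynamics the population drifts from zero deletion to near-categorical deletion, illustrating how a population-level pattern can emerge without any agent being instructed to change.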
A Review on Multilingual Text to Speech Synthesis by Syllabifying the Words of Devanagari and Roman
Speech synthesis is the process of converting input text into spoken language in the form of speech waveforms. This paper describes a text-to-speech system for languages written in the Devanagari and Roman scripts. Although many earlier TTS systems are available, systems covering the Devanagari and Roman scripts are not.
L2-ARCTIC: A Non-Native English Speech Corpus
In this paper, we introduce L2-ARCTIC, a speech corpus of non-native English that is intended for research in voice conversion, accent conversion, and mispronunciation detection. This initial release includes recordings from ten non-native speakers of English whose first languages (L1s) are Hindi, Korean, Mandarin, Spanish, and Arabic, each L1 containing recordings from one male and one female speaker. Each speaker recorded approximately one hour of read speech from the Carnegie Mellon University ARCTIC prompts, from which we generated orthographic and forced-aligned phonetic transcriptions. In addition, we manually annotated 150 utterances per speaker to identify three types of mispronunciation errors: substitutions, deletions, and additions, making it a valuable resource not only for research in voice conversion and accent conversion but also in computer-assisted pronunciation training. The corpus is publicly accessible at https://psi.engr.tamu.edu/l2-arctic-corpus/
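The three mispronunciation error types annotated in L2-ARCTIC (substitutions, deletions, additions) correspond to the edit operations of a standard alignment between a canonical phone sequence and the phones actually produced. The corpus's own annotation tooling is not shown in the abstract; a minimal Levenshtein-alignment sketch, with hypothetical phone strings, is:

```python
def align_errors(canonical, produced):
    """Align two phone sequences by minimum edit distance and label each
    mismatch as a substitution, deletion, or addition."""
    n, m = len(canonical), len(produced)
    # dp[i][j] = edit distance between canonical[:i] and produced[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if canonical[i - 1] == produced[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match / substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # addition
    # Trace back through the table to recover the labelled operations.
    errors, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1]
                and canonical[i - 1] == produced[j - 1]):
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            errors.append(("substitution", canonical[i - 1], produced[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            errors.append(("deletion", canonical[i - 1], None))
            i -= 1
        else:
            errors.append(("addition", None, produced[j - 1]))
            j -= 1
    return list(reversed(errors))

# Canonical /TH IH NG K/ produced as /T IH NG K S/:
print(align_errors(["TH", "IH", "NG", "K"], ["T", "IH", "NG", "K", "S"]))
# → [('substitution', 'TH', 'T'), ('addition', None, 'S')]
```

In practice such alignments are computed between the forced-aligned phonetic transcription and the annotators' perceived phones, but the labelling scheme is the same three-way one the corpus uses.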
Generic Indic Text-to-speech Synthesisers with Rapid Adaptation in an End-to-end Framework
Building text-to-speech (TTS) synthesisers for Indian languages is a difficult task owing to the large number of active languages. Indian languages can be classified into a finite set of families, prominent among them Indo-Aryan and Dravidian. The proposed work exploits this property to build a generic TTS system using multiple languages from the same family in an end-to-end framework. Generic systems are quite robust, as they are capable of capturing a variety of phonotactics across languages. These systems are then adapted to a new language in the same family using small amounts of adaptation data. Experiments indicate that good-quality TTS systems can be built using only 7 minutes of adaptation data. An average degradation mean opinion score of 3.98 is obtained for the adapted TTSes.
An extensive analysis of systematic interactions between languages in the generic TTSes is carried out. x-vectors are included as speaker embeddings to synthesise text in a particular speaker's voice. An interesting observation is that the prosody of the target speaker's voice is preserved. These results are quite promising, as they indicate the capability of generic TTSes to handle speaker and language switching seamlessly, along with the ease of adaptation to a new language.
Vowel classification based approach for Telugu Text-to-Speech System using symbol concatenation
Telugu is one of the oldest languages in India. This paper describes the development of a Telugu Text-to-Speech System (TTS) using vowel classification. Vowels are the most important class of sounds in most Indian languages; a vowel's duration is longer than a consonant's and is highly significant. Here, vowels are categorized as starting, middle, or end according to their position of occurrence in a word. The algorithm we developed analyses a sentence into words and then into symbols composed of pure consonants and vowels. Wave files are merged as required to generate the modified consonants influenced by deergalu (vowel signs) and yuktaksharas, producing speech from the text. A speech-unit database consisting of vowels (starting, middle, and end) and consonants was developed. We evaluated our TTS for intelligibility and voice quality using the Mean Opinion Score (MOS), with and without vowel classification, from sixty-five listeners, and obtained better results with vowel classification.
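The three-way positional scheme described above can be illustrated with a toy classifier over a word's phoneme sequence. The segmentation and the vowel inventory below are hypothetical transliterations for illustration; the paper's actual unit inventory and wave-file concatenation are not reproduced here:

```python
# Hypothetical transliterated Telugu vowel inventory (illustrative only).
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ee", "o", "oo", "ai", "au"}

def classify_vowels(phonemes):
    """Label each vowel in a word as 'starting', 'middle', or 'end' by its
    position of occurrence, mirroring the three-way unit inventory."""
    labels = []
    last = len(phonemes) - 1
    for idx, ph in enumerate(phonemes):
        if ph not in VOWELS:
            continue  # consonants are stored separately in the unit database
        if idx == 0:
            pos = "starting"
        elif idx == last:
            pos = "end"
        else:
            pos = "middle"
        labels.append((ph, pos))
    return labels

# "amma" split into phonemes (hypothetical segmentation):
print(classify_vowels(["a", "mm", "a"]))
# → [('a', 'starting'), ('a', 'end')]
```

Each labelled vowel would then select the corresponding recorded unit (starting, middle, or end variant) from the speech-unit database before concatenation.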
Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge
In this paper, we describe the systems developed by the SJTU X-LANCE team for the LIMMITS 2023 Challenge, focusing mainly on the system that won on naturalness for track 1. The aim of this challenge is to build a multi-speaker multi-lingual text-to-speech (TTS) system for Marathi, Hindi, and Telugu. Each of the languages has a male and a female speaker in the given dataset. In track 1, only 5 hours of data from each speaker may be selected to train the TTS model. Our system is based on the recently proposed VQTTS, which utilizes VQ acoustic features rather than mel-spectrograms. We introduce additional speaker embeddings and language embeddings to VQTTS for controlling the speaker and language information. In the cross-lingual evaluations, where we need to synthesize speech in a cross-lingual speaker's voice, we provide a native speaker's embedding to the acoustic model and the target speaker's embedding to the vocoder. In the subjective MOS listening test on naturalness, our system achieves 4.77, which ranks first.
Comment: Accepted by ICASSP 2023 Special Session for Grand Challenge
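The cross-lingual embedding routing described above (native speaker's embedding to the acoustic model, target speaker's embedding to the vocoder) can be sketched structurally. The stub models, embedding tables, dimensions, and speaker/language names below are all assumptions for illustration; none of the real VQTTS components are reproduced:

```python
import random

random.seed(0)

# Hypothetical learned embedding tables (random stand-ins for trained values).
DIM = 4
SPEAKERS = ["mr_f", "mr_m", "hi_f", "hi_m", "te_f", "te_m"]
LANGUAGES = ["marathi", "hindi", "telugu"]
spk_emb = {s: [random.gauss(0, 1) for _ in range(DIM)] for s in SPEAKERS}
lang_emb = {l: [random.gauss(0, 1) for _ in range(DIM)] for l in LANGUAGES}

def acoustic_model(text, spk, lang):
    # Stub: would predict VQ acoustic features conditioned on the
    # concatenated speaker and language embeddings.
    return {"text": text, "cond": spk + lang}

def vocoder(features, spk):
    # Stub: would generate the waveform in the voice given by spk.
    return {"features": features, "voice": spk}

def synthesise_cross_lingual(text, lang, native_spk, target_spk):
    """Cross-lingual synthesis as described in the abstract: the acoustic
    model receives a native speaker's embedding, while the vocoder receives
    the target speaker's embedding."""
    feats = acoustic_model(text, spk_emb[native_spk], lang_emb[lang])
    return vocoder(feats, spk_emb[target_spk])

# Hindi text rendered in the Telugu male speaker's voice (hypothetical IDs).
out = synthesise_cross_lingual("नमस्ते", "hindi", "hi_f", "te_m")
```

The point of the split is that the acoustic model sees pronunciation-native conditioning while voice identity is imposed only at waveform generation.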
- …