23 research outputs found

    Marathi Speech Synthesis: A Review

    This paper reviews the various aspects of Marathi speech synthesis. It surveys research developments in international languages as well as Indian languages, and then focuses on developments in Marathi relative to other Indian languages. It is anticipated that this work will encourage further exploration of Marathi speech synthesis. DOI: 10.17762/ijritcc2321-8169.15064

    A Review on Multilingual Text to Speech Synthesis by Syllabifying the Words of Devanagari and Roman

    Speech synthesis is the process of taking written language as input text and converting it into speech waveforms. This paper describes a text-to-speech system for languages written in the Devanagari and Roman scripts. Many TTS systems already exist, but few are available for the Devanagari and Roman scripts.
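    The title above mentions syllabifying words as a preprocessing step. As a minimal sketch of what rule-based syllabification of a Roman-script word can look like, the function below splits a word into CV-style syllables at vowel nuclei; the vowel set and the rule that trailing consonants attach to the last syllable are assumptions for illustration, not the paper's actual algorithm.

```python
# Illustrative CV-pattern syllabifier for Roman-script words.
VOWELS = set("aeiou")

def syllabify(word: str) -> list[str]:
    """Split a word into syllables: each syllable ends at a vowel,
    and any trailing consonants attach to the final syllable."""
    syllables, current = [], ""
    for ch in word.lower():
        current += ch
        if ch in VOWELS:          # vowel closes the current syllable
            syllables.append(current)
            current = ""
    if current:                   # leftover trailing consonants
        if syllables:
            syllables[-1] += current
        else:
            syllables.append(current)
    return syllables
```

    For example, `syllabify("marathi")` yields `["ma", "ra", "thi"]`; a production system would also need onset-maximization rules and script-specific handling for Devanagari conjuncts.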

    Implementation of Text To Speech for Marathi Language Using Transcriptions Concept

    ABSTRACT This research paper presents a new methodology for converting text to speech. The text-to-speech conversion system enables the user to enter text in Marathi and produces sound as output. The paper presents the steps followed for converting Marathi text to speech and the algorithm used. The focus of the paper is the tokenisation process and the orthographic representation of the text, which shows the mapping of letters to sounds using a description of the language's phonetics. The main focus is the text-to-IPA transcription concept: a system that translates text to an IPA transcription, which is the primary stage of text-to-speech conversion. The whole procedure for converting text to speech is time-consuming and demands considerable effort.
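    The letter-to-sound mapping described above can be sketched as a simple table lookup from Devanagari characters to IPA symbols. The tiny mapping below is illustrative only, covers a handful of letters, and omits the schwa-deletion rules a real Marathi front end would need; it is not the paper's actual transcription system.

```python
# Minimal Devanagari-letter-to-IPA lookup (illustrative subset).
LETTER_TO_IPA = {
    "क": "k", "ख": "kʰ", "ग": "ɡ", "म": "m", "र": "r",
    "अ": "ə", "आ": "aː", "ि": "i", "ा": "aː", "ी": "iː",
}

def to_ipa(text: str) -> str:
    """Map each character to an IPA symbol; unknown characters
    (including anything outside this toy table) pass through."""
    return "".join(LETTER_TO_IPA.get(ch, ch) for ch in text)
```

    For example, `to_ipa("मी")` returns `"miː"`. Real systems also handle conjunct consonants, vowel signs versus independent vowels, and context-dependent schwa insertion.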

    Text to Speech in New Languages without a Standardized Orthography

    Abstract Many spoken languages do not have a standardized writing system, and building text-to-speech voices for them without accurate transcripts of speech data is difficult. Our language-independent method bootstraps synthetic voices from speech data alone, relying on cross-lingual phonetic decoding of speech. In this paper, we describe novel additions to our bootstrapping method. We present results on eight languages from different language families---English, Dari, Pashto, Iraqi, Thai, Konkani, Inupiaq and Ojibwe---and show that our phonetic voices can be made understandable with as little as an hour of speech data that never had transcriptions, and with few resources available in the target language. We also present purely acoustic techniques that can help induce syllable- and word-level information to further improve the intelligibility of these voices.

    Index Terms: speech synthesis, synthesis without text, languages without an orthography

    Introduction Recent developments in speech and language technologies have revolutionized the ways in which we access information. Advances in speech recognition, speech synthesis and dialog modeling have brought out interactive agents that people can talk to naturally and ask for information. There is a lot of interest in building such systems, especially in multilingual environments. Building speech and language systems typically requires significant amounts of data and linguistic resources. For many spoken languages of the world, finding large corpora or linguistic resources is difficult. Yet these languages have many native speakers around the world, and it would be very interesting to deploy speech technologies in them. Our work is about building text-to-speech systems for languages that are purely spoken: they do not have a standardized writing system.

    These languages could be mainstream languages such as Konkani (a western Indian language with over 8 million speakers), or dialects of a major language that are phonetically quite distinct from the closest major language. Building a TTS system usually requires training data consisting of a speech corpus with corresponding transcripts. However, for languages that aren't written down in a standard manner, one can only find speech corpora. Our current efforts focus on building speech synthesis systems when the training data doesn't contain text. It may seem futile to build a TTS system when the language at hand doesn't have a text form. Indeed, if there is no text at training time, there won't be text at test time, and one might wonder why we need a TTS system at all. However, consider the use case of deploying speech-to-speech translation of video lectures from English into Konkani. We have to synthesize speech in this "un-written" language from the output of a machine translation system. Even if the language at hand has no text form, we need some intermediate representation that can act as one for the machine translation system to produce. A first approximation of such a form is phonetic strings. Another use case for which we need TTS without text is, say, deploying a bus information system in Konkani. Our dialog system could have information about when the next bus is, but it has to generate speech to deliver this information. Again, one can imagine using a phonetic form to represent the speech to be generated, and producing a string of phones from the natural language generation model in the bus information dialog system. The work we present here is our continued effort in improving text to speech for languages that do not have a standardized orthography. We have built voices for several languages from purely speech corpora and produced understandable synthesis, using cross-lingual phonetic speech recognition methods to do so.

    Phone strings are not ideal for TTS, however, as a lot of information is contained in higher-level phonological units, including the syllables and words that can help produce natural prosody. Detecting words from a speech corpus alone is a difficult task. We have explored how purely acoustic techniques can be used to detect word-like units in our training speech corpus, and we use this to further improve the intelligibility of speech synthesis.
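    The paper induces word-like units acoustically, without transcripts. A crude text-side analogue of that idea, shown here purely as an assumption-laden illustration and not the authors' method, is to look for frequently recurring phone n-grams in the cross-lingually decoded phone strings and treat them as candidate word-like units.

```python
# Find recurring phone n-grams across decoded utterances as
# candidate word-like units (illustrative analogue only).
from collections import Counter

def candidate_units(phone_seqs: list[list[str]],
                    n: int = 3,
                    min_count: int = 2) -> list[tuple[str, ...]]:
    """Return phone n-grams that occur at least `min_count` times
    across all utterances."""
    counts = Counter()
    for seq in phone_seqs:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return [gram for gram, c in counts.items() if c >= min_count]
```

    The real systems described above operate on acoustics (segmental and prosodic cues) rather than on decoded symbol strings, which is considerably harder but avoids compounding phone-recognition errors.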

    Prosodic features associated with punctuation marks: a scoping review

    The purpose of this article is to present a scoping review of the prosodic features associated with punctuation marks. A bibliographic survey was carried out based on descriptors in English and Portuguese, organized according to the following syntax: prosody AND acoustic AND speech AND structure AND ("punctuation marks" OR "graphic punctuation" OR "punctuation mark"), excluding citations and patents, in the databases Ovid MEDLINE, Public Medicine Library (PubMed), Scopus (Elsevier), Ebscohost (Academic Search Premier), Gale Academic Online, and Google Scholar. We observed a diversity of methods employed to analyze the correlation between punctuation marks and prosodic features. The studies in this review confirmed our research question, highlighting the relationship between punctuation marks and prosodic aspects. Most technology-related works developed different neural networks to transform text to speech and/or convert speech to text, and showed that pauses are the strongest indicators of punctuation marks.
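    Since the review identifies pauses as the strongest prosodic correlate of punctuation, a TTS front end can exploit this by inserting pause tokens at punctuation marks before synthesis. The sketch below does exactly that; the pause durations are invented for illustration, not values reported by the review.

```python
# Replace punctuation with pause markers of assumed durations (ms).
PAUSE_MS = {".": 400, "?": 400, "!": 400, ",": 200, ";": 250, ":": 250}

def insert_pauses(text: str) -> list[str]:
    """Tokenize text into words and <pause:N> markers, one marker
    per trailing punctuation character."""
    tokens = []
    for raw in text.split():
        word = raw.rstrip(".,;:?!")      # strip trailing punctuation
        if word:
            tokens.append(word)
        for ch in raw[len(word):]:       # emit a pause per mark
            if ch in PAUSE_MS:
                tokens.append(f"<pause:{PAUSE_MS[ch]}>")
    return tokens
```

    For example, `insert_pauses("Hello, world.")` yields `["Hello", "<pause:200>", "world", "<pause:400>"]`. In practice the neural systems surveyed learn these correlations implicitly rather than using a fixed table.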

    A study on reusing resources of speech synthesis for closely-related languages

    This thesis describes research on building a text-to-speech (TTS) framework that can compensate for the lack of linguistic information in under-resourced languages by reusing existing resources from another language. It describes the adaptation process required when such limited resources are used. The main natural languages involved in this research are Malay and Iban. The thesis includes a study on grapheme-to-phoneme mapping and on phoneme substitution. A set of substitution matrices is presented that shows phoneme confusion, in terms of perception, among respondents. The experiments study intelligibility as well as perception based on the context of utterances. A study of phonetic prosody is then presented and compared to the Klatt duration model, to find what similarities a cross-language duration model has, if one exists. Then a comparative study of an Iban native speaker against an Iban polyglot TTS built with Malay resources is presented, to confirm that Malay prosody can be used to generate Iban synthesised speech. The central hypothesis of this thesis is that by using a closely-related language's resources, natural-sounding speech can be produced. The aim of this research was to show that, by adhering to the indigenous language's characteristics, it is possible to build a polyglot synthesised speech system even with insufficient speech resources.
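    The substitution matrices described above can be used at synthesis time to map a phoneme missing from the borrowed voice's inventory to its perceptually closest available substitute. The sketch below shows that lookup; the phonemes and scores in the table are invented for illustration and are not the thesis's measured confusion data.

```python
# Pick a perceptually close substitute for a phoneme missing from
# the synthesis voice's inventory (scores are illustrative).
SIMILARITY = {
    # target phoneme -> {candidate substitute: perceptual score}
    "ʔ": {"k": 0.7, "t": 0.4},
    "ɲ": {"n": 0.8, "m": 0.2},
}

def substitute(phoneme: str, inventory: set[str]) -> str:
    """Return the phoneme itself if the inventory covers it,
    otherwise the highest-scoring available substitute."""
    if phoneme in inventory or phoneme not in SIMILARITY:
        return phoneme
    candidates = {p: s for p, s in SIMILARITY[phoneme].items()
                  if p in inventory}
    return max(candidates, key=candidates.get) if candidates else phoneme
```

    For example, `substitute("ʔ", {"k", "t", "a"})` returns `"k"`. Deriving the scores from listener confusion tests, as the thesis does, grounds the substitutions in perception rather than in articulatory feature distance alone.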

    Development of isiXhosa text-to-speech modules to support e-Services in marginalized rural areas

    Information and Communication Technology (ICT) projects are being initiated and deployed in marginalized areas to help improve the standard of living for community members. This has led to a new field, responsible for information processing and knowledge development in rural areas, called Information and Communication Technology for Development (ICT4D). An ICT4D project has been implemented in a marginalized area called Dwesa, a rural area situated on the Wild Coast of the former homeland of Transkei, in the Eastern Cape Province of South Africa. In this rural community, e-Service projects have been developed and deployed to support the existing ICT infrastructure. Some of these projects include an e-Commerce platform, an e-Judiciary service, e-Health and an e-Government portal. Although these projects are deployed in this area, community members face a language and literacy barrier because these services are typically accessed through English textual interfaces. This is a challenge because their language of communication is isiXhosa and some community members are illiterate. Many residents of rural areas cannot read or write isiXhosa and can only speak the language; this illiteracy affects both the youth and the elderly. This research seeks to design, develop and implement software modules that can convert isiXhosa text into natural-sounding isiXhosa speech. Such an application is called a Text-to-Speech (TTS) system. The main objective of this research is to improve the usability of ICT4D e-Services through the development of an isiXhosa Text-to-Speech system. This research is undertaken within the context of the Siyakhula Living Lab (SLL), an ICT4D intervention towards improving the lives of rural communities of South Africa in an attempt to bridge the digital divide. The developed TTS modules were subsequently tested to determine their applicability to improving e-Services usability. The results show acceptable levels of usability, the system having produced audio utterances for the isiXhosa Text-to-Speech system for marginalized areas.