2,962 research outputs found

    An introduction to statistical parametric speech synthesis


    Amharic Speech Recognition for Speech Translation

    State-of-the-art speech translation can be seen as a cascade of automatic speech recognition (ASR), statistical machine translation, and text-to-speech synthesis. In this study we experiment with Amharic speech recognition for Amharic-English speech translation in the tourism domain. Since no Amharic speech corpus existed, we developed a 7.43-hour read-speech corpus in the tourism domain, recorded after translating the standard Basic Traveler Expression Corpus (BTEC) under a normal working environment. In our ASR experiments, phoneme and syllable units are used for acoustic models, while morphemes and words are used for language models. Encouraging ASR results are achieved using morpheme-based language models and phoneme-based acoustic models, with recognition accuracies of 89.1%, 80.9%, 80.6%, and 49.3% at the character, morpheme, word, and sentence levels respectively. We are now working towards designing Amharic-English speech translation by cascading these components under different error-correction algorithms.
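    The cascade described above (ASR, then machine translation, then TTS) can be sketched as three composed stages. This is a minimal illustrative sketch with placeholder stubs for each component; the function names, the toy lexicon, and the byte-string "audio" are invented for illustration and are not from the paper.

    ```python
    # Minimal sketch of a cascaded speech-to-speech translation pipeline.
    # Each stage is a stub standing in for a real ASR / SMT / TTS component.

    def recognize(audio: bytes) -> str:
        """ASR stub: would map Amharic audio to a text transcript."""
        return "selam alem"  # placeholder transcript

    def translate(text: str) -> str:
        """MT stub: word-by-word lookup standing in for statistical MT."""
        lexicon = {"selam": "hello", "alem": "world"}  # toy lexicon
        return " ".join(lexicon.get(w, w) for w in text.split())

    def synthesize(text: str) -> bytes:
        """TTS stub: would map English text to a waveform."""
        return text.encode("utf-8")  # placeholder "audio"

    def speech_to_speech(audio: bytes) -> bytes:
        # The cascade: errors in earlier stages propagate to later ones,
        # which is why error-correction between stages matters.
        return synthesize(translate(recognize(audio)))
    ```

    A key property of such cascades, noted in the abstract, is error propagation: a recognition error at the first stage is passed unchanged into translation and synthesis.
    
    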

    Marathi Speech Synthesized Using Unit selection Algorithm

    In this paper, we present a concatenative text-to-speech system and discuss issues relevant to the development of a Marathi speech synthesizer using different choices of unit for the database: words, diphones, and triphones. Comparing synthesizer quality across unit sizes indicates that the word synthesizer performs better than the phoneme synthesizer. The most important qualities of a speech synthesis system are naturalness and intelligibility. We synthesize Marathi text and perform subjective evaluations of the synthesized speech. Speech synthesized by the proposed method was preferred over the conventional method in 85% of cases, which shows the effectiveness of the proposed method. In this paper we focus on diphone and triphone units, through which we obtain a 95% quality voice.
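    Unit selection, the algorithm named in the title, chooses one stored unit per target position so as to minimize a combination of target cost (how well a candidate matches the desired unit) and join cost (how smoothly adjacent candidates concatenate), typically via dynamic programming. The following is a toy sketch of that search; the cost functions and candidate names in the usage example are invented for illustration, not taken from the paper.

    ```python
    # Toy unit-selection search: pick one candidate unit per target position,
    # minimizing sum of target costs plus join costs, via dynamic programming.

    def unit_select(targets, candidates, target_cost, join_cost):
        """targets: list of desired units; candidates[i]: units available at i."""
        n = len(targets)
        # best[i][c] = (cumulative cost of best path ending in c, backpointer)
        best = [{c: (target_cost(targets[0], c), None) for c in candidates[0]}]
        for i in range(1, n):
            row = {}
            for c in candidates[i]:
                prev, cost = min(
                    ((p, best[i - 1][p][0] + join_cost(p, c))
                     for p in candidates[i - 1]),
                    key=lambda x: x[1],
                )
                row[c] = (cost + target_cost(targets[i], c), prev)
            best.append(row)
        # Backtrack from the cheapest final candidate.
        last = min(best[-1], key=lambda c: best[-1][c][0])
        path = [last]
        for i in range(n - 1, 0, -1):
            path.append(best[i][path[-1]][1])
        return list(reversed(path))
    ```

    For example, with two target positions and a join cost that penalizes one of the first-position candidates, the search picks the sequence with the cheapest total cost rather than greedily picking the best unit at each position.
    
    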

    DNN-based Speech Synthesis for Indian Languages from ASCII text

    Text-to-speech synthesis in Indian languages has seen a lot of progress over the past decade, partly due to the annual Blizzard Challenges. These systems assume the text is written in Devanagari or Dravidian scripts, which are nearly phonemic orthographies. However, the most common form of computer interaction among Indians is ASCII-written transliterated text. Such text is generally noisy, with many spelling variations for the same word. In this paper we evaluate three approaches to synthesizing speech from such noisy ASCII text: a naive Uni-Grapheme approach, a Multi-Grapheme approach, and a supervised Grapheme-to-Phoneme (G2P) approach. These methods first convert the ASCII text to a phonetic script, and then train a deep neural network to synthesize speech from it. We train and test our models on Blizzard Challenge datasets that were transliterated to ASCII using crowdsourcing. Our experiments on Hindi, Tamil, and Telugu demonstrate that our models generate speech of competitive quality from ASCII text compared to speech synthesized from the native scripts. All the accompanying transliterated datasets are released for public access.
    Comment: 6 pages, 5 figures. Accepted at the 9th ISCA Speech Synthesis Workshop.
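    The core preprocessing step shared by the three approaches is converting noisy ASCII transliterations into a phone sequence before synthesis. The sketch below illustrates the idea with a learned lookup table that maps spelling variants to one canonical phone sequence, falling back to a naive one-phone-per-character rule (roughly the Uni-Grapheme idea). The mapping table and words here are invented for illustration; in the paper the mapping is learned from crowdsourced transliterations.

    ```python
    # Illustrative G2P-style normalization: map noisy ASCII spellings to a
    # canonical phone sequence before handing text to the synthesizer.

    # Hypothetical learned table: multiple spelling variants -> same phones.
    G2P = {
        "namaste":  ["n", "a", "m", "a", "s", "t", "e"],
        "namastey": ["n", "a", "m", "a", "s", "t", "e"],  # spelling variant
    }

    def to_phones(word: str) -> list:
        word = word.lower()
        if word in G2P:
            return G2P[word]
        # Naive uni-grapheme fallback: one "phone" per ASCII character.
        return list(word)
    ```

    The supervised G2P approach outperforming the grapheme approaches is plausible precisely because it collapses spelling variants like the two above onto a single pronunciation.
    
    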

    A study in the design and impact of an oral/aural bridge component in second language literacy

    This study examines the factors affecting how child speakers of two minority languages spoken in India, Bondo and Desiya, acquire phonological awareness of sounds in Oriya, the language of instruction in many schools in the state of Orissa. Previous research has shown that learners benefit from instruction that teaches how to analyze and synthesize sounds in their first and second language rather than repeat and memorize them. Learners have also shown better recognition of sounds when they are presented in minimal contrasts than when they are not. Previous research further suggests that learners benefit from learning only oral and aural skills in the beginning stages of acquiring additional languages. Bondo- and Desiya-speaking children often acquire Oriya in ways that do not follow these findings. In this research, I prepared a set of oral/aural sound discrimination lessons to supplement language programs. An examination of the data from this study shows that when these lessons were used for at least two months, most students showed improvement in their phonological awareness of Oriya sounds. In my research, I discovered specific factors that seemed to relate to the development of phonological awareness. These factors are the teaching approach used in the experimental lessons, especially learning to contrast sounds through minimal phonetic differences; the learner's existing knowledge of the Oriya writing system and vocabulary, which was enhanced through the experimental lessons; sufficient cognitive maturity to handle sound discrimination tasks that required analytical thinking skills; and previous educational experience. The data also revealed that the students did not show significant improvement in their production skills. Results from production tasks do show that learners have specific patterns of production that reveal developmental stages.
    Bondo and Desiya speakers approximate Oriya aspirated stops by adding frication, by producing shorter aspiration noise, and by producing longer aspiration times before the vowel. This research indicates that students are able to improve their perceptual skills when they receive lessons that explicitly teach how to discriminate second-language sounds that do not occur in their first language. Further research is needed to test how students can also advance in their second-language production skills.

    Current trends in multilingual speech processing

    In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years, and the field is now receiving renewed interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application in the speech recognition community, but additional issues arise when using such features in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS), as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus being provided by both government and industry for technologies to help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies, at the heart of which lies multilingual speech processing.