
    Challenges in creating speech recognition for endangered language CALL: A Chickasaw case study

    Speech recognition technology is increasingly becoming an important component of Computer Assisted Language Learning (CALL) software, as well as of a language’s digital vitality. CALL software that integrates speech recognition allows learners to practice oral skills without live instruction and receive feedback on pronunciation. This speech recognition technology may be particularly beneficial for endangered or under-resourced languages. Chickasaw is an indigenous language of North America now spoken mainly in the state of Oklahoma. It is estimated that there are fewer than 75 native speakers of the language remaining, though recent years have seen a surge of interest in Chickasaw culture and language revitalization. In 2007, the Chickasaw Nation launched a robust and multifaceted revitalization program, and in 2015 they commissioned CALL software that integrates speech recognition. However, creating a quality automatic speech recognition (ASR) system necessitates a number of resources that are not always readily available for endangered languages like Chickasaw. Modern speech recognition technology is based on large-scale statistical modeling of target language text and hand transcribed audio corpora. Such technology also assumes a single standardized phonetic orthography where speech can be directly mapped to text. Currently, most available resources for building speech recognition technology are based on languages where researchers are able to access a large pool of literate native speakers who are willing and able to record many hours of high quality audio, and where large volumes of accessible text already exist. For many endangered languages, these criteria cannot easily be fulfilled. This paper is focused on identifying the dimensions of resource challenges that affect building corpora for such languages, using Chickasaw as a case study. 
Furthermore, we identify techniques that we have used to create a corpus of speech data suitable for building an instructional speech recognition module for use in CALL software.
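The mismatch described above, between multiple community spellings and the single phonetic orthography that ASR training assumes, can be illustrated with a small normalization step that maps variant spellings to one phonetic key, so transcripts from different writers label the same audio consistently. The variant table and the example words below are invented for illustration and are not drawn from the Chickasaw project's actual data:

```python
# Sketch: collapsing orthographic variants to a single phonetic key.
# The digraph-to-phone table is hypothetical, not actual Chickasaw practice.
VARIANT_MAP = {
    "lh": "ɬ",   # two community spellings of the lateral fricative
    "hl": "ɬ",   # collapse to one phonetic symbol
    "sh": "ʃ",
    "ch": "tʃ",
}

def normalize(transcript: str) -> str:
    """Collapse known spelling variants into a single phonetic form."""
    out = transcript.lower()
    # Replace longer spellings first so a digraph is never split apart.
    for spelling in sorted(VARIANT_MAP, key=len, reverse=True):
        out = out.replace(spelling, VARIANT_MAP[spelling])
    return out

# Two writers' spellings of the same (invented) word map to one key:
assert normalize("lhinko") == normalize("hlinko") == "ɬinko"
```

A real pipeline would of course need a linguist-vetted variant table and handling for context-dependent spellings, but the principle, normalize before aligning text to audio, is the same.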

    Text to Speech in New Languages without a Standardized Orthography

    Many spoken languages do not have a standardized writing system. Building text-to-speech voices for them without accurate transcripts of speech data is difficult. Our language-independent method to bootstrap synthetic voices using only speech data relies upon cross-lingual phonetic decoding of speech. In this paper, we describe novel additions to our bootstrapping method. We present results on eight different languages---English, Dari, Pashto, Iraqi, Thai, Konkani, Inupiaq and Ojibwe---from different language families, and show that our phonetic voices can be made understandable with as little as an hour of speech data that never had transcriptions, and without many resources available in the target language. We also present purely acoustic techniques that can help induce syllable- and word-level information that can further improve the intelligibility of these voices. Index Terms: speech synthesis, synthesis without text, languages without an orthography
Introduction
Recent developments in speech and language technologies have revolutionized the ways in which we access information. Advances in speech recognition, speech synthesis and dialog modeling have brought about interactive agents that people can talk to naturally and ask for information. There is a lot of interest in building such systems, especially in multilingual environments. Building speech and language systems typically requires significant amounts of data and linguistic resources. For many spoken languages of the world, finding large corpora or linguistic resources is difficult. Yet these languages have many native speakers around the world, and it would be very interesting to deploy speech technologies in them. Our work is about building text-to-speech systems for languages that are purely spoken languages: they do not have a standardized writing system.
These languages could be mainstream languages such as Konkani (a western Indian language with over 8 million speakers), or dialects of a major language that are phonetically quite distinct from the closest major language. Building a TTS system usually requires training data consisting of a speech corpus with corresponding transcripts. However, for these languages that aren't written down in a standard manner, one can only find speech corpora. Our current efforts focus on building speech synthesis systems when our training data doesn't contain text. It may seem futile to build a TTS system when the language at hand doesn't have a text form. Indeed, if there is no text at training time, there won't be text at test time, and then one might wonder why we need a TTS system at all. However, consider the use case of deploying a speech-to-speech translation of video lectures from English into Konkani. We have to synthesize speech in this "un-written" language from the output of a machine translation system. Even if the language at hand may not have a text form, we need some intermediate representation that can act as a text form that the machine translation system can produce. A first approximation of such a form is phonetic strings. Another use case for which we need TTS without text is, say, deploying a bus information system in Konkani. Our dialog system could have information about when the next bus is, but it has to generate speech to deliver this information. Again, one can imagine using a phonetic form to represent the speech to be generated, and produce a string of phones from the natural language generation model in the bus information dialog system. The work we present here is our continued effort in improving text to speech for languages that do not have a standardized orthography. We have built voices for several languages, from purely speech corpora, and produced understandable synthesis. We use cross-lingual phonetic speech recognition methods to do so.
Phone strings are not ideal for TTS, however, as a lot of information is contained in higher-level phonological units, including the syllables and words that can help produce natural prosody. Yet detecting words from a speech corpus alone is a difficult task. We have explored how purely acoustic techniques can be used to detect word-like units in our training speech corpus, and we use these units to further improve the intelligibility of speech synthesis.
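The "purely acoustic techniques" for inducing syllable-level units can be reduced, in spirit, to peak-picking on a short-time energy contour: syllable nuclei tend to coincide with local energy maxima. The sketch below uses a synthetic contour and a hand-picked threshold rather than features computed from real audio, and it is an illustration of the general idea, not the paper's actual method:

```python
# Sketch: candidate syllable nuclei as local peaks of a short-time
# energy contour. The contour and threshold are invented for illustration.

def syllable_nuclei(energy, threshold=0.5):
    """Return indices of local energy peaks above threshold
    (candidate syllable nuclei)."""
    peaks = []
    for i in range(1, len(energy) - 1):
        if (energy[i] > threshold
                and energy[i] >= energy[i - 1]
                and energy[i] > energy[i + 1]):
            peaks.append(i)
    return peaks

# A synthetic contour with three energy humps yields three candidates:
contour = [0.1, 0.8, 0.2, 0.9, 0.3, 0.1, 0.7, 0.2]
assert syllable_nuclei(contour) == [1, 3, 6]
```

A real system would compute the contour from frame-level energy or loudness, smooth it, and group the decoded phones between dips into syllable-like units.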

    Strategies for Representing Tone in African Writing Systems

Tone languages provide some interesting challenges for the designers of new orthographies. One approach is to omit tone marks, just as stress is not marked in English (zero marking). Another approach is to do phonemic tone analysis and then make heavy use of diacritic symbols to distinguish the `tonemes' (exhaustive marking). While orthographies based on either system have been successful, this may be thanks to our ability to manage inadequate orthographies rather than to any intrinsic advantage which is afforded by one or the other approach. In many cases, practical experience with both kinds of orthography in sub-Saharan Africa has shown that people have not been able to attain the level of reading and writing fluency that we know to be possible for the orthographies of non-tonal languages. In some cases this can be attributed to a sociolinguistic setting which does not favour vernacular literacy. In other cases, the orthography itself might be to blame. If the orthography of a tone language is difficult to use or to learn, then a good part of the reason, I believe, is that the designer either has not paid enough attention to the function of tone in the language, or has not ensured that the information encoded in the orthography is accessible to the ordinary (non-linguist) user of the language. If the writing of tone is not going to continue to be a stumbling block to literacy efforts, then a fresh approach to tone orthography is required, one which assigns high priority to these two factors. This article describes the problems with orthographies that use too few or too many tone marks, and critically evaluates a wide range of creative intermediate solutions. I review the contributions made by phonology and reading theory, and provide some broad methodological principles to guide someone who is seeking to represent tone in a writing system.
The tone orthographies of several languages from sub-Saharan Africa are presented throughout the article, with particular emphasis on some tone languages of Cameroon.
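The two endpoint strategies the article contrasts, zero marking and exhaustive marking, can be sketched as alternative renderers over the same toneme-annotated word. The syllables, tones, and diacritic choices below are invented for illustration (combining acute for high, combining grave for low) and do not represent any particular language's orthography:

```python
# Sketch: rendering one toneme-annotated word under the two endpoint
# strategies. Syllable/tone data is invented; CV syllables are assumed,
# so the syllable's final character is its vowel.
TONE_DIACRITIC = {"H": "\u0301", "L": "\u0300", "M": ""}  # acute, grave; mid unmarked

def zero_marking(syllables):
    """Write the segments only, as English does with stress."""
    return "".join(syl for syl, _tone in syllables)

def exhaustive_marking(syllables):
    """Attach a combining diacritic to each syllable's vowel
    (a phonemic toneme analysis is assumed to be done already)."""
    return "".join(syl + TONE_DIACRITIC[tone] for syl, tone in syllables)

word = [("ba", "H"), ("la", "L")]
assert zero_marking(word) == "bala"
assert exhaustive_marking(word) == "ba\u0301la\u0300"  # renders as "bálà"
```

The intermediate solutions the article evaluates sit between these two functions: marking only some tonemes, or marking at the word or grammatical level rather than on every syllable.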

    Orthography development


    When marking tone reduces fluency: an orthography experiment in Cameroon

Should an alphabetic orthography for a tone language include tone marks? Opinion and practice are divided along three lines: zero marking, phonemic marking and various reduced marking schemes. This paper examines the success of phonemic tone marking for Dschang, a Grassfields Bantu language which uses tone to distinguish lexical items and some grammatical constructions. Participants with a variety of ages and educational backgrounds, and having different levels of exposure to the orthography, were tested on location in the Western Province of Cameroon. All but one had attended classes on tone marking. Participants read texts which were marked and unmarked for tone, then added tone marks to the unmarked texts. Analysis shows that tone marking degrades reading fluency and does not help to resolve tonally ambiguous words. Experienced writers attain an accuracy score of 83.5% in adding tone marks to a text, while inexperienced writers score a mere 53%, which is not much better than chance. The experiment raises serious doubts about the suitability of the phonemic method of marking tone for languages having widespread tone sandhi effects, and lends support to the notion that a writing system should have `fixed word images'. A critical review of other experimental work on African tone orthography lays the groundwork for the experiment, and contributes to the establishment of a uniform experimental paradigm.
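The accuracy figures quoted amount to per-syllable agreement between a writer's added tone marks and a gold-marked text. Assuming, purely for illustration, a binary high/low choice per syllable, random guessing expects about 50% agreement, which is why 53% is described as not much better than chance. The scoring sketch and data below are invented, not taken from the experiment:

```python
# Sketch: per-syllable tone-marking accuracy against a gold-marked text.
# Tone sequences are invented; a two-toneme H/L inventory is assumed.

def tone_accuracy(marked, gold):
    """Fraction of syllables whose added tone mark matches the gold marking."""
    assert len(marked) == len(gold), "texts must have the same syllable count"
    return sum(m == g for m, g in zip(marked, gold)) / len(gold)

gold   = ["H", "L", "H", "H", "L", "L"]
writer = ["H", "L", "L", "H", "L", "H"]  # two of six syllables mis-marked
assert abs(tone_accuracy(writer, gold) - 4 / 6) < 1e-9
```

Dschang's actual tone system is richer than a binary H/L choice (the abstract notes widespread tone sandhi), so the real chance baseline depends on the toneme inventory and its distribution in the test texts.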

    Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop

    We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech.
Comment: Accepted to ICASSP 201

    Data Cleaning for XML Electronic Dictionaries via Statistical Anomaly Detection

    Many important forms of data are stored digitally in XML format. Errors can occur in the textual content of the data in the fields of the XML. Fixing these errors manually is time-consuming and expensive, especially for large amounts of data. There is increasing interest in the research, development, and use of automated techniques for assisting with data cleaning. Electronic dictionaries are an important form of data frequently stored in XML format that often have errors introduced through a mixture of manual typographical entry errors and optical character recognition errors. In this paper we describe methods for flagging statistical anomalies as likely errors in electronic dictionaries stored in XML format. We describe six systems based on different sources of information. The systems detect errors using various signals in the data, including uncommon characters, text length, character-based language models, word-based language models, tied-field length ratios, and tied-field transliteration models. Four of the systems detect errors based on expectations automatically inferred from content within elements of a single field type. We call these single-field systems. Two of the systems detect errors based on correspondence expectations automatically inferred from content within elements of multiple related field types. We call these tied-field systems. For each system, we provide an intuitive analysis of the type of error that it is successful at detecting. Finally, we describe two larger-scale evaluations: one using crowdsourcing with Amazon's Mechanical Turk platform and one using the annotations of a domain expert. The evaluations consistently show that the systems are useful for improving the efficiency with which errors in XML electronic dictionaries can be detected.
Comment: 8 pages, 4 figures, 5 tables; published in Proceedings of the 2016 IEEE Tenth International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, pages 79-86, February 201
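A minimal version of the character-based language model idea, one of the single-field systems: train a smoothed character-bigram model on a field's own values, then rank entries by average per-character surprisal, so entries with improbable characters (e.g. OCR confusions) float to the top. The toy entries, smoothing constants, and vocabulary size below are invented, and this is a sketch of the general technique rather than the paper's implementation:

```python
# Sketch: flag statistically anomalous field values with a character
# bigram model trained on the field itself. Toy data; add-alpha smoothing.
import math
from collections import Counter

def train_bigrams(values):
    """Count character bigrams (with start/end padding) over all values."""
    counts, context = Counter(), Counter()
    for v in values:
        padded = "^" + v + "$"
        for a, b in zip(padded, padded[1:]):
            counts[(a, b)] += 1
            context[a] += 1
    return counts, context

def surprisal(value, counts, context, alpha=1.0, vocab=100):
    """Average negative log-probability per character (add-alpha smoothing)."""
    padded = "^" + value + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (counts[(a, b)] + alpha) / (context[a] + alpha * vocab)
        total += -math.log(p)
    return total / (len(padded) - 1)

field = ["cat", "bat", "hat", "mat", "ca7"]  # '7' mimics an OCR error
counts, context = train_bigrams(field)
scores = {v: surprisal(v, counts, context) for v in field}

# The entry with the stray digit scores as most anomalous:
assert max(scores, key=scores.get) == "ca7"
```

Ranking by surprisal rather than hard-thresholding matches the evaluation setup described above: annotators (crowd workers or a domain expert) review the top-ranked candidates, so the model only needs to concentrate true errors near the top of the list.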