Challenges in creating speech recognition for endangered language CALL: A Chickasaw case study
Speech recognition technology is increasingly becoming an important component of Computer Assisted Language Learning (CALL) software, as well as of a language’s digital vitality. CALL software that integrates speech recognition allows learners to practice oral skills without live instruction and to receive feedback on pronunciation. This technology may be particularly beneficial for endangered or under-resourced languages. Chickasaw is an indigenous language of North America now spoken mainly in the state of Oklahoma. It is estimated that fewer than 75 native speakers of the language remain, though recent years have seen a surge of interest in Chickasaw culture and language revitalization. In 2007, the Chickasaw Nation launched a robust and multifaceted revitalization program, and in 2015 it commissioned CALL software that integrates speech recognition. However, creating a quality automatic speech recognition (ASR) system requires a number of resources that are not always readily available for endangered languages like Chickasaw. Modern speech recognition technology is based on large-scale statistical modeling of target-language text and hand-transcribed audio corpora. Such technology also assumes a single standardized phonetic orthography where speech can be directly mapped to text. Currently, most available resources for building speech recognition technology are based on languages where researchers can access a large pool of literate native speakers who are willing and able to record many hours of high-quality audio, and where large volumes of accessible text already exist. For many endangered languages, these criteria cannot easily be fulfilled. This paper focuses on identifying the dimensions of the resource challenges that affect building corpora for such languages, using Chickasaw as a case study.
Furthermore, we identify techniques that we have used to create a corpus of speech data suitable for building an instructional speech recognition module for use in CALL software.
Text to Speech in New Languages without a Standardized Orthography
Abstract: Many spoken languages do not have a standardized writing system, and building text-to-speech voices for them without accurate transcripts of speech data is difficult. Our language-independent method bootstraps synthetic voices using only speech data, relying on cross-lingual phonetic decoding of that speech. In this paper, we describe novel additions to our bootstrapping method. We present results on eight languages from different language families (English, Dari, Pashto, Iraqi, Thai, Konkani, Inupiaq and Ojibwe) and show that our phonetic voices can be made understandable with as little as an hour of speech data that was never transcribed, and with few other resources available in the target language. We also present purely acoustic techniques that can help induce syllable- and word-level information to further improve the intelligibility of these voices.

Index Terms: speech synthesis, synthesis without text, languages without an orthography

Introduction: Recent developments in speech and language technologies have revolutionized the ways in which we access information. Advances in speech recognition, speech synthesis and dialog modeling have produced interactive agents that people can talk to naturally and ask for information, and there is considerable interest in building such systems, especially in multilingual environments. Building speech and language systems typically requires significant amounts of data and linguistic resources. For many spoken languages of the world, finding large corpora or linguistic resources is difficult. Yet these languages have many native speakers around the world, and it would be very valuable to deploy speech technologies for them. Our work is about building text-to-speech systems for purely spoken languages: languages that do not have a standardized writing system.
These languages could be mainstream languages such as Konkani (a western Indian language with over 8 million speakers), or dialects of a major language that are phonetically quite distinct from the closest major language. Building a TTS system usually requires training data consisting of a speech corpus with corresponding transcripts. However, for these languages that aren't written down in a standard manner, one can only find speech corpora. Our current efforts focus on building speech synthesis systems when our training data doesn't contain text. It may seem futile to build a TTS system when the language at hand doesn't have a text form. Indeed, if there is no text at training time, there won't be text at test time, and one might then wonder why we need a TTS system at all. However, consider the use case of deploying speech-to-speech translation of video lectures from English into Konkani. We have to synthesize speech in this "un-written" language from the output of a machine translation system. Even if the language at hand does not have a text form, we need some intermediate representation that can act as a text form that the machine translation system can produce. A first approximation of such a form is phonetic strings. Another use case for which we need TTS without text is, say, deploying a bus information system in Konkani. Our dialog system could have information about when the next bus is, but it has to generate speech to deliver this information. Again, one can imagine using a phonetic form to represent the speech to be generated, and producing a string of phones from the natural language generation model in the bus information dialog system. The work we present here is our continued effort to improve text-to-speech for languages that do not have a standardized orthography. We have built voices for several languages from purely speech corpora and produced understandable synthesis, using cross-lingual phonetic speech recognition methods to do so.
Phone strings are not ideal for TTS, however, because much information is carried by higher-level phonological units, including syllables and words, that help produce natural prosody. Detecting words from a speech corpus alone is a difficult task. We have explored how purely acoustic techniques can be used to detect word-like units in our training speech corpus, and we use these units to further improve the intelligibility of speech synthesis.
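As a rough illustration of the word-like unit discovery described above, the sketch below (my own toy example, not the authors' implementation, with invented phone data) treats frequently recurring phone n-grams across untranscribed, phonetically decoded utterances as candidate word-like units:

```python
from collections import Counter

def discover_word_like_units(phone_strings, min_len=2, max_len=4, min_count=3):
    """Count recurring phone n-grams across decoded utterances and
    return those frequent enough to be treated as word-like units."""
    counts = Counter()
    for phones in phone_strings:  # each utterance is a list of phone symbols
        for n in range(min_len, max_len + 1):
            for i in range(len(phones) - n + 1):
                counts[tuple(phones[i:i + n])] += 1
    return {gram for gram, c in counts.items() if c >= min_count}

# Toy phone sequences, as if produced by a cross-lingual phonetic decoder:
utterances = [
    ["k", "o", "n", "k", "a", "n", "i"],
    ["k", "o", "n", "t", "a"],
    ["k", "o", "n", "k", "a", "n", "i", "t", "a"],
]
units = discover_word_like_units(utterances, min_count=3)
# The trigram ("k", "o", "n") recurs in all three utterances and is kept.
```

Real acoustic unit discovery works on the speech signal itself rather than on already-decoded symbol strings, but the same recurrence intuition applies.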
Strategies for Representing Tone in African Writing Systems
Tone languages provide some interesting challenges for the designers of new orthographies.
One approach is to omit tone marks, just as stress is not marked in English (zero marking).
Another approach is to do phonemic tone analysis and then make heavy use of diacritic
symbols to distinguish the `tonemes' (exhaustive marking). While orthographies based on
either system have been successful, this may be thanks to our ability to manage inadequate
orthographies rather than to any intrinsic advantage which is afforded by one or the other
approach. In many cases, practical experience with both kinds of orthography in sub-Saharan
Africa has shown that people have not been able to attain the level of reading and writing
fluency that we know to be possible for the orthographies of non-tonal languages. In some
cases this can be attributed to a sociolinguistic setting which does not favour vernacular
literacy. In other cases, the orthography itself might be to blame. If the orthography of a tone
language is difficult to use or to learn, then a good part of the reason, I believe, is that the
designer either has not paid enough attention to the function of tone in the language, or has
not ensured that the information encoded in the orthography is accessible to the ordinary
(non-linguist) user of the language. If the writing of tone is not going to continue to be a
stumbling block to literacy efforts, then a fresh approach to tone orthography is required, one
which assigns high priority to these two factors.
This article describes the problems with orthographies that use too few or too many tone
marks, and critically evaluates a wide range of creative intermediate solutions. I review the
contributions made by phonology and reading theory, and provide some broad methodological
principles to guide someone who is seeking to represent tone in a writing system. The tone
orthographies of several languages from sub-Saharan Africa are presented throughout the
article, with particular emphasis on some tone languages of Cameroon.
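To make the zero-versus-exhaustive contrast concrete, here is a small illustrative sketch (my example, not from the article; the tone melody and word are invented) of exhaustive phonemic marking using the standard Unicode combining diacritics for tone, acute for high and grave for low, with mid left unmarked; zero marking is simply the bare word:

```python
import unicodedata

# Combining diacritics conventionally used for tone: acute = high,
# grave = low; mid tone left unmarked in this scheme.
TONE_MARKS = {"H": "\u0301", "L": "\u0300", "M": ""}
VOWELS = set("aeiou")

def mark_tone(word, tones):
    """Exhaustive marking: attach one tone diacritic per vowel.
    `tones` lists one toneme (H/L/M) per vowel, in order."""
    out, it = [], iter(tones)
    for ch in word:
        out.append(ch)
        if ch in VOWELS:
            out.append(TONE_MARKS[next(it)])
    return unicodedata.normalize("NFC", "".join(out))

marked = mark_tone("mba", ["H"])   # exhaustive marking of an invented word
plain = "mba"                      # zero marking: no diacritics at all
```

The NFC normalization composes base vowel and diacritic into a single character where possible, which matters for rendering and for string comparison in downstream tools.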
When marking tone reduces fluency: an orthography experiment in Cameroon
Should an alphabetic orthography for a tone language include tone marks? Opinion and
practice are divided along three lines: zero marking, phonemic marking and various reduced
marking schemes. This paper examines the success of phonemic tone marking for Dschang, a
Grassfields Bantu language which uses tone to distinguish lexical items and some grammatical
constructions. Participants with a variety of ages and educational backgrounds, and having
different levels of exposure to the orthography were tested on location in the Western
Province of Cameroon. All but one had attended classes on tone marking. Participants read
texts which were marked and unmarked for tone, then added tone marks to the unmarked
texts. Analysis shows that tone marking degrades reading fluency and does not help to resolve
tonally ambiguous words. Experienced writers attain an accuracy score of 83.5% in adding
tone marks to a text, while inexperienced writers score a mere 53%, which is not much better
than chance. The experiment raises serious doubts about the suitability of the phonemic
method of marking tone for languages having widespread tone sandhi effects, and lends
support to the notion that a writing system should have `fixed word images'. A critical review
of other experimental work on African tone orthography lays the groundwork for the
experiment, and contributes to the establishment of a uniform experimental paradigm.
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop
We summarize the accomplishments of a multi-disciplinary workshop exploring
the computational and scientific issues surrounding the discovery of linguistic
units (subwords and words) in a language without orthography. We study the
replacement of orthographic transcriptions by images and/or translated text in
a well-resourced language to help unsupervised discovery from raw speech.
Comment: Accepted to ICASSP 201
Data Cleaning for XML Electronic Dictionaries via Statistical Anomaly Detection
Many important forms of data are stored digitally in XML format. Errors can
occur in the textual content of the data in the fields of the XML. Fixing these
errors manually is time-consuming and expensive, especially for large amounts
of data. There is increasing interest in the research, development, and use of
automated techniques for assisting with data cleaning. Electronic dictionaries
are an important form of data, frequently stored in XML format, that often
have errors introduced through a mixture of manual typographical entry errors
and optical character recognition errors. In this paper we describe methods for
flagging statistical anomalies as likely errors in electronic dictionaries
stored in XML format. We describe six systems based on different sources of
information. The systems detect errors using various signals in the data
including uncommon characters, text length, character-based language models,
word-based language models, tied-field length ratios, and tied-field
transliteration models. Four of the systems detect errors based on expectations
automatically inferred from content within elements of a single field type. We
call these single-field systems. Two of the systems detect errors based on
correspondence expectations automatically inferred from content within elements
of multiple related field types. We call these tied-field systems. For each
system, we provide an intuitive analysis of the type of error that it is
successful at detecting. Finally, we describe two larger-scale evaluations
using crowdsourcing with Amazon's Mechanical Turk platform and using the
annotations of a domain expert. The evaluations consistently show that the
systems are useful for improving the efficiency with which errors in XML
electronic dictionaries can be detected.
Comment: 8 pages, 4 figures, 5 tables; published in Proceedings of the 2016
IEEE Tenth International Conference on Semantic Computing (ICSC), Laguna
Hills, CA, USA, pages 79-86, February 201
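As a hedged sketch of the character-based language model signal, one of the six single-field systems listed above (the details of the actual systems are not given in the abstract, so this is my guess at the idea, with invented data): train a smoothed character-bigram model on all entries of one field type, score each entry by its average per-bigram log-probability, and flag the lowest-scoring entries as likely typographical or OCR errors.

```python
import math
from collections import Counter

def char_bigram_scores(texts, smoothing=1.0):
    """Train a smoothed character-bigram model on all texts of one field
    type, then return each text's average per-bigram log-probability.
    Lower scores indicate likelier anomalies (e.g. OCR noise)."""
    bigrams, unigrams = Counter(), Counter()
    for t in texts:
        padded = "^" + t + "$"          # boundary markers
        unigrams.update(padded[:-1])    # bigram left contexts
        bigrams.update(zip(padded, padded[1:]))
    vocab = len(set("".join("^" + t + "$" for t in texts)))
    def score(t):
        padded = "^" + t + "$"
        lp = [math.log((bigrams[(a, b)] + smoothing) /
                       (unigrams[a] + smoothing * vocab))
              for a, b in zip(padded, padded[1:])]
        return sum(lp) / len(lp)
    return {t: score(t) for t in texts}

# Toy headword field with one OCR-corrupted entry (invented data):
entries = ["cat", "can", "cap", "car", "cab", "c@t"]
scores = char_bigram_scores(entries)
anomaly = min(scores, key=scores.get)   # the "@" bigrams are rare
```

In the setting the paper describes, the field contents would come from the XML elements of a single field type, and the lowest-scoring entries would be queued for human review rather than auto-corrected.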