Challenges in creating speech recognition for endangered language CALL: A Chickasaw case study

Abstract

Speech recognition technology is increasingly becoming an important component of Computer Assisted Language Learning (CALL) software, as well as of a language’s digital vitality. CALL software that integrates speech recognition allows learners to practice oral skills without live instruction and receive feedback on pronunciation. This speech recognition technology may be particularly beneficial for endangered or under-resourced languages. Chickasaw is an indigenous language of North America now spoken mainly in the state of Oklahoma. It is estimated that there are fewer than 75 native speakers of the language remaining, though recent years have seen a surge of interest in Chickasaw culture and language revitalization. In 2007, the Chickasaw Nation launched a robust and multifaceted revitalization program, and in 2015 they commissioned CALL software that integrates speech recognition. However, creating a quality automatic speech recognition (ASR) system necessitates a number of resources that are not always readily available for endangered languages like Chickasaw. Modern speech recognition technology is based on large-scale statistical modeling of target language text and hand transcribed audio corpora. Such technology also assumes a single standardized phonetic orthography where speech can be directly mapped to text. Currently, most available resources for building speech recognition technology are based on languages where researchers are able to access a large pool of literate native speakers who are willing and able to record many hours of high quality audio, and where large volumes of accessible text already exist. For many endangered languages, these criteria cannot easily be fulfilled. This paper is focused on identifying the dimensions of resource challenges that affect building corpora for such languages, using Chickasaw as a case study. Furthermore, we identify techniques that we have used to create a corpus of speech data suitable for building an instructional speech recognition module for use in CALL software

    Similar works