76 research outputs found
Dyslexic children's reading pattern as input for ASR: Data, analysis, and pronunciation model
To realize an automatic speech recognition (ASR) model that
is able to recognize the Bahasa Melayu reading difficulties of dyslexic children, the language corpora has to be generated beforehand. For this purpose, data collection is performed in two public schools involving ten dyslexic children aged between seven to fourteen years old. A total of 114 Bahasa Melayu words,representing 23 consonant-vowel patterns in the spelling system of the language, served as the stimuli. The patterns range from simple to somewhat complex formations of consonant-vowel pairs in words listed in a level one primary school syllabus. An analysis was performed aimed at identifying the most frequent errors made by these dyslexic children when reading aloud, and
describing the emerging reading pattern of dyslexic children
in general. This paper hence provides an overview of the
entire process from data collection to analysis to modeling the pronunciations of words which will serve as the active lexicon for the ASR model. This paper also highlights the challenges of data collection involving dyslexic children when they are reading aloud, and other factors that contribute to the complex nature of the data collected
The effect of automatic speech recognition EyeSpeak software on Iraqi students’ English pronunciation: a pilot study
The use of technology, such as computer-assisted language learning (CALL), is used in teaching and learning in the foreign language classrooms where it is most needed.One promising emerging technology that supports language learning is automatic speech recognition (ASR).Integrating such technology, especially in the instruction of pronunciation in the classroom, is important in helping students to achieve correct pronunciation. In Iraq, English is a foreign language, and it is not surprising that learners commit many pronunciation mistakes.One factor contributing to these mistakes is the difference between the Arabic and English phonetic systems.Thus, the sound transformation from the mother tongue (Arabic) to the target language (English) is one barrier for Arab learners.The purpose of this study is to investigate the effectiveness of using automatic speech recognition ASR EyeSpeak software in improving the pronunciation of Iraqi learners of English. An experimental research project with a pretest-posttest design is conducted over a one-month period in the Department of English at Al-Turath University College in Baghdad, Iraq.The ten participants are randomly selected first-year college students enrolled in a pronunciation class that uses traditional teaching methods and ASR EyeSpeak software.The findings show that using EyeSpeak software leads to a significant improvement in the students’ English pronunciation, evident from the test scores they achieve after using EyeSpeak software
Recommended from our members
Automatic Dialect and Accent Recognition and its Application to Speech Recognition
A fundamental challenge for current research on speech science and technology is understanding and modeling individual variation in spoken language. Individuals have their own speaking styles, depending on many factors, such as their dialect and accent as well as their socioeconomic background. These individual differences typically introduce modeling difficulties for large-scale speaker-independent systems designed to process input from any variant of a given language. This dissertation focuses on automatically identifying the dialect or accent of a speaker given a sample of their speech, and demonstrates how such a technology can be employed to improve Automatic Speech Recognition (ASR). In this thesis, we describe a variety of approaches that make use of multiple streams of information in the acoustic signal to build a system that recognizes the regional dialect and accent of a speaker. In particular, we examine frame-based acoustic, phonetic, and phonotactic features, as well as high-level prosodic features, comparing generative and discriminative modeling techniques. We first analyze the effectiveness of approaches to language identification that have been successfully employed by that community, applying them here to dialect identification. We next show how we can improve upon these techniques. Finally, we introduce several novel modeling approaches -- Discriminative Phonotactics and kernel-based methods. We test our best performing approach on four broad Arabic dialects, ten Arabic sub-dialects, American English vs. Indian English accents, American English Southern vs. Non-Southern, American dialects at the state level plus Canada, and three Portuguese dialects. Our experiments demonstrate that our novel approach, which relies on the hypothesis that certain phones are realized differently across dialects, achieves new state-of-the-art performance on most dialect recognition tasks. This approach achieves an Equal Error Rate (EER) of 4% for four broad Arabic dialects, an EER of 6.3% for American vs. Indian English accents, 14.6% for American English Southern vs. Non-Southern dialects, and 7.9% for three Portuguese dialects. Our framework can also be used to automatically extract linguistic knowledge, specifically the context-dependent phonetic cues that may distinguish one dialect form another. We illustrate the efficacy of our approach by demonstrating the correlation of our results with geographical proximity of the various dialects. As a final measure of the utility of our studies, we also show that, it is possible to improve ASR. Employing our dialect identification system prior to ASR to identify the Levantine Arabic dialect in mixed speech of a variety of dialects allows us to optimize the engine's language model and use Levantine-specific acoustic models where appropriate. This procedure improves the Word Error Rate (WER) for Levantine by 4.6% absolute; 9.3% relative. In addition, we demonstrate in this thesis that, using a linguistically-motivated pronunciation modeling approach, we can improve the WER of a state-of-the art ASR system by 2.2% absolute and 11.5% relative WER on Modern Standard Arabic
Analysis of Dialectal Influence in Pan-Arabic ASR
Abstract In this paper, we analyze the impact of five Arabic dialects on the front-end and pronunciation dictionary component of an Automatic Speech Recognition (ASR) system. We use ASR"s phonetic decision tree as a diagnostic tool to compare the robustness of MFCC to MLP front-ends to dialectal variations in the speech data and found that MLP Bottle-Neck features are less robust to dialectal variation. We also perform a rulebased analysis of the pronunciation dictionary, which enables us to identify dialectal words in the vocabulary and automatically generate pronunciations for unseen words. We show that our technique produces pronunciations with an average phone error rate 9.2%
Acoustic Modelling for Under-Resourced Languages
Automatic speech recognition systems have so far been developed only for very few languages out of the 4,000-7,000 existing ones.
In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages, in a time and cost effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages
- …