    Minimizing word error rate in a dyslexic reading-oriented ASR engine using phoneme refinement and alternative pronunciation

    Little attention has been given to using an automatic speech recognition (ASR) engine to detect miscues in text read aloud by dyslexic children. The performance of such a system is measured by word error rate (WER) and miscue detection rate (MDR); WER should always be kept low and MDR high to achieve good recognition. This paper focuses on minimizing WER by formulating a better model for a clear representation of the input data. This representation takes into account phoneme refinement and alternative pronunciation for Bahasa Melayu (BM) speech data uttered by dyslexic children. Several other optimal input-data models from the literature, and their recognition results, were compared. Phoneme refinement and alternative pronunciation produced better recognition results, with a lower WER (25%) and a higher MDR (80.77%).
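    WER, the main metric in the abstract above, is conventionally computed as the word-level Levenshtein distance between the reference text and the recognizer output, (S + D + I) / N, where S, D and I are substitutions, deletions and insertions and N is the number of reference words. The following is a minimal sketch of that standard definition; the function name and the BM example sentence are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

    For example, `word_error_rate("ibu pergi ke pasar", "ibu pegi ke pasar")` returns 0.25: one substituted word out of four reference words.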

    Automatically Recognising European Portuguese Children's Speech

    International audience
    This paper reports findings from an analysis of errors made by an automatic speech recogniser trained and tested on the speech of European Portuguese children aged 3 to 10. As expected, we identified frequent pronunciation error patterns in the children's speech, and we were able to correlate some of these patterns with automatic speech recognition errors. The findings are of phonetic interest but will also be useful for improving the performance of automatic speech recognisers aimed at children in the study's target population.

    Automatic Speech Recognition for Speech Assessment of Persian Preschool Children

    Preschool evaluation is crucial because it gives teachers and parents valuable insight into children's growth and development, and the COVID-19 pandemic has highlighted the need for online assessment of preschool children. One of the abilities that should be tested is speech. Off-the-shelf automatic speech recognition (ASR) systems are ill-suited to this task because they are pre-trained on voices that differ from children's voices in frequency and amplitude. To solve this issue, we built an ASR for our cognitive test system using the Wav2Vec 2.0 model with a new pre-training objective called Random Frequency Pitch (RFP). In addition, we used our new dataset to fine-tune the model for the Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Our approach reaches a word error rate (WER) of 6.45 on the Persian section of the CommonVoice dataset, and it also produces positive outcomes in zero- and few-shot scenarios.
    Comment: 8 pages, 5 figures, 4 tables, 1 algorithm

    Automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu

    Automatic speech recognition (ASR) is potentially helpful for children with dyslexia. However, the reading errors of dyslexic children are often phonetically very similar to the target words, which lowers ASR accuracy. This study therefore evaluates whether ASR built on automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu (BM) achieves acceptable accuracy. Three objectives were set: first, to produce manual transcription and phonetic labelling; second, to produce automatic transcription and phonetic labelling using forced alignment; and third, to compare the accuracy obtained with automatic and with manual transcription and phonetic labelling. The methods used were manual speech labelling and segmentation, forced alignment, and training with a Hidden Markov Model (HMM) and an Artificial Neural Network (ANN); ASR accuracy was measured with word error rate (WER) and false alarm rate (FAR). A total of 585 speech files were used for manual transcription, forced alignment and the training experiments. With automatic transcription and phonetic labelling, the ASR engine achieved an optimum accuracy of 76.04%, with a WER as low as 23.96% and a FAR of 17.9%. These results are almost identical to those of the engine using manual transcription: 76.26% accuracy, a WER as low as 23.97% and a FAR of 17.9%. In conclusion, automatic transcription and phonetic labelling are accurate enough to be used in ASR that helps dyslexic children learn to read in BM.
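    Forced alignment, used in the study above to produce automatic transcription and phonetic labelling, assigns each audio frame to a phoneme of a transcription that is already known. The abstract does not say which toolkit was used, so the following is only a minimal sketch of the core Viterbi dynamic programme, assuming a per-frame log-likelihood for each phoneme position is already available (for example from a trained HMM or ANN acoustic model). The function name `force_align`, the score matrix and the toy word "buku" are hypothetical.

```python
import math


def force_align(phonemes, scores):
    """Viterbi forced alignment of a known phoneme sequence to frames.

    phonemes: list of K phoneme labels from the transcription.
    scores:   T x K matrix; scores[t][k] is the (assumed) log-likelihood
              that frame t belongs to the k-th phoneme of the sequence.
    Returns a list of (phoneme, first_frame, last_frame) segments.
    """
    T, K = len(scores), len(phonemes)
    if T < K:
        raise ValueError("need at least one frame per phoneme")
    NEG = -math.inf
    best = [[NEG] * K for _ in range(T)]  # best path score ending in (t, k)
    back = [[0] * K for _ in range(T)]    # phoneme index at frame t - 1
    best[0][0] = scores[0][0]             # alignment starts in the first phoneme
    for t in range(1, T):
        for k in range(K):
            stay = best[t - 1][k]                            # remain in phoneme k
            advance = best[t - 1][k - 1] if k > 0 else NEG   # move on from k - 1
            if stay >= advance:
                best[t][k], back[t][k] = stay, k
            else:
                best[t][k], back[t][k] = advance, k - 1
            best[t][k] += scores[t][k]
    # Trace back from the last phoneme at the last frame.
    path = [K - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    # Collapse the frame-level path into contiguous phoneme segments.
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[t - 1]:
            segments.append((phonemes[path[start]], start, t - 1))
            start = t
    return segments
```

    With a six-frame toy score matrix in which frames 0 to 1 favour "b", 2 to 3 the first "u", 4 "k" and 5 the final "u", `force_align(["b", "u", "k", "u"], scores)` returns `[('b', 0, 1), ('u', 2, 3), ('k', 4, 4), ('u', 5, 5)]`. Each phoneme must occupy at least one frame and phonemes are visited strictly left to right, mirroring a simple left-to-right HMM topology.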

    Acoustic analysis of Polish nasals produced by deaf children

    The development of computer-based diagnostic and teaching aids for the rehabilitation of hearing-impaired children's speech, and of automatic speech recognition systems intended for deaf people, requires thorough knowledge of the acoustic parameters of deaf speech. In this study, Polish nasals produced by profoundly deaf children were analyzed acoustically, and the resulting parameter values were analyzed statistically. The results of the discriminant analysis show that nasals produced by deaf speakers are as difficult to identify as those produced by normally hearing speakers.

    Towards age-independent acoustic modeling

    International audience
    In automatic speech recognition applications, adults and children are usually treated as two population groups with separate acoustic models, because their voice characteristics differ significantly. This paper investigates age-independent acoustic modeling in the context of large vocabulary speech recognition. Using a small amount (9 hours) of children's speech and a larger amount (57 hours) of adult speech, age-independent acoustic models are trained with several methods for speaker adaptive acoustic modeling. Their recognition results are compared with those of age-dependent acoustic models for children and for adults, respectively. Recognition experiments are performed on four Italian speech corpora, two of children's speech and two of adult speech, using 64k-word and 11k-word trigram language models. Speaker adaptive acoustic modeling methods prove effective for training age-independent acoustic models, giving recognition results at least as good as those of age-dependent acoustic models for adults and children.

    Automatic Speech Recognition Model for Dyslexic Children Reading in Bahasa Melayu

    Dyslexia is a condition of phonological origin that profoundly impedes reading and spelling. Children with dyslexia often find reading and spelling difficult, exhausting and uninteresting, and so withdraw from the learning process. When reading and spelling, they make many mistakes, even on simple, common words, which they find embarrassing. This does not mean that they have a lower IQ than other children: dyslexic children have average or high IQs and a great deal of potential when given the right help and support, such as motivational support and suitable teaching techniques. With advances in educational technology, computer-based applications are used to stimulate the learning of reading and spelling. This study therefore proposes an automatic speech recognition (ASR) model that enables a computer to 'listen' for incorrect reading. The scope is limited to modeling and recognizing single Bahasa Melayu (BM) words from the school syllabus for level one (tahap satu) dyslexic pupils in primary schools. As a basis for the ASR model, a model of dyslexic children's reading and spelling in BM is first proposed, which models reading at the word recognition level. To build it, ethnographic techniques, namely informal interviews and observation, were used to obtain the reading and spelling error patterns of dyslexic children. Ten dyslexic children aged 7 to 14 with similar reading levels participated in the study; they were recruited from two public schools that offer special dyslexia classes. A total of 6112 utterances were recorded in audio form, containing a total of 6051 errors of various types.
    Among these, the most frequent error patterns were 'substitutes vowel', 'omits consonant', 'nasals' and 'substitutes consonant'. The proposed ASR model takes these error patterns into account, which makes the lexical model a fundamental element of recognition: mispronunciations are modeled as alternative pronunciations, or variants, of the target words. In addition, a phoneme refinement strategy is applied to increase recognition accuracy. A prototype recognizer was developed from the proposed model and evaluated for accuracy, measured by word error rate (WER) and miscue detection rate (MDR), the latter being closely related to false alarm rate (FAR). The recognizer achieves a satisfactory WER of 25% and a relatively high MDR of 80.77%, with a FAR of 16.67%.
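    The lexical approach described above, in which mispronunciations are listed as alternative pronunciations of each target word, can be illustrated with a small sketch: if the decoder, constrained to the target word, selects a variant rather than the canonical pronunciation, the word is flagged as a miscue, and MDR and FAR are then counted over the flagged words. The lexicon entries, phoneme strings and helper names below are hypothetical, phoneme refinement is not shown, and the MDR/FAR formulas are commonly used definitions (detected miscues over actual miscues; flagged correct readings over actual correct readings) that may differ in detail from the study's.

```python
# Hypothetical lexicon: each target word has a canonical pronunciation plus
# variants modelling miscue patterns. Phoneme strings are illustrative only.
LEXICON = {
    "buku": {
        "canonical": "b u k u",
        "variants": ["b o k u", "b u u"],        # substitutes vowel, omits consonant
    },
    "makan": {
        "canonical": "m a k a n",
        "variants": ["m a k a", "m a k a ng"],   # omits consonant, nasal
    },
}


def is_miscue(word, decoded_pron):
    """Flag a word as a miscue when the decoder picked a variant pronunciation."""
    entry = LEXICON[word]
    if decoded_pron == entry["canonical"]:
        return False
    if decoded_pron in entry["variants"]:
        return True
    raise ValueError(f"{decoded_pron!r} is not a lexicon entry for {word!r}")


def mdr_and_far(trials):
    """trials: list of (actual_miscue, flagged) booleans.

    MDR = detected miscues / actual miscues
    FAR = correct readings flagged as miscues / actual correct readings
    """
    miscues = [flagged for actual, flagged in trials if actual]
    correct = [flagged for actual, flagged in trials if not actual]
    if not miscues or not correct:
        raise ValueError("need at least one miscue and one correct reading")
    return sum(miscues) / len(miscues), sum(correct) / len(correct)


def evaluate(utterances):
    """utterances: list of (word, actual_miscue, decoded_pronunciation)."""
    trials = [(actual, is_miscue(word, pron)) for word, actual, pron in utterances]
    return mdr_and_far(trials)
```

    On a toy set of six utterances with three real miscues (two decoded as variants, one decoded as the canonical form) and three correct readings (one decoded as a variant), `evaluate` returns an MDR of 2/3 and a FAR of 1/3. Keeping variants in the lexicon, rather than adding free phoneme loops, restricts the decoder to miscue types actually observed in the children's reading.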

    User Experiences from L2 Children Using a Speech Learning Application : Implications for Developing Speech Training Applications for Children

    We investigated the user experiences of 117 Finnish children aged 8 to 12 in a trial of an English language learning programme that used automatic speech recognition (ASR). Our measures covered both affective reactions and the children's sense of the programme's pedagogical utility. We also tested their perception of sound quality and compared their reactions to game-based and non-game versions of the application. Children gave higher affective ratings to the game version than to the non-game version, and they preferred playing with a friend to playing alone or in a group. They found the assessment of their speech useful, although they did not necessarily enjoy hearing their own voices. The results are discussed in terms of their implications for user interface (UI) design in speech learning applications for children.
    Peer reviewed