Correlating ASR Errors with Developmental Changes in Speech Production: A Study of 3-10-Year-Old European Portuguese Children's Speech
Automatically recognising children's speech is a very difficult task. This difficulty can be attributed to the high variability in children's speech, both within and across speakers. The variability is due to developmental changes in children's anatomy, speech production skills, and so on, and manifests itself, for example, in fundamental and formant frequencies, the frequency of disfluencies, and pronunciation quality. In this paper, we report the results of acoustic and auditory analyses of 3-10-year-old European Portuguese children's speech. Furthermore, we are able to correlate some of the pronunciation error patterns revealed by our analyses - such as the truncation of consonant clusters - with the errors made by a children's speech recogniser trained on speech collected from the same age group. Other pronunciation error patterns seem to have little or no impact on speech recognition performance. In future work, we will attempt to use our findings to improve the performance of our recogniser.
Automatically Recognising European Portuguese Children's Speech
This paper reports findings from an analysis of errors made by an automatic speech recogniser trained and tested with 3-10-year-old European Portuguese children's speech. We expected and were able to identify frequent pronunciation error patterns in the children's speech. Furthermore, we were able to correlate some of these pronunciation error patterns with automatic speech recognition errors. The findings reported in this paper are of phonetic interest but will also be useful for improving the performance of automatic speech recognisers aimed at children representing the target population of the study.
A process-oriented language for describing aspects of reading comprehension
Includes bibliographical references (p. 36-38). The research described herein was supported in part by the National Institute of Education under Contract No. MS-NIE-C-400-76-011.
Pronunciation Portfolio: How were, are, and will be you?
No two students are the same. There are about 2 billion students of English on this planet, and each student is always evolving through training. This means that there are about 2 billion types of English pronunciation. Despite this tremendous number of pronunciations, there has so far been no good method to represent each pronunciation individually. This study introduces a novel method to represent individual pronunciations. The method is based on a physical implementation of structural phonology, and the implementation can be regarded as a mathematical interpretation of Saussure's claim that language is a system of conceptual differences and phonic differences. Each student's pronunciation is acoustically and entirely represented as a phonological structure with no dimensions indicating non-linguistic features such as age, gender, speaker, microphone, room, line, etc. This paper examines whether the structural representation can provide a good tool for pronunciation assessment. Results of experiments with good and intentionally bad pronunciations of a single speaker showed that all the students used in the experiment are acoustically located between the two pronunciations, indicating that the students are judged to be acoustically closer to the speaker than the speaker himself is. This result shows that the proposed method can delete the irrelevant factors effectively and is extremely reliable in CALL.
Automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu
Automatic speech recognition (ASR) is potentially helpful for children who suffer from dyslexia. Highly phonetically similar errors in dyslexic children's reading affect the accuracy of ASR. Thus, this study aims to evaluate whether the accuracy of ASR using automatic transcription and phonetic labelling of dyslexic children's reading in BM is acceptable. Three objectives were set: first, to produce manual transcription and phonetic labelling; second, to construct automatic transcription and phonetic labelling using forced alignment; and third, to compare the accuracy obtained with automatic transcription and phonetic labelling against that obtained with manual transcription and phonetic labelling. To accomplish these goals, several methods were used, including manual speech labelling and segmentation, forced alignment, and Hidden Markov Model (HMM) and Artificial Neural Network (ANN) training; accuracy was measured using Word Error Rate (WER) and False Alarm Rate (FAR). A total of 585 speech files were used for the manual transcription, forced alignment, and training experiments. The ASR engine using automatic transcription and phonetic labelling obtained an optimum accuracy of 76.04%, with a WER of 23.96% and a FAR of 17.9%. These results are very close to those of the engine using manual transcription, namely 76.26% accuracy, a WER of 23.97%, and a FAR of 17.9%. In conclusion, the accuracy of automatic transcription and phonetic labelling is acceptable for helping dyslexic children learn with ASR in Bahasa Melayu (BM).
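The abstract above reports Word Error Rate (WER) figures. As a point of reference, WER is conventionally computed as the word-level Levenshtein (edit) distance between the reference and hypothesis transcriptions, divided by the number of reference words. A minimal sketch follows; the function name and the toy strings are illustrative, not taken from the study:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over word tokens
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution in a five-word reference gives WER = 0.2
print(wer("o gato comeu o peixe", "o gato come o peixe"))
```

A WER of 23.96%, as reported above, thus means that roughly one in four reference words was substituted, deleted, or required an insertion.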
Promoting Phonological Awareness in Young Children through At-Home Activities: A Video Curriculum
Research relating phonological awareness, beginning reading acquisition, and parental involvement in children's literacy development was read, evaluated, and summarized. A positive relationship between phonological awareness and learning to read was indicated by this review, and a correlation between parental literacy activities and children's language and reading acquisition was found. Studies suggesting the existence of a developmental sequence of phonological skills were examined. The literature review provided a rationale and design for phonological awareness instruction. A research-supported curriculum containing a teacher's manual, take-home interactive video activities and activity sheets, and assessments was created.
Atypical cortical tracking of the speech envelope in children who stutter: a potential contributor towards phonological processing differences
A growing body of evidence suggests that individuals with developmental stuttering exhibit phonological processing differences when compared to fluent peers. However, it is not yet clear which factors contribute to this atypical processing. It has been argued that the speech mechanisms which process these phonological units are monitored within a hierarchical system, whose foundation is controlled by low-frequency neural oscillating networks (Giraud & Poeppel, 2015). Thus, phonological processing differences may arise from impairments in fundamental mechanisms associated with low-frequency neural oscillating networks, such as temporal speech encoding. For this reason, this study investigated cortical temporal response functions in 14 children who stutter (3-7 years of age) compared to 13 normally fluent peers. EEG data were recorded as participants encoded natural speech during a dichotic listening task. Comparing the groups, the results provide evidence that children who stutter show significantly weaker cortical tracking for unattended speech and more efficient cortical tracking for attended speech, suggesting that phonological processing is atypical at the level of speech envelope encoding. Considering these findings, we propose that children who stutter may increase cognitive effort during speech and language processing in order to compensate for an atypical phonological processing mechanism.
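Cortical tracking of the speech envelope, as studied above, is commonly quantified by estimating a temporal response function (TRF): a set of weights describing how the EEG at time t reflects the envelope at earlier lags. The simplest estimator is reverse correlation, i.e. cross-correlating the EEG with the lagged envelope; real analyses (e.g. the regularized-regression approach the abstract's "cortical temporal response functions" implies) are more elaborate. A minimal sketch on synthetic data, with names and parameters chosen for illustration only:

```python
import random

def trf_by_reverse_correlation(envelope, eeg, max_lag):
    """Approximate TRF weights by cross-correlating EEG with the lagged
    speech envelope; valid when the envelope is close to white noise."""
    n = len(envelope)
    weights = []
    for lag in range(max_lag + 1):
        # Correlate eeg(t) with envelope(t - lag); circular indexing keeps
        # the sketch short (a real analysis would zero-pad instead).
        w = sum(eeg[t] * envelope[(t - lag) % n] for t in range(n)) / n
        weights.append(w)
    return weights

# Synthetic check: "EEG" that is just the envelope delayed by 3 samples,
# so the estimated TRF should peak at lag 3.
random.seed(0)
env = [random.gauss(0.0, 1.0) for _ in range(2000)]
eeg = [env[(t - 3) % len(env)] for t in range(len(env))]
trf = trf_by_reverse_correlation(env, eeg, max_lag=7)
```

Stronger or weaker "cortical tracking" between groups then corresponds to larger or smaller TRF responses (or prediction accuracies) for the attended and unattended speech streams.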
Gender detection in children's speech utterances for human-robot interaction
The human voice essentially carries paralinguistic information used in many real-time applications. Detecting gender in children's speech is considered a more challenging task than in adults' speech. In this study, a system for human-robot interaction (HRI) is proposed to detect gender in children's speech utterances without depending on the text. The robot's perception includes three phases: the feature extraction phase, where four formants are measured at each glottal pulse, a median is then calculated across these measurements, and three types of features are computed, namely formant average (AF), formant dispersion (DF), and formant position (PF); the feature standardization phase, where the measured feature dimensions are standardized using the z-score method; and the semantic understanding phase, where the children's gender is detected using a logistic regression classifier. At the same time, the robot's action is delivered as a speech response using the text-to-speech (TTS) technique. Experiments are conducted on the Carnegie Mellon University (CMU) Kids dataset to measure the suggested system's performance. The suggested system achieves an overall accuracy of 98%. The results show a clear improvement in accuracy of up to 13% compared to related works that used the CMU Kids dataset.
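The feature pipeline above can be illustrated concretely. Under the common definitions (formant average = mean of the formant medians; formant dispersion = mean spacing between adjacent formants; z-score = centre by the mean and scale by the standard deviation), a minimal sketch follows. The exact formulas used by the paper, including its formant-position feature, may differ, so this is an assumption-laden illustration rather than the authors' method:

```python
def formant_features(formants):
    """Formant average (AF) and formant dispersion (DF) from per-speaker
    median formant frequencies [F1, F2, F3, F4] in Hz (common definitions;
    the paper's exact formulas may differ)."""
    af = sum(formants) / len(formants)
    # Mean spacing between adjacent formants simplifies to (F4 - F1) / 3
    df = (formants[-1] - formants[0]) / (len(formants) - 1)
    return af, df

def zscore(values):
    """Standardize one feature dimension to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

# Toy formant medians in Hz
af, df = formant_features([500.0, 1500.0, 2500.0, 3500.0])
standardized = zscore([500.0, 1500.0, 2500.0, 3500.0])
```

Standardizing each feature dimension this way puts AF, DF, and PF on comparable scales before they reach the logistic regression classifier.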