3,465 research outputs found

    Correlating ASR Errors with Developmental Changes in Speech Production: A Study of 3-10-Year-Old European Portuguese Children's Speech

    Automatically recognising children's speech is a very difficult task. This difficulty can be attributed to the high variability in children's speech, both within and across speakers. The variability is due to developmental changes in children's anatomy, speech production skills et cetera, and manifests itself, for example, in fundamental and formant frequencies, the frequency of disfluencies, and pronunciation quality. In this paper, we report the results of acoustic and auditory analyses of 3-10-year-old European Portuguese children's speech. Furthermore, we are able to correlate some of the pronunciation error patterns revealed by our analyses - such as the truncation of consonant clusters - with the errors made by a children's speech recogniser trained on speech collected from the same age group. Other pronunciation error patterns seem to have little or no impact on speech recognition performance. In future work, we will attempt to use our findings to improve the performance of our recogniser.

    Automatically Recognising European Portuguese Children's Speech

    This paper reports findings from an analysis of errors made by an automatic speech recogniser trained and tested with 3-10-year-old European Portuguese children's speech. As expected, we were able to identify frequent pronunciation error patterns in the children's speech. Furthermore, we were able to correlate some of these pronunciation error patterns with automatic speech recognition errors. The findings reported in this paper are of phonetic interest but will also be useful for improving the performance of automatic speech recognisers aimed at children representing the target population of the study.

    A process-oriented language for describing aspects of reading comprehension

    Includes bibliographical references (p. 36-38). The research described herein was supported in part by the National Institute of Education under Contract No. MS-NIE-C-400-76-011

    Pronunciation Portfolio : How were, are, and will be you?

    No two students are the same. There are about 2 billion students of English on this planet and each student is always evolving through training. This means that there are about 2 billion types of English pronunciation. Despite the tremendous number of pronunciations, there has been no good method so far to represent each pronunciation individually. This study introduces a novel method to represent individual pronunciations. The method is based on a physical implementation of structural phonology, and the implementation can be regarded as a mathematical interpretation of Saussure’s claim that language is a system of conceptual differences and phonic differences. Each student’s pronunciation is acoustically and entirely represented as a phonological structure with no dimensions to indicate non-linguistic features like age, gender, speaker, microphone, room, line, etc. This paper examines whether the structural representation can provide a good tool for pronunciation assessment. Results of experiments with good and intentionally-bad pronunciations of a single speaker showed that all the students used in the experiment are acoustically located between the two pronunciations, indicating that the students are judged to be acoustically closer to the speaker than the speaker himself is. This result shows that the proposed method can delete the irrelevant factors effectively and is extremely reliable in CALL.

    Automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu

    Automatic speech recognition (ASR) is potentially helpful for children who suffer from dyslexia. However, the highly phonetically similar errors in dyslexic children's reading affect the accuracy of ASR. This study therefore aims to evaluate whether ASR accuracy is acceptable when using automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu (BM). Three objectives were set: first, to produce manual transcription and phonetic labelling; second, to construct automatic transcription and phonetic labelling using forced alignment; and third, to compare the accuracy obtained with automatic transcription and phonetic labelling against that obtained with manual transcription and phonetic labelling. The methods used include manual speech labelling and segmentation, forced alignment, and Hidden Markov Model (HMM) and Artificial Neural Network (ANN) training; ASR accuracy was measured using Word Error Rate (WER) and False Alarm Rate (FAR). A total of 585 speech files were used for manual transcription, forced alignment and the training experiment. The ASR engine using automatic transcription and phonetic labelling obtained an optimum accuracy of 76.04%, with a WER as low as 23.96% and a FAR of 17.9%. These results are very similar to those of the ASR engine using manual transcription, namely an accuracy of 76.26%, a WER as low as 23.97% and a FAR of 17.9%. In conclusion, the accuracy of automatic transcription and phonetic labelling is acceptable for helping dyslexic children learn using ASR in Bahasa Melayu (BM).
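The abstract above reports Word Error Rate without describing how it was computed. As a minimal sketch, assuming the conventional definition (word-level edit distance between the reference and the recognised transcript, divided by the number of reference words), WER could be calculated as follows. The function name and the Malay example sentences are illustrative only and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (assumed, not from the study's data).
perfect = word_error_rate("saya suka makan nasi", "saya suka makan nasi")
one_deletion = word_error_rate("saya suka makan nasi", "saya suka nasi")
```

The reported accuracy of 76.04% and WER of 23.96% sum to 100%, which is consistent with accuracy being taken as 100% minus WER, although the abstract does not say so explicitly.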

    Promoting Phonological Awareness in Young Children through At-Home Activities: A Video Curriculum

    Research relating phonological awareness, beginning reading acquisition, and parental involvement in children's literacy development was read, evaluated, and summarized. A positive relationship between phonological awareness and learning to read was indicated from this review, and a correlation between parental literacy activities and children's language and reading acquisition was found. Studies suggesting the existence of a developmental sequence of phonological skills were examined. The literature review provided a rationale and design for phonological awareness instruction. A research-supported curriculum containing a teacher's manual, take-home interactive video activities and activity sheets, and assessments was created.

    Atypical cortical tracking of the speech envelope in children who stutter: a potential contributor towards phonological processing differences

    A growing body of evidence suggests that individuals with developmental stuttering exhibit phonological processing differences when compared to fluent peers. However, the factors that contribute to this atypical processing have yet to be identified. It has been argued that the speech mechanisms which process these phonological units are monitored within a hierarchical system, whose foundation is controlled by low-frequency neural oscillating networks (Giraud & Poeppel, 2015). Thus, phonological processing differences may arise due to impairments in fundamental mechanisms associated with low-frequency neural oscillating networks, such as temporal speech encoding. For this reason, this study sought to investigate cortical temporal response functions in 14 children who stutter (3-7 years of age) compared to 13 normally fluent peers. EEG data were recorded as participants encoded natural speech during a dichotic listening task. When comparing between groups, the results provide evidence that children who stutter experience significantly weaker cortical tracking for unattended speech and more efficient cortical tracking for attended speech, suggesting that phonological processing is atypical at the level of speech envelope encoding. Considering these findings, we propose that children who stutter may be increasing cognitive effort during speech and language processing, in order to compensate for an atypical phonological processing mechanism.

    Gender detection in children’s speech utterances for human-robot interaction

    Human speech carries paralinguistic information that is used in many real-time applications. Detecting a child’s gender from speech is considered more challenging than detecting an adult’s. In this study, a system for human-robot interaction (HRI) is proposed to detect gender in children’s speech utterances without depending on the text. The robot’s perception includes three phases. In the feature extraction phase, four formants are measured at each glottal pulse and a median is calculated across these measurements; three types of features are then measured: formant average (AF), formant dispersion (DF), and formant position (PF). In the feature standardization phase, the measured feature dimensions are standardized using the z-score method. In the semantic understanding phase, the children’s gender is detected using a logistic regression classifier. At the same time, the robot’s action is specified via a speech response using the text-to-speech (TTS) technique. Experiments are conducted on the Carnegie Mellon University (CMU) Kids dataset to measure the suggested system’s performance. The overall accuracy of the suggested system is 98%. The results show an improvement in accuracy of up to 13% compared to related works that used the CMU Kids dataset.
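The standardization and classification phases described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the (AF, DF, PF) values are invented toy numbers rather than CMU Kids data, the z-score uses the population standard deviation, and plain batch gradient descent stands in for whatever logistic regression solver the study used. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit (population) std."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights with batch gradient descent (assumed solver)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Return class 1 when the predicted probability is at least 0.5."""
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0
            for xi in X]


# Invented toy rows of (AF, DF, PF); the labels 1/0 are arbitrary class codes.
features = [
    [2400.0, 1050.0, 0.6],
    [2450.0, 1080.0, 0.8],
    [2380.0, 1040.0, 0.5],
    [2250.0, 980.0, -0.4],
    [2200.0, 960.0, -0.7],
    [2280.0, 990.0, -0.5],
]
labels = [1, 1, 1, 0, 0, 0]

scaled = zscore_columns(features)
weights, bias = fit_logistic(scaled, labels)
predictions = predict(scaled, weights, bias)
```

Standardizing first puts the formant-based features on a comparable scale, so no single dimension dominates the classifier simply because it is measured in larger units.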