
    Frequency domain analysis of MFCC feature extraction in children’s speech recognition system

    Abstract: Research on speech recognition systems currently focuses on robustness. When speech signals are mixed with noise, the recognition system struggles to identify the speech sounds, so the development of robust speech recognition systems continues. The principle of a robust speech recognition system is to eliminate noise from the speech signals and restore the original information signals. In this paper, researchers conducted a frequency domain analysis on one stage of the Mel Frequency Cepstral Coefficients (MFCC) process, the Fast Fourier Transform (FFT), in a children's speech recognition system. The FFT analysis in the feature extraction process determined how the frequency values used from the FFT output affect the disruption caused by noise. The analysis comprised three scenarios that differ in how the FFT points are partitioned: all FFT points were divided into four, three, and two parts in the first, second, and third scenarios, respectively. This study utilized children's speech data from the isolated TIDIGIT English digit corpus. For comparison, noise was added manually to simulate real-world conditions. The results showed that using a particular portion of the frequency range in MFCC, as defined by the scenarios, affected recognition performance, and the effect was relatively significant on noisy speech data. The designed method in the scenario 3 (C1) version generated the highest accuracy, exceeding the accuracy of the conventional MFCC method. The average accuracy in the scenario 3 (C1) method increased by 1% more than all the tested noise types. Testing at various noise intensity levels (SNR) indicates that scenario 3 (C1) achieves higher accuracy than conventional MFCC at all tested SNR values. This shows that the selection of the specific frequencies used in MFCC feature extraction significantly affects recognition accuracy for noisy speech.
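    The partitioning idea in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the Hamming window, the zeroing of discarded bins, and the noise-mixing helper are all assumptions made for illustration, and the abstract does not say which band corresponds to "C1".

```python
import numpy as np

def split_fft_bins(n_fft, n_parts):
    """Split the one-sided FFT bin indices into n_parts contiguous bands."""
    n_bins = n_fft // 2 + 1          # bins returned by a real-input FFT
    return np.array_split(np.arange(n_bins), n_parts)

def band_limited_power_spectrum(frame, n_fft, n_parts, keep):
    """Power spectrum of one frame with every bin outside band `keep` zeroed.

    Assumption: the selected band is kept and the rest is discarded before
    the mel filterbank stage; the abstract does not specify this detail.
    """
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2 / n_fft
    mask = np.zeros_like(power)
    mask[split_fft_bins(n_fft, n_parts)[keep]] = 1.0
    return power * mask

def add_noise_at_snr(signal, noise, snr_db):
    """Mix noise into signal, scaled so the mixture has the requested SNR (dB)."""
    noise = noise[:len(signal)]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise
```

    Under these assumptions, the three scenarios would correspond to `n_parts` values of 4, 3, and 2, with `keep` choosing which part feeds the remaining MFCC stages (mel filterbank, logarithm, DCT).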

    Some Neurocognitive Correlates of Noise-Vocoded Speech Perception in Children With Normal Hearing: A Replication and Extension of )

    OBJECTIVES: Noise-vocoded speech is a valuable research tool for testing experimental hypotheses about the effects of spectral degradation on speech recognition in adults with normal hearing (NH). However, very little research has utilized noise-vocoded speech with children with NH. Earlier studies with children with NH focused primarily on the amount of spectral information needed for speech recognition without assessing the contribution of neurocognitive processes to speech perception and spoken word recognition. In this study, we first replicated the seminal findings reported by ) who investigated effects of lexical density and word frequency on noise-vocoded speech perception in a small group of children with NH. We then extended the research to investigate relations between noise-vocoded speech recognition abilities and five neurocognitive measures: auditory attention (AA) and response set, talker discrimination, and verbal and nonverbal short-term working memory.
    DESIGN: Thirty-one children with NH between 5 and 13 years of age were assessed on their ability to perceive lexically controlled words in isolation and in sentences that were noise-vocoded to four spectral channels. Children were also administered vocabulary assessments (Peabody Picture Vocabulary test-4th Edition and Expressive Vocabulary test-2nd Edition) and measures of AA (NEPSY AA and response set and a talker discrimination task) and short-term memory (visual digit and symbol spans).
    RESULTS: Consistent with the findings reported in the original ) study, we found that children perceived noise-vocoded lexically easy words better than lexically hard words. Words in sentences were also recognized better than the same words presented in isolation. No significant correlations were observed between noise-vocoded speech recognition scores and the Peabody Picture Vocabulary test-4th Edition using language quotients to control for age effects. However, children who scored higher on the Expressive Vocabulary test-2nd Edition recognized lexically easy words better than lexically hard words in sentences. Older children perceived noise-vocoded speech better than younger children. Finally, we found that measures of AA and short-term memory capacity were significantly correlated with a child's ability to perceive noise-vocoded isolated words and sentences.
    CONCLUSIONS: First, we successfully replicated the major findings from the ) study. Because familiarity, phonological distinctiveness and lexical competition affect word recognition, these findings provide additional support for the proposal that several foundational elementary neurocognitive processes underlie the perception of spectrally degraded speech. Second, we found strong and significant correlations between performance on neurocognitive measures and children's ability to recognize words and sentences noise-vocoded to four spectral channels. These findings extend earlier research suggesting that perception of spectrally degraded speech reflects early peripheral auditory processes, as well as additional contributions of executive function, specifically, selective attention and short-term memory processes in spoken word recognition. The present findings suggest that AA and short-term memory support robust spoken word recognition in children with NH even under compromised and challenging listening conditions. These results are relevant to research carried out with listeners who have hearing loss, because they are routinely required to encode, process, and understand spectrally degraded acoustic signals.

    Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features

    Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that often emerges in early childhood. ASD assessment typically involves an observation protocol, including note-taking and ratings of the child's social behavior, conducted by a trained clinician. A robust machine learning (ML) model that is capable of labeling adult and child audio has the potential to save significant time and labor in manually coding children's behaviors. This may help clinicians capture events of interest, better communicate events with parents, and educate new clinicians. In this study, we leverage the self-supervised learning model Wav2Vec 2.0 (W2V2), pretrained on 4300h of home recordings of children under 5 years old, to build a unified system that performs both speaker diarization (SD) and vocalization classification (VC) tasks. We apply this system to two-channel audio recordings of brief 3-5 minute clinician-child interactions using the Rapid-ABC corpus. We propose a novel technique that introduces auxiliary features extracted from a W2V2-based automatic speech recognition (ASR) system for children under 4 years old to improve the children's VC task. We test our proposed method on two corpora (Rapid-ABC and BabbleCor) and observe consistent improvements. Furthermore, we reach, or perhaps outperform, the state-of-the-art performance on BabbleCor. Comment: Submitted to ICASSP 202

    Children's naming and word-finding difficulties: descriptions and explanations

    Purpose: There is a substantial minority of children for whom lexical retrieval problems impede the normal pattern of language development and use. These problems include difficulty accurately producing the correct word even when the word's meaning is understood; such children are often referred to as having word-finding difficulties (WFDs). This review examines the nature of naming and lexical retrieval difficulties in these and other groups of children. Method: A review of the relevant literature on lexical access difficulties in children with word-finding difficulties was conducted. Studies were examined in terms of the population parameters and comparison groups included in the study. Results and Conclusions: Most discussions of the cognitive processes causing lexical retrieval difficulties have referred to semantics, phonology and processing speed. It is argued that our understanding of these topics will be further advanced by the use of appropriate methodology to test developmental models that both identify the processes involved in successfully performing different lexical retrieval tasks and more precisely locate the difficulties experienced by children with such tasks.

    Government response to the Justice Committee's Seventh Report of Session 2012-13 : youth justice / for Justice


    A computational simulation of children's performance across three nonword repetition tests

    The nonword repetition test has been regularly used to examine children’s vocabulary acquisition, and yet there is no clear explanation of all of the effects seen in nonword repetition. This paper presents a study of 5- to 6-year-old children’s repetition performance on three nonword repetition tests that vary in the degree of their lexicality. EPAM-VOC, a model of children’s vocabulary acquisition, is then presented that captures the children’s performance in all three repetition tests. The model offers a clear explanation of how working memory and long-term lexical and sub-lexical knowledge interact, simulating repetition performance across three nonword tests within the same model and without the need for test-specific parameter settings.