
    Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features

    Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that often emerges in early childhood. ASD assessment typically involves an observation protocol, including note-taking and ratings of the child's social behavior, conducted by a trained clinician. A robust machine learning (ML) model capable of labeling adult and child audio has the potential to save significant time and labor in manually coding children's behaviors. Such a model may help clinicians capture events of interest, better communicate events to parents, and educate new clinicians. In this study, we leverage the self-supervised learning model Wav2Vec 2.0 (W2V2), pretrained on 4,300 hours of home recordings of children under 5 years old, to build a unified system that performs both speaker diarization (SD) and vocalization classification (VC). We apply this system to two-channel audio recordings of brief 3-5 minute clinician-child interactions from the Rapid-ABC corpus. We propose a novel technique that introduces auxiliary features extracted from a W2V2-based automatic speech recognition (ASR) system for children under 4 years old to improve children's VC. We test the proposed method on two corpora (Rapid-ABC and BabbleCor) and observe consistent improvements. Furthermore, we reach, or perhaps outperform, the state-of-the-art performance on BabbleCor. Comment: Submitted to ICASSP 202

    The development of emotion recognition from facial expressions and non-linguistic vocalizations during childhood

    Sensitivity to facial and vocal emotion is fundamental to children's social competence. Previous research has focused on children's facial emotion recognition, and few studies have investigated non-linguistic vocal emotion processing in childhood. We compared facial and vocal emotion recognition and processing biases in 4- to 11-year-olds and adults. Eighty-eight 4- to 11-year-olds and 21 adults participated. Participants viewed/listened to faces and voices (angry, happy, and sad) at three intensity levels (50%, 75%, and 100%). Non-linguistic tones were used. For each modality, participants completed an emotion identification task. Accuracy and bias for each emotion and modality were compared across 4- to 5-, 6- to 9- and 10- to 11-year-olds and adults. The results showed that children's emotion recognition improved with age; preschoolers were less accurate than other groups. Facial emotion recognition reached adult levels by 11 years, whereas vocal emotion recognition continued to develop in late childhood. Response bias decreased with age. For both modalities, sadness recognition was delayed across development relative to anger and happiness. The results demonstrate that developmental trajectories of emotion processing differ as a function of emotion type and stimulus modality. In addition, vocal emotion processing showed a more protracted developmental trajectory, compared to facial emotion processing. The results have important implications for programmes aiming to improve children's socio-emotional competence

    Imagining an ideal school for wellbeing: Locating student voice

    ePublications@SCU is an electronic repository administered by Southern Cross University Library. Its goal is to capture and preserve the intellectual output of Southern Cross University authors and researchers, and to increase visibility and impact through open access to researchers around the world. For further information please contac

    The impact of sound field systems on learning and attention in elementary school classrooms

    Purpose: An evaluation of the installation and use of sound field systems (SFS) was carried out to investigate their impact on teaching and learning in elementary school classrooms. Methods: The evaluation included acoustic surveys of classrooms, questionnaire surveys of students and teachers, and experimental testing of students with and without the use of SFS. Students’ perceptions of classroom environments and objective data evaluating change in performance on cognitive and academic assessments with amplification over a six-month period are reported. Results: Teachers were positive about the use of SFS in improving children’s listening and attention to verbal instructions. Over time, students in amplified classrooms did not differ from those in nonamplified classrooms in their reports of listening conditions, nor did their performance differ in measures of numeracy, reading or spelling. Use of SFS in the classrooms resulted in significantly larger gains in performance in the number of correct items on the nonverbal measure of speed of processing and on the measure of listening comprehension. Analysis controlling for classroom acoustics indicated that students’ listening comprehension score

    Child development and the aims of road safety education

    Pedestrian accidents are one of the most prominent causes of premature injury, handicap and death in the modern world. In children, the problem is so severe that pedestrian accidents are widely regarded as the most serious of all health risks facing children in developed countries. Not surprisingly, educational measures have long been advocated as a means of teaching children how to cope with traffic, and substantial resources have been devoted to their development and provision. Unfortunately, there seems to be a widespread view at present that education has not achieved as much as had been hoped and that there may even be quite strict limits to what can be achieved through education. This would, of course, shift the emphasis away from education altogether towards engineering or urban planning measures aimed at creating an intrinsically safer environment in which the need for education might be reduced or even eliminated. However, whilst engineering measures undoubtedly have a major role to play in the effort to reduce accidents, this outlook is both overly optimistic about the benefits of engineering and overly pessimistic about the limitations of education. At the same time, a fresh analysis of both the aims and the methods of contemporary road safety education is clearly required. The present report is designed to provide such an analysis and to establish a framework within which further debate and research can take place.

    Spelling instruction through etymology: A method of developing spelling lists for older students

    The purpose of this study was to investigate whether an approach to developing word lists centred on etymological roots would improve the spelling performance of older primary school students. Participants were 46 students in the last year of primary school in south-east Queensland (31 girls and 15 boys) across three classes, two of which were assigned to control conditions. Students were evaluated pre- and post-intervention on three dependent measures: British Spelling Test Series spelling, spelling in writing, and writing. The results of this intervention revealed improvements in spelling for girls but not for boys. The implications for improved teaching methods are discussed.

    West Virginia Oral Health Initiative, Executive Summary

    The West Virginia Oral Health Initiative began in 2008 and is anchored in improving the oral health status of West Virginia residents through public awareness, provider training, dental screenings, and access to dental care. In March 2015, leaders of the initiative and representatives of The Benedum Foundation gathered to discuss lessons learned, the road ahead, and how both parties could improve their effectiveness. What we know is that successful collaboratives are about leveraging resources, knowledge and collective will to achieve an end, with the good fortune of timing, funding and leadership urging them onward. The West Virginia Oral Health Initiative is an example of that formula. It also provides fruitful ground to examine the expansive role of The Benedum Foundation in launching this statewide effort, guiding the work, and positioning the initiative for support by other funders.

    Adaptation of Whisper models to child speech recognition

    Automatic Speech Recognition (ASR) systems often struggle to transcribe child speech because the large child speech datasets needed to train accurate child-friendly ASR models are lacking. However, large amounts of annotated adult speech data exist and have been used to build multilingual ASR models such as Whisper. Our work explores whether such models can be adapted to child speech to improve ASR for children. In addition, we compare Whisper child adaptations with finetuned self-supervised models such as wav2vec2. We demonstrate that finetuning Whisper on child speech yields significant improvements in ASR performance on child speech compared with non-finetuned Whisper models. Additionally, self-supervised Wav2vec2 models finetuned on child speech outperform finetuned Whisper. Comment: Accepted in Interspeech 202

    The right information may matter more than frequency-place alignment: Simulations of frequency-aligned and upward shifting cochlear implant processors for a shallow electrode array insertion

    Objective: It has been claimed that speech recognition with a cochlear implant is dependent on the correct frequency alignment of analysis bands in the speech processor with characteristic frequencies (CFs) at electrode locations. However, the use of filters aligned in frequency to a relatively basal electrode array position leads to significant loss of lower-frequency speech information. This study uses an acoustic simulation to compare two approaches to matching speech processor filters to an electrode array with a relatively shallow insertion depth within the typical range, such that the most apical element is at a CF of 1851 Hz. Two noise-excited vocoder speech processors are compared, one with CF-matched filters and one with filters matched to CFs at basilar membrane locations 6 mm more apical than the electrode locations. Design: An extended crossover training design examined pre- and post-training performance in the identification of vowels and words in sentences for both processors. Subjects received about 3 hours of training with each processor in turn. Results: Training improved performance with both processors, but training effects were greater for the shifted processor. For a male talker, the shifted processor led to higher post-training scores than the frequency-aligned processor with both vowels and sentences. For a female talker, post-training vowel scores did not differ significantly between processors, whereas sentence scores were higher with the frequency-aligned processor. Conclusions: Even for a shallow electrode insertion, we conclude that a speech processor should represent information from important frequency regions below 1 kHz and that the possible cost of frequency misalignment can be significantly reduced with listening experience.
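    The abstract does not say which frequency-to-place map defines the "6 mm more apical" CFs. As a rough illustration only, the sketch below assumes Greenwood's (1990) human cochlear map, F = A(10^(a*x) - k) with x in mm from the apex and commonly cited constants A = 165.4 Hz, a = 0.06 per mm (2.1 over an assumed 35 mm length) and k = 0.88. The function names are hypothetical, and the study's actual processor design may differ.

```python
import math

# ASSUMED Greenwood (1990) constants for the human cochlea; x is measured in
# mm from the apex. These values are not taken from the study itself.
A = 165.4     # Hz scaling constant
ALPHA = 0.06  # per mm (2.1 divided by an assumed 35 mm cochlear length)
K = 0.88      # integration constant


def place_to_cf(x_mm: float) -> float:
    """Characteristic frequency (Hz) at a place x_mm from the apex."""
    return A * (10 ** (ALPHA * x_mm) - K)


def cf_to_place(cf_hz: float) -> float:
    """Inverse map: distance from the apex (mm) for a given CF in Hz."""
    return math.log10(cf_hz / A + K) / ALPHA


def shifted_cf(electrode_cf_hz: float, shift_mm: float = 6.0) -> float:
    """CF at a basilar-membrane place shift_mm more apical than the electrode."""
    return place_to_cf(cf_to_place(electrode_cf_hz) - shift_mm)


if __name__ == "__main__":
    most_apical = 1851.0  # Hz, CF of the most apical electrode per the abstract
    print(f"electrode place: {cf_to_place(most_apical):.1f} mm from apex")
    print(f"CF 6 mm more apical: {shifted_cf(most_apical):.0f} Hz")
```

    Under these assumed constants, the most apical electrode sits about 18 mm from the apex, and a 6 mm apical shift maps its analysis filter to about 726 Hz. This illustrates how an upward-shifting processor can route information from below 1 kHz to a shallow array.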