Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that often emerges in early childhood. ASD assessment typically involves an observation protocol, including note-taking and ratings of the child's social behavior, conducted by a trained clinician. A robust machine learning (ML) model capable of labeling adult and child audio could save significant time and labor in manually coding children's behaviors, helping clinicians capture events of interest, better communicate those events to parents, and train new clinicians. In this study, we leverage the self-supervised learning model Wav2Vec 2.0 (W2V2), pretrained on 4,300 hours of home recordings of children under 5 years old, to build a unified system that performs both speaker diarization (SD) and vocalization classification (VC). We apply this system to two-channel audio recordings of brief 3-5 minute clinician-child interactions from the Rapid-ABC corpus. We propose a novel technique that introduces auxiliary features, extracted from a W2V2-based automatic speech recognition (ASR) system for children under 4 years old, to improve the children's VC task. We test this method on two corpora (Rapid-ABC and BabbleCor) and observe consistent improvements. Furthermore, we match, and possibly surpass, state-of-the-art performance on BabbleCor.
Comment: Submitted to ICASSP 202
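The paper's core idea, concatenating auxiliary ASR-derived features onto base W2V2 frame embeddings before classification, can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' pipeline: the feature dimensions are hypothetical, and the nearest-neighbor frame alignment used when the two branches run at different frame rates is an assumption of this sketch.

```python
def fuse_features(base, aux):
    """Concatenate base W2V2 diarization features with auxiliary ASR features.

    base, aux: lists of per-frame feature vectors (lists of floats). If the
    ASR branch produces a different number of frames, each base frame is
    paired with the nearest ASR frame before concatenation.
    """
    n, m = len(base), len(aux)
    fused = []
    for i, frame in enumerate(base):
        # Map base frame index i onto the auxiliary track's index range.
        j = min(m - 1, round(i * (m - 1) / max(n - 1, 1)))
        fused.append(frame + aux[j])
    return fused

base = [[0.1] * 4 for _ in range(5)]  # hypothetical W2V2 frame embeddings
aux = [[0.9] * 2 for _ in range(3)]   # hypothetical ASR-derived features
fused = fuse_features(base, aux)
print(len(fused), len(fused[0]))  # 5 6
```

The fused vectors would then feed the VC classifier head; simple concatenation is the most common way to inject auxiliary features without changing the base model.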
The development of emotion recognition from facial expressions and non-linguistic vocalizations during childhood
Sensitivity to facial and vocal emotion is fundamental to children's social competence. Previous research has focused on children's facial emotion recognition, and few studies have investigated non-linguistic vocal emotion processing in childhood. We compared facial and vocal emotion recognition and processing biases in 4- to 11-year-olds and adults. Eighty-eight 4- to 11-year-olds and 21 adults participated. Participants viewed/listened to faces and voices (angry, happy, and sad) at three intensity levels (50%, 75%, and 100%). Non-linguistic tones were used. For each modality, participants completed an emotion identification task. Accuracy and bias for each emotion and modality were compared across 4- to 5-, 6- to 9-, and 10- to 11-year-olds and adults. The results showed that children's emotion recognition improved with age; preschoolers were less accurate than the other groups. Facial emotion recognition reached adult levels by 11 years, whereas vocal emotion recognition continued to develop in late childhood. Response bias decreased with age. For both modalities, sadness recognition was delayed across development relative to anger and happiness. The results demonstrate that developmental trajectories of emotion processing differ as a function of emotion type and stimulus modality. In addition, vocal emotion processing showed a more protracted developmental trajectory compared to facial emotion processing. The results have important implications for programmes aiming to improve children's socio-emotional competence.
Imagining an ideal school for wellbeing: Locating student voice
The impact of sound field systems on learning and attention in elementary school classrooms
Purpose: An evaluation of the installation and use of sound field systems (SFS) was carried out to investigate their impact on teaching and learning in elementary school classrooms. Methods: The evaluation included acoustic surveys of classrooms, questionnaire surveys of students and teachers, and experimental testing of students with and without the use of SFS. Students' perceptions of classroom environments and objective data evaluating change in performance on cognitive and academic assessments with amplification over a six-month period are reported. Results: Teachers were positive about the use of SFS in improving children's listening and attention to verbal instructions. Over time, students in amplified classrooms did not differ from those in non-amplified classrooms in their reports of listening conditions, nor did their performance differ in measures of numeracy, reading, or spelling. Use of SFS in the classrooms resulted in significantly larger gains in performance in the number of correct items on the nonverbal measure of speed of processing and the measure of listening comprehension. Analysis controlling for classroom acoustics indicated that students' listening comprehension score
Child development and the aims of road safety education
Pedestrian accidents are one of the most prominent causes of premature injury, handicap, and death in the modern world. In children, the problem is so severe that pedestrian accidents are widely regarded as the most serious of all health risks facing children in developed countries. Not surprisingly, educational measures have long been advocated as a means of teaching children how to cope with traffic, and substantial resources have been devoted to their development and provision. Unfortunately, there seems to be a widespread view at present that education has not achieved as much as had been hoped, and that there may even be quite strict limits to what can be achieved through education. This would, of course, shift the emphasis away from education altogether towards engineering or urban planning measures aimed at creating an intrinsically safer environment in which the need for education might be reduced or even eliminated. However, whilst engineering measures undoubtedly have a major role to play in the effort to reduce accidents, this outlook is both overly optimistic about the benefits of engineering and overly pessimistic about the limitations of education. A fresh analysis is therefore required of both the aims and methods of contemporary road safety education. The present report is designed to provide such an analysis and to establish a framework within which further debate and research can take place.
Spelling instruction through etymology: A method of developing spelling lists for older students
The purpose of this study was to investigate whether an approach to developing word lists centred on etymological roots would improve the spelling performance of older primary school students. Participants were 46 students in the last year of primary school in south-east Queensland (31 girls and 15 boys) across three classes, with two classes assigned to control conditions. Students were evaluated pre- and post-intervention on three dependent measures: British Spelling Test Series spelling, spelling in writing, and writing. The results of this intervention revealed improvements in spelling for girls but not for boys. The implications for improved teaching methods are discussed.
West Virginia Oral Health Initiative, Executive Summary
The West Virginia Oral Health Initiative began in 2008 and is anchored in improving the oral health status of West Virginia residents through public awareness, provider training, dental screenings, and access to dental care. In March 2015, leaders of the initiative and representatives of The Benedum Foundation gathered to discuss lessons learned, the road ahead, and how both parties could improve their effectiveness. What we know is that successful collaboratives are about leveraging resources, knowledge, and collective will to achieve an end, with the good fortune of timing, funding, and leadership urging them onward. The West Virginia Oral Health Initiative is an example of that formula. It also provides fruitful ground to examine the expansive role of The Benedum Foundation in launching this statewide effort, guiding the work, and positioning the initiative for support by other funders.
Adaptation of Whisper models to child speech recognition
Automatic Speech Recognition (ASR) systems often struggle to transcribe child speech due to the lack of large child speech datasets required to accurately train child-friendly ASR models. However, there are large amounts of annotated adult speech data, which were used to create multilingual ASR models such as Whisper. Our work explores whether such models can be adapted to child speech to improve ASR for children. In addition, we compare child-adapted Whisper models with finetuned self-supervised models, such as wav2vec2. We demonstrate that finetuning Whisper on child speech yields significant improvements in ASR performance on child speech compared to non-finetuned Whisper models. Additionally, self-supervised wav2vec2 models finetuned on child speech outperform the finetuned Whisper models.
Comment: Accepted at Interspeech 202
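The "improvements in ASR performance" such papers report are conventionally measured as word error rate (WER): the word-level edit distance between reference and hypothesis transcripts, divided by the reference length. The abstract does not name its scoring tool, so as a minimal illustration, here is a self-contained WER implementation via dynamic programming:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: Levenshtein distance over word tokens / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])  # substitution or match
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # one substitution in six words
```

In practice, evaluation pipelines also normalize case and punctuation before scoring, which matters especially for Whisper's punctuated output.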
The right information may matter more than frequency-place alignment: Simulations of frequency-aligned and upward shifting cochlear implant processors for a shallow electrode array insertion
Objective: It has been claimed that speech recognition with a cochlear implant depends on the correct frequency alignment of analysis bands in the speech processor with characteristic frequencies (CFs) at electrode locations. However, the use of filters aligned in frequency to a relatively basal electrode array position leads to significant loss of lower-frequency speech information. This study uses an acoustic simulation to compare two approaches to matching speech processor filters to an electrode array having a relatively shallow depth within the typical range, such that the most apical element is at a CF of 1851 Hz. Two noise-excited vocoder speech processors are compared: one with CF-matched filters, and one with filters matched to CFs at basilar membrane locations 6 mm more apical than the electrode locations. Design: An extended crossover training design examined pre- and post-training performance in the identification of vowels and words in sentences for both processors. Subjects received about 3 hours of training with each processor in turn. Results: Training improved performance with both processors, but training effects were greater for the shifted processor. For a male talker, the shifted processor led to higher post-training scores than the frequency-aligned processor for both vowels and sentences. For a female talker, post-training vowel scores did not differ significantly between processors, whereas sentence scores were higher with the frequency-aligned processor. Conclusions: Even for a shallow electrode insertion, we conclude that a speech processor should represent information from important frequency regions below 1 kHz, and that the possible cost of frequency misalignment can be significantly reduced with listening experience.
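The mapping between basilar-membrane position and characteristic frequency that underlies the "6 mm more apical" manipulation is conventionally modeled with the Greenwood function. A small sketch below illustrates why shifting the analysis bands 6 mm apically recovers sub-1-kHz information; the constants are the standard human-cochlea fit (A = 165.4 Hz, α = 0.06 per mm), which is an assumption here since the study may use a slightly different parameterization:

```python
import math

A, ALPHA = 165.4, 0.06  # Greenwood constants for the human cochlea (Hz, per mm)

def cf_at(x_mm: float) -> float:
    """Characteristic frequency (Hz) at distance x_mm from the cochlear apex."""
    return A * (10 ** (ALPHA * x_mm) - 1)

def pos_of(f_hz: float) -> float:
    """Distance from the apex (mm) whose characteristic frequency is f_hz."""
    return math.log10(f_hz / A + 1) / ALPHA

x_electrode = pos_of(1851.0)           # most apical electrode in the study
cf_shifted = cf_at(x_electrode - 6.0)  # analysis band matched 6 mm more apically
print(round(x_electrode, 1), round(cf_shifted))  # ~18.1 mm, CF near 715 Hz
```

Under these constants, the most apical electrode sits about 18 mm from the apex, and matching its filter 6 mm more apically lowers the band's lower edge to roughly 715 Hz, consistent with the conclusion that representing frequencies below 1 kHz matters more than exact place alignment.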