Using State-of-the-Art Speech Models to Evaluate Oral Reading Fluency in Ghana
This paper reports on a set of three recent experiments utilizing large-scale
speech models to evaluate the oral reading fluency (ORF) of students in Ghana.
While ORF is a well-established measure of foundational literacy, assessing it
typically requires one-on-one sessions between a student and a trained
evaluator, a process that is time-consuming and costly. Automating the
evaluation of ORF could support better literacy instruction, particularly in
education contexts where formative assessment is uncommon due to large class
sizes and limited resources. To our knowledge, this research is among the first
to examine the use of the most recent versions of large-scale speech models
(Whisper V2, wav2vec2.0) for ORF assessment in the Global South.
We find that Whisper V2 produces transcriptions of Ghanaian students reading
aloud with a Word Error Rate of 13.5. This is close to the model's average WER
on adult speech (12.8) and would have been considered state-of-the-art for
children's speech transcription only a few years ago. We also find that when
these transcriptions are used to produce fully automated ORF scores, they
closely align with scores generated by expert human graders, with a correlation
coefficient of 0.96. Importantly, these results were achieved on a
representative dataset (i.e., students with regional accents, recordings taken
in actual classrooms), using a free and publicly available speech model out of
the box (i.e., no fine-tuning). This suggests that using large-scale speech
models to assess ORF may be feasible to implement and scale in lower-resource,
linguistically diverse educational contexts.
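
For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how a Word Error Rate (as a percentage) and a Pearson correlation coefficient are conventionally computed. It is not the authors' pipeline: the function names, the lowercase-and-split text normalization, and the sample sentences are assumptions made for illustration, and the abstract does not say how transcripts were normalized or how ORF scores were derived.

```python
from math import sqrt


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def pearson_correlation(xs: list[float], ys: list[float]) -> float:
    """Return the Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Hypothetical passage and transcript, not data from the paper.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Here the hypothetical transcript differs from the six-word passage by one substitution and one deletion, so the WER is 2/6 expressed as a percentage. In the same spirit, `pearson_correlation` would be applied to paired lists of automated and human-assigned ORF scores.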
Access to recorded interviews: A research agenda
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.
Computational Language Assessment in patients with speech, language, and communication impairments
Speech, language, and communication symptoms enable the early detection,
diagnosis, treatment planning, and monitoring of neurocognitive disease
progression. Nevertheless, traditional manual neurologic assessment, the speech
and language evaluation standard, is time-consuming and resource-intensive for
clinicians. We argue that Computational Language Assessment (C.L.A.) is an
improvement over conventional manual neurological assessment. Using machine
learning, natural language processing, and signal processing, C.L.A. i. provides
a neuro-cognitive evaluation of speech, language, and communication in elderly
individuals and those at high risk for dementia; ii. facilitates the diagnosis,
prognosis, and therapy efficacy in at-risk and language-impaired populations;
and iii. allows easier extensibility to assess patients from a wide range of
languages. Also, C.L.A. employs Artificial Intelligence models to inform theory
on the relationship between language symptoms and their neural bases. It
significantly advances our ability to optimize the prevention and treatment of
elderly individuals with communication disorders, allowing them to age
gracefully with social engagement.
Comment: 36 pages, 2 figures, to be submite