2 research outputs found

    Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs

    We present two multimodal fusion-based deep learning models that consume ASR-transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer's Disease and to what degree, evaluated on the ADReSSo 2021 challenge data. Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves 84% classification accuracy and an RMSE of 4.26 when predicting MMSE cognitive scores. While predicting cognitive decline is more challenging, our models improve over word-only models when using the multimodal approach together with word probabilities, disfluency features, and pause information. We show considerable gains for AD classification using multimodal fusion and gating, which can effectively deal with noisy inputs from acoustic features and ASR hypotheses.
    Comment: INTERSPEECH 2021. arXiv admin note: substantial text overlap with arXiv:2106.0966
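    The abstract describes the model only at a high level. As a rough illustration of the gated multimodal fusion it mentions, the PyTorch sketch below combines a lexical and an acoustic BiLSTM encoding through a learned sigmoid gate; every class name, dimension, and layer choice here is an illustrative assumption, not the authors' published implementation.

        import torch
        import torch.nn as nn

        class GatedMultimodalFusion(nn.Module):
            """Sketch of gated fusion of a lexical and an acoustic stream.

            All dimensions and layers are illustrative assumptions, not
            the ADReSSo paper's actual architecture."""

            def __init__(self, lex_dim=128, ac_dim=88, hidden=64):
                super().__init__()
                # Word-level inputs (words, word probabilities, disfluency
                # and pause features) are assumed embedded into lex_dim.
                self.lex_encoder = nn.LSTM(lex_dim, hidden,
                                           bidirectional=True, batch_first=True)
                self.ac_encoder = nn.LSTM(ac_dim, hidden,
                                          bidirectional=True, batch_first=True)
                # The gate decides, per dimension, how much to trust each
                # modality: the mechanism the abstract credits with
                # handling noisy ASR hypotheses and acoustic features.
                self.gate = nn.Linear(4 * hidden, 2 * hidden)
                self.classifier = nn.Linear(2 * hidden, 1)  # AD / non-AD logit
                self.regressor = nn.Linear(2 * hidden, 1)   # MMSE score

            def forward(self, lex_seq, ac_seq):
                _, (h_lex, _) = self.lex_encoder(lex_seq)
                _, (h_ac, _) = self.ac_encoder(ac_seq)
                # Concatenate the final forward and backward hidden states.
                h_lex = torch.cat([h_lex[0], h_lex[1]], dim=-1)
                h_ac = torch.cat([h_ac[0], h_ac[1]], dim=-1)
                z = torch.sigmoid(self.gate(torch.cat([h_lex, h_ac], dim=-1)))
                fused = z * h_lex + (1 - z) * h_ac
                return self.classifier(fused), self.regressor(fused)

    The sigmoid gate lets the network down-weight whichever stream is less reliable for a given input, consistent with the abstract's claim that gating helps with noisy inputs.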

    Influences on Expert Intelligibility Judgments of School-age Children's Speech

    Speech-language pathologists (SLPs) make impressionistic intelligibility judgments as part of evaluating children for speech sound disorders. Despite its lack of formalization, this is an important measure of choice for SLPs, going beyond single-word standardized measures by using spontaneous speech to assess functional communication. However, spontaneous speech introduces sources of error and bias in the listener. This dissertation argues that impressionistic intelligibility judgments, because of their subjectivity, are influenced by listener-dependent factors. To identify potential sources of error and bias, speech data were collected from children in four speaker groups: typically developing monolingual children, children with speech sound disorder, typically developing Spanish-English bilingual children (i.e., an accent familiar to the study's listeners), and typically developing Mam-English bilingual children (i.e., an accent unfamiliar to the study's listeners), each sampled in two school-age groups. Perceiver data were collected from two listener groups (expert [SLP] and lay). Listeners provided baseline measurements of lab-based intelligibility scores and comprehensibility ratings by orthographically transcribing and rating audio recordings of experimentally controlled utterances. Listeners also made impressionistic global intelligibility assessments after viewing video recordings of children's spontaneous speech. Findings showed differences between experts' and lay listeners' global intelligibility assessments; however, experts were no better than lay listeners at discriminating between age and speaker groups. Of the four speaker groups, the Mam-English bilingual group had a significant effect on global intelligibility assessments. Relationships were found between global intelligibility assessments and both the lab-based intelligibility measure and the comprehensibility rating, indicating that impressionistic judgments tap into both speech signal features and the understandability of speech. Surprisingly, the age and linguistic ability of the child speakers were not significant factors in global intelligibility assessments, suggesting that listeners may have accommodated these differences in their judgments. These findings indicate the need for increased training of SLPs to reduce error and bias in their speech intelligibility judgments, as well as the need for further research to improve the objectivity of such judgments.
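    The lab-based intelligibility measure mentioned above is commonly operationalized as the percentage of a speaker's intended words that a listener transcribes correctly. The dissertation's exact scoring protocol is not given here, so the normalization and word-matching rules in this minimal Python sketch are assumptions.

        import difflib
        import string

        def intelligibility_score(target: str, transcription: str) -> float:
            """Percent of intended words recovered in a listener's transcript.

            Assumed scoring rule: lowercase, strip punctuation, then count
            in-order word matches between the intended utterance and the
            listener's orthographic transcription."""
            def norm(text):
                return [w.strip(string.punctuation).lower() for w in text.split()]
            intended, heard = norm(target), norm(transcription)
            matcher = difflib.SequenceMatcher(a=intended, b=heard)
            matched = sum(block.size for block in matcher.get_matching_blocks())
            return 100.0 * matched / len(intended) if intended else 0.0

        # Example: the listener mishears one of four words -> 75.0
        print(intelligibility_score("the dog ran home", "the dog wan home"))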