
    Alzheimer’s Dementia Recognition Through Spontaneous Speech


    Multimodal Assessment of Cognitive Decline: Applications in Alzheimer’s Disease and Depression

    The initial diagnosis and assessment of cognitive decline are generally based on the judgement of clinicians and on commonly used semi-structured interviews, guided by pre-determined sets of topics, in a clinical setting. Publicly available multimodal datasets have provided an opportunity to explore a range of experiments in the automatic detection of cognitive decline. Drawing on the latest developments in representation learning, machine learning, and natural language processing, we seek to develop models capable of identifying cognitive decline, with an eye to discovering the differences and commonalities that should be considered in the computational treatment of mental health disorders. We present models that learn the indicators of cognitive decline from audio and visual modalities as well as from lexical, syntactic, disfluency, and pause information. Our study is carried out in two parts: moderation analysis and predictive modelling. We experiment with different fusion techniques, motivated by recent efforts in multimodal fusion for classifying cognitive states, in order to capture the interaction between modalities and maximise the use and combination of each modality. We create tools for detecting cognitive decline and use them to analyse three major datasets containing speech produced by people with and without cognitive decline. These findings are being used to develop multimodal models for the detection of depression and Alzheimer’s dementia.
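
    As a concrete illustration of the fusion strategies referred to above, the sketch below contrasts early fusion (concatenating audio and text feature vectors before a single classifier) with late fusion (averaging the probabilities of modality-specific classifiers). The feature arrays, dimensions, and logistic-regression classifiers are hypothetical placeholders, not the models developed in this work.

        # Minimal sketch of early vs. late fusion over per-participant feature vectors.
        # All features, dimensions, and labels below are invented placeholders.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        n = 200
        audio_feats = rng.normal(size=(n, 64))    # stand-in for acoustic features
        text_feats = rng.normal(size=(n, 128))    # stand-in for lexical/syntactic features
        labels = rng.integers(0, 2, size=n)       # 1 = cognitive decline, 0 = control

        idx_train, idx_test = train_test_split(np.arange(n), test_size=0.3, random_state=0)

        # Early fusion: concatenate modality features before classification.
        early = LogisticRegression(max_iter=1000).fit(
            np.hstack([audio_feats, text_feats])[idx_train], labels[idx_train])

        # Late fusion: one classifier per modality, then average their probabilities.
        clf_audio = LogisticRegression(max_iter=1000).fit(audio_feats[idx_train], labels[idx_train])
        clf_text = LogisticRegression(max_iter=1000).fit(text_feats[idx_train], labels[idx_train])
        late_probs = (clf_audio.predict_proba(audio_feats[idx_test])[:, 1] +
                      clf_text.predict_proba(text_feats[idx_test])[:, 1]) / 2
        late_preds = (late_probs >= 0.5).astype(int)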

    An investigation into interactional patterns for Alzheimer's Disease recognition in natural dialogues

    Alzheimer's disease (AD) is a complex neurodegenerative disorder characterized by memory loss, together with cognitive deficits affecting language, emotional affect, and interactional communication. Diagnosis and assessment of AD are formally based on the judgment of clinicians, commonly using semi-structured interviews in a clinical setting. Manual diagnosis is therefore slow, resource-intensive, and difficult to access, so many people go undiagnosed; automatic methods could help address this gap. Using the most recent advances in deep learning, machine learning, and natural language processing, this thesis empirically explores how content-free interaction patterns can help in developing models capable of identifying AD from natural conversations, with a focus on phenomena found useful in conversation analysis studies. The models presented in this thesis use lexical, disfluency, interactional, acoustic, and pause information to learn the symptoms of Alzheimer's disease from text and audio modalities. This thesis comprises two parts. In the first part, by studying a conversational corpus, we find certain phenomena that are strongly indicative of differences between AD and non-AD speakers. This analysis shows that interaction patterns differ between AD and non-AD patients, including the types of questions asked of patients, their responses, delays in responding in the form of pauses, clarification questions, signaling of non-understanding, and repetition of questions. Although this is a challenging problem because these dialogue acts are rare, we show that it is possible to develop models that can automatically detect these classes. The second part then turns to AD diagnosis itself, examining interactional features including pause information, disfluencies within patients' speech, communication breakdowns at speaker changes in certain situations, and n-gram dialogue act sequences. We find that there are longer pauses within AD patients' utterances and more attributable silences in response to questions than for non-AD patients. We also show that fusion techniques combining the speech and text modalities maximise the combination and use of the different feature sets, yielding accurate and effective AD diagnosis. These interaction patterns may serve as an index of the internal cognitive processes that differentiate AD and non-AD patients, and may be used as an integral part of language assessment in clinical settings.
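
    To make the pause-based interaction features concrete, the sketch below computes response delays and long silences from a timestamped, speaker-labelled transcript. The utterance format, thresholds, and example dialogue are hypothetical assumptions, not the representation used in the thesis.

        # Minimal sketch of pause/response-delay features from a timestamped transcript.
        from dataclasses import dataclass

        @dataclass
        class Utterance:
            speaker: str   # "patient" or "interviewer"
            start: float   # seconds
            end: float

        def pause_features(utterances, min_pause=0.5):
            """Mean patient response delay after interviewer turns, plus counts and
            totals of inter-utterance gaps longer than min_pause seconds."""
            response_delays, silences = [], []
            for prev, curr in zip(utterances, utterances[1:]):
                gap = curr.start - prev.end
                if gap < min_pause:
                    continue
                silences.append(gap)
                if prev.speaker == "interviewer" and curr.speaker == "patient":
                    response_delays.append(gap)
            mean_delay = sum(response_delays) / len(response_delays) if response_delays else 0.0
            return {"mean_response_delay": mean_delay,
                    "num_long_pauses": len(silences),
                    "total_silence": sum(silences)}

        dialogue = [Utterance("interviewer", 0.0, 2.1),
                    Utterance("patient", 3.4, 6.0),   # 1.3 s delay before answering
                    Utterance("interviewer", 6.2, 8.0),
                    Utterance("patient", 9.5, 12.0)]  # 1.5 s delay
        print(pause_features(dialogue))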

    Temporal Integration of Text Transcripts and Acoustic Features for Alzheimer's Diagnosis Based on Spontaneous Speech

    Background: Advances in machine learning (ML) technology have opened new avenues for detection and monitoring of cognitive decline. In this study, a multimodal approach to Alzheimer's dementia detection based on the patient's spontaneous speech is presented. This approach was tested on a standard, publicly available Alzheimer's speech dataset for comparability. The data comprise voice samples from 156 participants (1:1 ratio of Alzheimer's to control), matched by age and gender. Materials and Methods: A recently developed Active Data Representation (ADR) technique for voice processing was employed as a framework for fusion of acoustic and textual features at sentence and word level. Temporal aspects of textual features were investigated in conjunction with acoustic features in order to shed light on the temporal interplay between paralinguistic (acoustic) and linguistic (textual) aspects of Alzheimer's speech. Combinations of several configurations of ADR features and more traditional bag-of-n-grams approaches were used in an ensemble of classifiers built and evaluated on a standardised dataset containing recorded speech of scene descriptions and textual transcripts. Results: Employing only semantic bag-of-n-grams features, an accuracy of 89.58% was achieved in distinguishing between Alzheimer's patients and healthy controls. Adding temporal and structural information by combining bag-of-n-grams features with ADR audio/textual features, the accuracy improved to 91.67% on the test set. An accuracy of 93.75% was achieved through late fusion of the three best feature configurations, which corresponds to a 4.7% improvement over the best result reported in the literature for this dataset. Conclusion: The proposed combination of ADR audio and textual features is capable of successfully modelling temporal aspects of the data. The machine learning approach toward dementia detection achieves its best performance when ADR features are combined with strong semantic bag-of-n-grams features. This combination leads to state-of-the-art performance on the AD classification task.
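
    The bag-of-n-grams component and the late fusion of several feature configurations can be sketched roughly as below: each n-gram configuration gets its own pipeline, and their class probabilities are averaged at test time. The transcripts, labels, and classifier choice are invented for illustration and do not reproduce the ADR features or the actual ensemble.

        # Minimal sketch: bag-of-n-grams classifiers fused late by probability averaging.
        import numpy as np
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        train_texts = ["the boy is reaching for the cookie jar",
                       "uh the um water is is overflowing in the the sink"]
        train_labels = [0, 1]   # 0 = control, 1 = Alzheimer's (toy data)
        test_texts = ["the girl uh um is is standing on the stool"]

        configs = [(1, 1), (1, 2), (1, 3)]   # unigrams, +bigrams, +trigrams
        models = [make_pipeline(CountVectorizer(ngram_range=r),
                                LogisticRegression(max_iter=1000)).fit(train_texts, train_labels)
                  for r in configs]

        # Late fusion: average per-configuration probabilities of the AD class.
        fused = np.mean([m.predict_proba(test_texts)[:, 1] for m in models], axis=0)
        print("P(AD) =", fused)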

    Automating Intended Target Identification for Paraphasias in Discourse using a large language model

    Purpose: To date, there are no automated tools for the identification and fine-grained classification of paraphasias within discourse, the production of which is the hallmark characteristic of most people with aphasia (PWA). In this work, we fine-tune a large language model (LLM) to automatically predict paraphasia targets in Cinderella story retellings. Method: Data consisted of 332 Cinderella story retellings containing 2,489 paraphasias from PWA, for which research assistants identified their intended targets. We supplemented these training data with 256 sessions from control participants, to which we added 2,415 synthetic paraphasias. We conducted four experiments using different training data configurations to fine-tune the LLM to automatically “fill in the blank” of the paraphasia with a predicted target, given the context of the rest of the story retelling. We tested the experiments' predictions against our human-identified targets and stratified our results by ambiguity of the targets and clinical factors. Results: The model trained on controls and PWA achieved 50.7% accuracy at exactly matching the human-identified target. Fine-tuning on PWA data, with or without controls, led to comparable performance. The model performed better on targets with less human ambiguity and on paraphasias from participants with fluent or less severe aphasia. Conclusions: We were able to automatically identify the intended target of paraphasias in discourse using just the surrounding language about half of the time. These findings take us a step closer to automatic aphasic discourse analysis. In future work, we will incorporate phonological information from the paraphasia to further improve predictive utility.
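
    The “fill in the blank” framing can be illustrated with an off-the-shelf masked language model, as in the sketch below; the study fine-tunes its own LLM on story retellings, so this only shows the prediction step, and the example sentence and model name are illustrative assumptions.

        # Minimal sketch of predicting a paraphasia's intended target by masking it
        # and letting a masked language model propose candidates.
        from transformers import pipeline

        fill = pipeline("fill-mask", model="bert-base-uncased")

        # The paraphasia is replaced by the mask token; the surrounding story context
        # drives the prediction of the intended target word.
        context = ("cinderella lost her glass " + fill.tokenizer.mask_token +
                   " as she ran down the stairs at midnight")
        for candidate in fill(context, top_k=3):
            print(candidate["token_str"], round(candidate["score"], 3))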

    Linguistic- and Acoustic-based Automatic Dementia Detection using Deep Learning Methods

    Dementia can affect a person's speech and language abilities, even in the early stages. Dementia is incurable, but early detection can enable treatment that slows decline and helps maintain mental function. Therefore, early diagnosis of dementia is of great importance. However, current dementia detection procedures in clinical practice are expensive, invasive, and sometimes inaccurate. In comparison, computational tools based on the automatic analysis of spoken language have the potential to be applied as cheap, easy-to-use, and objective clinical assistance tools for dementia detection. In recent years, several studies have shown promise in this area. However, most studies focus heavily on the machine learning aspects and, as a consequence, often lack sufficient incorporation of clinical knowledge. Many studies also concentrate on clinically less relevant tasks, such as distinguishing healthy controls (HC) from people with Alzheimer's disease (AD), which is relatively easy and therefore less interesting in terms of both machine learning and clinical application. The studies in this thesis concentrate on automatically identifying signs of neurodegenerative dementia in the early stages and distinguishing them from other clinical, diagnostic categories related to memory problems: functional memory disorder (FMD), mild cognitive impairment (MCI), and HC. A key focus when designing the proposed systems has been to better consider (and incorporate) currently used clinical knowledge and to bear in mind how these machine-learning-based systems could be translated for use in real clinical settings. Firstly, a state-of-the-art end-to-end system is constructed for extracting linguistic information from automatically transcribed spontaneous speech. The system's architecture is based on hierarchical principles, mimicking those used in clinical practice, where information at the word, sentence, and paragraph level is used when extracting information for diagnosis. Secondly, hand-crafted features are designed based on clinical knowledge of the importance of pausing and rhythm; these are successfully joined with features extracted from the end-to-end system. Thirdly, different classification tasks are explored, each set up to represent the types of diagnostic decision-making relevant in clinical practice. Finally, experiments are conducted to explore how to better deal with the known problem of the confounding, overlapping effects of age and cognitive decline on speech and language. A multi-task system is constructed that takes age into account while predicting cognitive decline. The studies use the publicly available DementiaBank dataset as well as the IVA dataset, which has been collected by our collaborators at the Royal Hallamshire Hospital, UK. In conclusion, this thesis proposes multiple methods of using speech and language information for dementia detection with state-of-the-art deep learning technologies, confirming the potential of automatic systems for dementia detection.
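
    The multi-task idea of predicting cognitive decline while taking age into account can be sketched as a shared encoder with two heads, one classifying decline and one regressing age. The layer sizes, loss weighting, and placeholder features below are assumptions for illustration, not the architecture used in the thesis.

        # Minimal multi-task sketch: shared encoder, decline-classification head,
        # age-regression head, trained with a weighted sum of the two losses.
        import torch
        import torch.nn as nn

        class MultiTaskNet(nn.Module):
            def __init__(self, in_dim=128, hidden=64):
                super().__init__()
                self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
                self.decline_head = nn.Linear(hidden, 2)   # control vs. decline
                self.age_head = nn.Linear(hidden, 1)       # age in years

            def forward(self, x):
                h = self.encoder(x)
                return self.decline_head(h), self.age_head(h).squeeze(-1)

        model = MultiTaskNet()
        optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
        features = torch.randn(32, 128)                # placeholder speech/text features
        decline = torch.randint(0, 2, (32,))
        age = torch.rand(32) * 40 + 50                 # placeholder ages 50-90

        logits, age_pred = model(features)
        loss = (nn.functional.cross_entropy(logits, decline)
                + 0.1 * nn.functional.mse_loss(age_pred, age))
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()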

    Management of Topic in the Spoken Discourse of Persons Living with Mild Cognitive Impairment and Alzheimer’s Dementia

    Analyses of elicited spoken discourse can identify mild cognitive impairment (MCI) and Alzheimer’s dementia (AD). Topic management, one feature of discourse defined as acts that maintain or terminate an established topic, can be measured via global coherence measures. Little is known, however, about whether analyses of topic management can distinguish spoken discourse performances of persons living with MCI (PLwMCI) vs. persons living with AD (PLwAD). The current study investigated whether there are differences in topic management in the spoken discourse performances of PLwMCI vs. PLwAD. Analyses were conducted on 120 transcripts of spoken sequenced story picture descriptions of PLwMCI (n=83) and PLwAD (n=37). Diagnostic group performances were analyzed using average global coherence ratings. No significant group differences were found. Average global coherence ratings were not a predictor of diagnostic group membership. Findings highlight the need for further investigation of topic management in PLwMCI and PLwAD.
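
    The final analysis step, testing whether average global coherence ratings predict diagnostic group, amounts to a simple logistic regression; the sketch below uses invented ratings and group labels purely to show the form of that test, not the study's data.

        # Minimal sketch: does an average coherence rating predict MCI vs. AD group?
        import numpy as np
        import statsmodels.api as sm

        coherence = np.array([3.2, 3.5, 2.9, 3.8, 3.1, 3.4, 2.7, 3.6])  # invented ratings
        group = np.array([0, 1, 1, 0, 0, 1, 0, 0])                      # 0 = MCI, 1 = AD

        model = sm.Logit(group, sm.add_constant(coherence)).fit(disp=False)
        print(model.summary())   # a non-significant slope would mirror the null finding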

    Detecting early signs of dementia in conversation

    Dementia can affect a person's speech, language, and conversational interaction capabilities. The early diagnosis of dementia is of great clinical importance. Recent studies using the qualitative methodology of Conversation Analysis (CA) demonstrated that communication problems may be picked up during conversations between patients and neurologists, and that this can be used to differentiate between patients with Neuro-degenerative Disorders (ND) and those with non-progressive Functional Memory Disorder (FMD). However, conducting manual CA is expensive and difficult to scale up for routine clinical use. This study introduces an automatic approach for processing such conversations which can help in identifying the early signs of dementia and distinguishing them from the other clinical categories (FMD, Mild Cognitive Impairment (MCI), and Healthy Control (HC)). The dementia detection system starts with a speaker diarisation module to segment an input audio file (determining who talks when). The segmented files are then passed to an automatic speech recogniser (ASR) to transcribe the utterances of each speaker. Next, the feature extraction unit extracts a number of features (CA-inspired, acoustic, lexical, and word-vector) from the transcripts and audio files. Finally, a classifier is trained on the features to determine the clinical category of the input conversation. Moreover, we investigate replacing the role of the neurologist in the conversation with an Intelligent Virtual Agent (IVA) that asks similar questions. We show that despite differences between the IVA-led and neurologist-led conversations, the results achieved by the IVA are as good as those gained by the neurologists. Furthermore, the IVA can be used to administer more standard cognitive tests, such as verbal fluency tests, and produce automatic scores, which can then boost the performance of the classifier. The final blind evaluation of the system shows that the classifier can identify early signs of dementia with an acceptable level of accuracy and robustness (considering both sensitivity and specificity).
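
    The processing chain described above (diarisation, ASR, feature extraction, classification) can be sketched as a pipeline of stages; every stage below is a stub with invented outputs and a toy decision rule, standing in for the actual diarisation, ASR, and classifier components rather than reproducing them.

        # Skeleton of the conversation-processing pipeline with stubbed stages.
        from typing import Dict, List, Tuple

        def diarise(audio_path: str) -> List[Tuple[str, float, float]]:
            """Return (speaker, start, end) segments for the recording (stub)."""
            return [("neurologist", 0.0, 4.2), ("patient", 4.8, 9.5)]

        def transcribe(audio_path: str, segment: Tuple[str, float, float]) -> str:
            """Run ASR over one segment and return its transcript (stub)."""
            return "well I think it was er last week maybe"

        def extract_features(turns: List[Tuple[str, str]]) -> Dict[str, float]:
            """Simple lexical/CA-inspired features from speaker-labelled turns."""
            patient_words = " ".join(text for spk, text in turns if spk == "patient").split()
            fillers = sum(w in {"er", "erm", "um", "uh"} for w in patient_words)
            return {"patient_word_count": len(patient_words),
                    "filler_rate": fillers / max(len(patient_words), 1)}

        def classify(features: Dict[str, float]) -> str:
            """Map the feature vector to a clinical category (toy rule, not a model)."""
            return "ND" if features["filler_rate"] > 0.1 else "FMD/MCI/HC"

        segments = diarise("conversation.wav")
        turns = [(spk, transcribe("conversation.wav", (spk, s, e))) for spk, s, e in segments]
        features = extract_features(turns)
        print(features, classify(features))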

    Multimodal machine learning in medical screenings

    The healthcare industry, with its high demand and standards, has long been considered a crucial area for technology-based innovation. However, the medical field often relies on experience-based evaluation. Limited resources, overloaded capacity, and a lack of accessibility can hinder timely medical care and diagnosis delivery. In light of these challenges, automated medical screening as a decision-making aid is highly recommended. With the increasing availability of data and the need to explore the complementary effect among modalities, multimodal machine learning has emerged as a promising area of technology. Its impact has been witnessed across a wide range of domains, prompting the question of how far machine learning can be leveraged to automate processes in even more complex and high-risk sectors. This paper delves into multimodal machine learning for automated medical screening and evaluates its potential for mental disorder detection, a highly important area of healthcare. First, we conduct a scoping review targeted at high-impact papers to highlight the trends and directions of multimodal machine learning in screening prevalent mental disorders such as depression, stress, and bipolar disorder. The review provides a comprehensive list of popular datasets and extensively studied modalities. The review also proposes an end-to-end pipeline for multimodal machine learning applications, covering essential steps from preprocessing, representation, and fusion to modelling and evaluation. While cross-modality interaction has been considered a promising mechanism for fusion across modalities, the number of existing multimodal fusion methods employing it is rather limited. This study therefore investigates multimodal fusion in more detail through the proposal of Autofusion, an autoencoder-infused fusion technique that harnesses the cross-modality interaction among different modalities. The technique is evaluated on DementiaBank’s Pitt corpus to detect Alzheimer’s disease. Autofusion achieves a promising performance of 79.89% accuracy, 83.85% recall, 81.72% precision, and 82.47% F1, consistently outperforming all unimodal methods by an average of 5.24% across all metrics. It also consistently outperforms early and late fusion; against the late-fusion hard-voting technique in particular, it improves by an average of 20% across all metrics. Further, empirical results show that the cross-modality interaction term enhances model performance by 2-3% across metrics. This research highlights the promising impact of cross-modality interaction in multimodal machine learning and calls for further research to unlock its full potential.
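
    The general idea behind autoencoder-infused fusion with an explicit cross-modality interaction term can be sketched as below: projected audio and text features are combined with their element-wise product, and the fused code is trained with a reconstruction loss alongside the classification loss. This is a generic illustration under assumed dimensions and losses, not the Autofusion architecture itself.

        # Rough sketch of autoencoder-based fusion with a cross-modality interaction term.
        import torch
        import torch.nn as nn

        class InteractionFusion(nn.Module):
            def __init__(self, audio_dim=64, text_dim=128, latent_dim=32):
                super().__init__()
                self.proj_audio = nn.Linear(audio_dim, latent_dim)
                self.proj_text = nn.Linear(text_dim, latent_dim)
                # Autoencoder over [audio, text, interaction] learns a fused code.
                self.encoder = nn.Sequential(nn.Linear(3 * latent_dim, latent_dim), nn.ReLU())
                self.decoder = nn.Linear(latent_dim, 3 * latent_dim)
                self.classifier = nn.Linear(latent_dim, 2)   # AD vs. control

            def forward(self, audio, text):
                a, t = self.proj_audio(audio), self.proj_text(text)
                joint = torch.cat([a, t, a * t], dim=-1)     # a * t = interaction term
                code = self.encoder(joint)
                return self.decoder(code), joint, self.classifier(code)

        model = InteractionFusion()
        audio, text = torch.randn(8, 64), torch.randn(8, 128)   # placeholder features
        labels = torch.randint(0, 2, (8,))
        recon, joint, logits = model(audio, text)
        loss = (nn.functional.mse_loss(recon, joint.detach())
                + nn.functional.cross_entropy(logits, labels))
        loss.backward()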