588 research outputs found

    Speech-based automatic depression detection via biomarkers identification and artificial intelligence approaches

    Depression has become one of the most prevalent mental health issues, affecting more than 300 million people all over the world. However, due to factors such as limited medical resources and limited access to health care, a large number of patients remain undiagnosed. In addition, the traditional approaches to depression diagnosis have limitations because they are usually time-consuming and depend on clinical experience that varies across clinicians. From this perspective, automatic depression detection can make the diagnosis process much faster and more accessible. In this thesis, we present the possibility of using speech for automatic depression detection. This is based on findings in neuroscience that depressed patients have abnormal cognition mechanisms, which cause their speech to differ from that of healthy people. Therefore, in this thesis, we show two ways of benefiting from automatic depression detection, i.e., identifying speech markers of depression and constructing novel deep learning models to improve detection accuracy. The identification of speech markers tries to capture measurable depression traces left in speech. From this perspective, speech markers such as speech duration, pauses and correlation matrices are proposed. Speech duration and pauses take speech fluency into account, while correlation matrices represent the relationship between acoustic features and aim at capturing psychomotor retardation in depressed patients. Experimental results demonstrate that these proposed markers are effective at improving the performance in recognizing depressed speakers. In addition, such markers show statistically significant differences between depressed patients and non-depressed individuals, which explains the possibility of using these markers for depression detection and further confirms that depression leaves detectable traces in speech. 
In addition to the above, we propose an attention mechanism, Multi-local Attention (MLA), to emphasize depression-relevant information locally. We then analyse the effect of MLA on performance and efficiency. According to the experimental results, such a model can significantly improve performance and confidence in the detection while reducing the time required for recognition. Furthermore, we propose Cross-Data Multilevel Attention (CDMA) to emphasize different types of depression-relevant information, i.e., specific to each type of speech and common to both, by using multiple attention mechanisms. Experimental results demonstrate that the proposed model is effective at integrating different types of depression-relevant information in speech, significantly improving the performance of depression detection.
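The speech markers named in this abstract (speech duration, pauses, and correlation matrices between acoustic features) can be illustrated with a minimal sketch. It is not the thesis's implementation: the energy-threshold pause rule, the 10 ms frame length, and the function names `pause_stats`, `pearson` and `correlation_matrix` are all assumptions made for illustration.

```python
from math import sqrt


def pause_stats(energies, threshold, frame_sec=0.01):
    """Return (speech_seconds, pause_count, pause_seconds) from frame energies.

    Assumption: a frame whose energy is below `threshold` counts as a pause,
    and consecutive pause frames form a single pause.
    """
    speech_frames = 0
    pause_frames = 0
    pause_count = 0
    in_pause = False
    for energy in energies:
        if energy < threshold:
            pause_frames += 1
            if not in_pause:  # a new pause starts here
                pause_count += 1
                in_pause = True
        else:
            speech_frames += 1
            in_pause = False
    return speech_frames * frame_sec, pause_count, pause_frames * frame_sec


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(features):
    """Correlate every acoustic feature trajectory with every other one.

    `features` is a list of per-feature trajectories, one value per frame.
    """
    return [[pearson(a, b) for b in features] for a in features]


# Hypothetical data: seven frames of energy and three feature trajectories.
speech_sec, n_pauses, pause_sec = pause_stats(
    [0.9, 0.8, 0.05, 0.02, 0.7, 0.01, 0.6], threshold=0.1
)
corr = correlation_matrix([[1, 2, 3], [2, 4, 6], [3, 2, 1]])
```

In this toy input the speaker has two pauses, and the first feature is perfectly correlated with the second and perfectly anti-correlated with the third.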

    Spatio-Temporal AU Relational Graph Representation Learning For Facial Action Units Detection

    This paper presents our Facial Action Units (AUs) recognition submission to the fifth Affective Behavior Analysis in-the-wild Competition (ABAW). Our approach consists of three main modules: (i) a pre-trained facial representation encoder that produces a strong facial representation from each face image in the input sequence; (ii) an AU-specific feature generator that learns a set of AU features from each facial representation; and (iii) a spatio-temporal graph learning module that constructs a spatio-temporal graph representation. This graph representation describes the AUs contained in all frames and predicts the occurrence of each AU based on both the modeled spatial information within the corresponding face and the learned temporal dynamics among frames. The experimental results show that our approach outperformed the baseline and that spatio-temporal graph representation learning allows our model to generate the best results among all ablated systems. Our model ranked 4th in the AU recognition track at the 5th ABAW Competition.
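One way to picture a spatio-temporal graph over AUs is sketched below. The abstract does not say how the edges are defined, so the topology here is purely an assumption: AUs within a frame are fully connected (spatial edges) and each AU is linked to itself in the next frame (temporal edges). The function name `build_st_graph` is hypothetical.

```python
def build_st_graph(num_frames, num_aus):
    """Return the edge set of an assumed spatio-temporal AU graph.

    Nodes are (frame, au) pairs. Spatial edges join every pair of AUs in the
    same frame; temporal edges join the same AU in consecutive frames.
    """
    edges = set()
    for t in range(num_frames):
        for a in range(num_aus):
            for b in range(a + 1, num_aus):
                edges.add(((t, a), (t, b)))  # spatial edge within frame t
            if t + 1 < num_frames:
                edges.add(((t, a), (t + 1, a)))  # temporal edge to next frame
    return edges


# Hypothetical sizes: 3 frames with 4 AUs each.
edges = build_st_graph(num_frames=3, num_aus=4)
```

A learned model would attach feature vectors to these nodes and propagate them along the edges; the sketch only fixes the graph structure.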

    Computer audition for emotional wellbeing

    This thesis is focused on the application of computer audition (i.e., machine listening) methodologies for monitoring states of emotional wellbeing. Computer audition is a growing field and has been successfully applied to an array of use cases in recent years. There are several advantages to audio-based computational analysis; for example, audio can be recorded non-invasively, stored economically, and can capture rich information on happenings in a given environment, e.g., human behaviour. With this in mind, maintaining emotional wellbeing is a challenge for humans and emotion-altering conditions, including stress and anxiety, have become increasingly common in recent years. Such conditions manifest in the body, inherently changing how we express ourselves. Research shows these alterations are perceivable within vocalisation, suggesting that speech-based audio monitoring may be valuable for developing artificially intelligent systems that target improved wellbeing. Furthermore, computer audition applies machine learning and other computational techniques to audio understanding, and so by combining computer audition with applications in the domain of computational paralinguistics and emotional wellbeing, this research concerns the broader field of empathy for Artificial Intelligence (AI). To this end, speech-based audio modelling that incorporates and understands paralinguistic wellbeing-related states may be a vital cornerstone for improving the degree of empathy that an artificial intelligence has. To summarise, this thesis investigates the extent to which speech-based computer audition methodologies can be utilised to understand human emotional wellbeing. A fundamental background on the fields in question as they pertain to emotional wellbeing is first presented, followed by an outline of the applied audio-based methodologies. 
Next, detail is provided for several machine learning experiments focused on emotional wellbeing applications, including analysis and recognition of under-researched phenomena in speech, e.g., anxiety, and markers of stress. Core contributions from this thesis include the collection of several related datasets, hybrid fusion strategies for an emotional gold standard, novel machine learning strategies for data interpretation, and an in-depth acoustic-based computational evaluation of several human states. All of these contributions focus on ascertaining the advantage of audio in the context of modelling emotional wellbeing. Given the sensitive nature of human wellbeing, the ethical implications involved with developing and applying such systems are discussed throughout.

    Science and Innovations for Food Systems Transformation

    This Open Access book compiles the findings of the Scientific Group of the United Nations Food Systems Summit 2021 and its research partners. The Scientific Group was an independent group of 28 food systems scientists from all over the world with a mandate from the Deputy Secretary-General of the United Nations. The chapters provide science- and research-based, state-of-the-art, solution-oriented knowledge and evidence to inform the transformation of contemporary food systems in order to achieve more sustainable, equitable and resilient systems

    Measuring the Severity of Depression from Text using Graph Representation Learning

    The common practice in psychology for measuring the severity of a patient's depressive symptoms is based on an interactive conversation between a clinician and the patient. In this dissertation, we focus on predicting a score representing the severity of depression from the text of such a conversation. We first present a generic graph neural network (GNN) to automatically rate severity using patient transcripts. We also test a few sequence-based deep models on the same task. We then propose a novel form for node attributes within a GNN-based model that captures a node-specific embedding for every word in the vocabulary. This provides a global representation of each node, coupled with node-level updates according to associations between words in a transcript. Furthermore, we evaluate the performance of our GNN-based model on a Twitter sentiment dataset, classifying three different sentiments, and on Alzheimer's data, differentiating patients with Alzheimer's disease from healthy individuals. In addition to applying the GNN model to learn a prediction model from the text, we provide post-hoc explanations of the model's decisions for all three tasks using the model's gradients.
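The idea of treating words as graph nodes whose embeddings are updated according to word associations in a transcript can be sketched as follows. This is an illustrative toy, not the dissertation's model: the sliding-window co-occurrence rule, the mean-aggregation update, and the names `build_word_graph` and `message_pass` are assumptions.

```python
def build_word_graph(tokens, window=1):
    """Build an undirected co-occurrence graph over the words of a transcript.

    Assumption: two distinct words are linked if they appear within `window`
    positions of each other.
    """
    graph = {word: set() for word in tokens}
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def message_pass(graph, embeddings):
    """One node-level update: average each word's vector with its neighbours'."""
    updated = {}
    for node, neighbours in graph.items():
        group = [embeddings[node]] + [embeddings[n] for n in sorted(neighbours)]
        dim = len(embeddings[node])
        updated[node] = [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
    return updated


# Hypothetical transcript fragment and 2-dimensional word embeddings.
tokens = ["i", "feel", "tired", "i", "feel", "sad"]
embeddings = {"i": [1.0, 0.0], "feel": [0.0, 1.0], "tired": [1.0, 1.0], "sad": [0.0, 0.0]}
graph = build_word_graph(tokens, window=1)
updated = message_pass(graph, embeddings)
```

Each word starts from a vocabulary-level embedding and is then adjusted by the words it co-occurs with in this particular transcript; a real GNN would use learned weights rather than a plain mean and would pool the node vectors into a severity score.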

    A Process for the Restoration of Performances from Musical Errors on Live Progressive Rock Albums

    In the course of my practice of producing live progressive rock albums, a significant challenge has emerged: how to repair performance errors while retaining the intended expressive performance. Using a practice as research methodology, I develop a novel process, Error Analysis and Performance Restoration (EAPR), to restore a performer’s intention where an error was assessed to have been made. In developing this process, within the context of my practice, I investigate: the nature of live albums and the groups to which I am accountable, a definition of performance errors, an examination of their causes, and the existing literature on these topics. In presenting EAPR, I demonstrate, drawing from existing research, a mechanism by which originally intended performances can be extracted from recorded errors. The EAPR process exists as a conceptual model; each album has a specific implementation to address the needs of that album, and the currently available technology. Restoration techniques are developed as part of this implementation. EAPR is developed and demonstrated through my work restoring performances on a front-line commercial live release, the Creative Submission Album. The specific EAPR implementation I design for it is laid out, and detailed examples of its techniques demonstrated