Speech-based automatic depression detection via biomarkers identification and artificial intelligence approaches
Depression has become one of the most prevalent mental health issues, affecting more than 300 million people worldwide. However, due to factors such as limited medical resources and access to health care, many patients remain undiagnosed. In addition, traditional approaches to depression diagnosis have limitations: they are usually time-consuming and depend on clinical experience, which varies across clinicians. From this perspective, automatic depression detection can make the diagnosis process faster and more accessible. In this thesis, we explore the use of speech for automatic depression detection. This is motivated by findings in neuroscience that depressed patients have abnormal cognitive mechanisms, leading to speech that differs from that of healthy people.
Therefore, in this thesis, we present two complementary ways of advancing automatic depression detection, i.e., identifying speech markers of depression and constructing novel deep learning models to improve detection accuracy.
The identification of speech markers aims to capture measurable traces that depression leaves in speech. To this end, markers such as speech duration, pauses and correlation matrices are proposed. Speech duration and pauses take speech fluency into account, while correlation matrices represent relationships between acoustic features and aim to capture psychomotor retardation in depressed patients. Experimental results demonstrate that the proposed markers improve performance in recognizing depressed speakers. In addition, the markers show statistically significant differences between depressed patients and non-depressed individuals, which supports their use for depression detection and further confirms that depression leaves detectable traces in speech.
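A correlation-matrix marker of the kind described can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the feature names and per-frame values are hypothetical, and a real system would extract tracks such as pitch and energy with an acoustic-feature toolkit before computing pairwise correlations.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length feature trajectories."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(features):
    """Pairwise correlations between per-frame acoustic feature tracks.

    Because the matrix size depends only on the number of features, it
    yields a fixed-size marker regardless of utterance length.
    """
    names = sorted(features)
    return {a: {b: pearson(features[a], features[b]) for b in names}
            for a in names}

# Hypothetical per-frame pitch (Hz) and energy tracks for one utterance
feats = {"pitch": [120.0, 122.5, 118.0, 121.0],
         "energy": [0.6, 0.7, 0.5, 0.65]}
cm = correlation_matrix(feats)
```

The resulting matrix entries can then be flattened into a feature vector for a downstream classifier.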
In addition to the above, we propose an attention mechanism, Multi-local Attention (MLA), to emphasize depression-relevant information locally. We then analyse the effect of MLA on performance and efficiency. According to the experimental results, such a model can significantly improve performance and confidence in detection while reducing the time required for recognition. Furthermore, we propose Cross-Data Multilevel Attention (CDMA) to emphasize different types of depression-relevant information, i.e., information specific to each type of speech and information common to both, by using multiple attention mechanisms. Experimental results demonstrate that the proposed model effectively integrates different types of depression-relevant information in speech, significantly improving depression detection performance.
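The core idea behind local attention, attending only within a window around a position rather than over the whole sequence, can be sketched as a toy single-head example. This is a generic illustration under assumed inputs, not the thesis's MLA architecture, which applies multiple such local mechanisms.

```python
from math import exp

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def local_attention(query, keys, values, center, window=2):
    """Attend only to positions within `window` of `center`.

    Scores are dot products, the softmax is taken over the local
    window alone, and the output is the weighted sum of the windowed
    values. Restricting attention this way cuts the per-query cost
    from O(sequence length) to O(window size).
    """
    lo = max(0, center - window)
    hi = min(len(keys), center + window + 1)
    scores = [sum(q * k for q, k in zip(query, keys[i]))
              for i in range(lo, hi)]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * values[lo + i][d] for i, w in enumerate(weights))
           for d in range(dim)]
    return out, weights

# Hypothetical 2-D frame embeddings for a 5-frame sequence
out, weights = local_attention([1.0, 0.0],
                               [[1.0, 0.0]] * 5,
                               [[1.0, 1.0]] * 5,
                               center=2, window=1)
```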
Spatio-Temporal AU Relational Graph Representation Learning For Facial Action Units Detection
This paper presents our Facial Action Units (AUs) recognition submission to the fifth Affective Behavior Analysis in-the-wild (ABAW) Competition. Our approach consists of three main modules: (i) a pre-trained facial representation encoder, which produces a strong facial representation from each face image in the input sequence; (ii) an AU-specific feature generator, which learns a set of AU features from each facial representation; and (iii) a spatio-temporal graph learning module, which constructs a spatio-temporal graph representation. This graph representation describes the AUs contained in all frames and predicts the occurrence of each AU based on both the modeled spatial information within the corresponding face and the learned temporal dynamics among frames. The experimental results show that our approach outperformed the baseline, and that spatio-temporal graph representation learning allows our model to achieve the best results among all ablated systems. Our model ranks 4th in the AU recognition track of the 5th ABAW Competition.
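The structure of a spatio-temporal AU graph can be sketched as an edge list over (frame, AU) node pairs: spatial edges connect AUs within a frame, and temporal edges connect the same AU across adjacent frames. This is a generic sketch of the kind of structure the abstract describes, not the authors' exact construction, which learns the graph rather than using fully connected spatial edges.

```python
def spatio_temporal_edges(n_frames, n_aus):
    """Edge list for a spatio-temporal AU graph.

    Nodes are (frame, au) index pairs. Spatial edges link every pair
    of AUs within a frame (capturing co-occurrence within a face);
    temporal edges link the same AU in adjacent frames (capturing
    its dynamics over time).
    """
    spatial = [((f, a), (f, b))
               for f in range(n_frames)
               for a in range(n_aus)
               for b in range(a + 1, n_aus)]
    temporal = [((f, a), (f + 1, a))
                for f in range(n_frames - 1)
                for a in range(n_aus)]
    return spatial + temporal

# 3 frames, 4 AUs: 3 * C(4,2) = 18 spatial edges, 2 * 4 = 8 temporal
edges = spatio_temporal_edges(3, 4)
```

A graph neural network would then pass messages along both edge types to predict per-node AU occurrence.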
Policy options for food system transformation in Africa and the role of science, technology and innovation
As recognized by the Science, Technology and Innovation Strategy for Africa – 2024 (STISA-2024), science, technology and innovation (STI) offer many opportunities for addressing the main constraints to embracing transformation in Africa, while important lessons can be learned from successful interventions, including policy and institutional innovations, from those African countries that have already made significant progress towards food system transformation. This chapter identifies opportunities for African countries and the region to take proactive steps to harness the potential of the food and agriculture sector so as to ensure future food and nutrition security by applying STI solutions and by drawing on transformational policy and institutional innovations across the continent. Potential game-changing solutions and innovations for food system transformation serving people and ecology apply to (a) raising production efficiency and restoring and sustainably managing degraded resources; (b) finding innovation in the storage, processing and packaging of foods; (c) improving human nutrition and health; (d) addressing equity and vulnerability at the community and ecosystem levels; and (e) establishing preparedness and accountability systems. Effectiveness in these areas will require institutional coordination; clear, food safety and health-conscious regulatory environments; greater and timelier access to information; and transparent monitoring and accountability systems.
Computer audition for emotional wellbeing
This thesis focuses on the application of computer audition (i.e., machine listening) methodologies for monitoring states of emotional wellbeing. Computer audition is a growing field and has been successfully applied to an array of use cases in recent years. There are several advantages to audio-based computational analysis; for example, audio can be recorded non-invasively, stored economically, and can capture rich information on happenings in a given environment, e.g., human behaviour. Maintaining emotional wellbeing is a challenge for humans, and emotion-altering conditions, including stress and anxiety, have become increasingly common in recent years. Such conditions manifest in the body, inherently changing how we express ourselves. Research shows these alterations are perceivable within vocalisation, suggesting that speech-based audio monitoring may be valuable for developing artificially intelligent systems that target improved wellbeing. Furthermore, computer audition applies machine learning and other computational techniques to audio understanding; by combining computer audition with applications in computational paralinguistics and emotional wellbeing, this research contributes to the broader field of empathy for Artificial Intelligence (AI). To this end, speech-based audio modelling that incorporates and understands paralinguistic wellbeing-related states may be a vital cornerstone for improving the degree of empathy that an artificial intelligence has.
To summarise, this thesis investigates the extent to which speech-based computer audition methodologies can be utilised to understand human emotional wellbeing. A fundamental background on the fields in question as they pertain to emotional wellbeing is first presented, followed by an outline of the applied audio-based methodologies. Next, detail is provided for several machine learning experiments focused on emotional wellbeing applications, including analysis and recognition of under-researched phenomena in speech, e.g., anxiety and markers of stress. Core contributions from this thesis include the collection of several related datasets, hybrid fusion strategies for an emotional gold standard, novel machine learning strategies for data interpretation, and an in-depth acoustic-based computational evaluation of several human states. All of these contributions focus on ascertaining the advantage of audio in the context of modelling emotional wellbeing. Given the sensitive nature of human wellbeing, the ethical implications involved in developing and applying such systems are discussed throughout.
Science and Innovations for Food Systems Transformation
This Open Access book compiles the findings of the Scientific Group of the United Nations Food Systems Summit 2021 and its research partners. The Scientific Group was an independent group of 28 food systems scientists from all over the world with a mandate from the Deputy Secretary-General of the United Nations. The chapters provide science- and research-based, state-of-the-art, solution-oriented knowledge and evidence to inform the transformation of contemporary food systems in order to achieve more sustainable, equitable and resilient systems.
Measuring the Severity of Depression from Text using Graph Representation Learning
The common practice in psychology for measuring the severity of a patient's depressive symptoms is based on an interactive conversation between a clinician and the patient. In this dissertation, we focus on predicting a score representing the severity of depression from the transcript of such a conversation. We first present a generic graph neural network (GNN) to automatically rate severity from patient transcripts. We also test several sequence-based deep models on the same task. We then propose a novel form of node attributes within a GNN-based model that captures a node-specific embedding for every word in the vocabulary. This provides a global representation of each node, coupled with node-level updates according to associations between words in a transcript. Furthermore, we evaluate our GNN-based model on a Twitter sentiment dataset, classifying three different sentiments, and on Alzheimer's data, differentiating patients with Alzheimer's disease from healthy individuals. In addition to applying the GNN model to learn a prediction model from text, we provide post-hoc explanations of the model's decisions for all three tasks using the model's gradients.
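Feeding a transcript into a GNN requires first turning the text into a graph. A common generic construction, sketched below under the assumption of a simple co-occurrence window, makes each vocabulary word a node and weights an edge by how often two words appear near each other; the dissertation's specific node-attribute scheme is not reproduced here, and the example transcript is hypothetical.

```python
from collections import defaultdict

def transcript_graph(transcript, window=2):
    """Word co-occurrence graph from a transcript.

    Nodes are vocabulary words; an edge's weight counts how often the
    two words appear within `window` tokens of each other. Edges are
    stored once per unordered pair; self-loops are skipped.
    """
    tokens = transcript.lower().split()
    edges = defaultdict(int)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:
                edges[tuple(sorted((w, tokens[j])))] += 1
    return dict(edges)

# Hypothetical snippet of a patient transcript
g = transcript_graph("i feel tired and i feel sad")
```

The edge dictionary can then be converted to the adjacency or edge-index format expected by a GNN library.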
Sonic heritage: listening to the past
History is so often told through objects, images and photographs, but the potential of sound to reveal place and space is often neglected. Our research project ‘Sonic Palimpsest’ explores the potential of sound to evoke impressions and new understandings of the past, embracing the sonic as a tool to understand what was, in a way that can complement and add to our predominantly visual understandings. Our work includes the expansion of the oral history archives held at Chatham Dockyard to include women’s voices and experiences, and the creation of sonic works to engage the public with their heritage. Our research highlights the social and cultural value of oral history and field recordings in the transmission of knowledge to both researchers and the public. Together these recordings document how buildings and spaces within the dockyard were used and experienced by those who worked there. We can begin to understand the social and cultural roles of these buildings within the community, both past and present.
Healthy Diet: A Definition for the United Nations Food Systems Summit 2021
A Process for the Restoration of Performances from Musical Errors on Live Progressive Rock Albums
In the course of my practice of producing live progressive rock albums, a significant
challenge has emerged: how to repair performance errors while retaining the intended
expressive performance. Using a practice as research methodology, I develop a novel process,
Error Analysis and Performance Restoration (EAPR), to restore a performer’s intention where
an error was assessed to have been made. In developing this process, within the context of
my practice, I investigate: the nature of live albums and the groups to which I am
accountable, a definition of performance errors, an examination of their causes, and the
existing literature on these topics. In presenting EAPR, I demonstrate, drawing from existing
research, a mechanism by which originally intended performances can be extracted from
recorded errors. The EAPR process exists as a conceptual model; each album has a specific
implementation to address the needs of that album, and the currently available technology.
Restoration techniques are developed as part of this implementation. EAPR is developed and
demonstrated through my work restoring performances on a front-line commercial live
release, the Creative Submission Album. The specific EAPR implementation I design for it is
laid out, and detailed examples of its techniques demonstrated