Speech-based automatic depression detection via biomarkers identification and artificial intelligence approaches

Abstract

Depression has become one of the most prevalent mental health issues, affecting more than 300 million people all over the world. However, due to factors such as limited medical resources and accessibility to health care, there are still a large number of patients undiagnosed. In addition, the traditional approaches to depression diagnosis have limitations because they are usually time-consuming, and depend on clinical experience that varies across different clinicians. From this perspective, the use of automatic depression detection can make the diagnosis process much faster and more accessible. In this thesis, we present the possibility of using speech for automatic depression detection. This is based on the findings in neuroscience that depressed patients have abnormal cognition mechanisms thus leading to the speech differs from that of healthy people. Therefore, in this thesis, we show two ways of benefiting from automatic depression detection, i.e., identifying speech markers of depression and constructing novel deep learning models to improve detection accuracy. The identification of speech markers tries to capture measurable depression traces left in speech. From this perspective, speech markers such as speech duration, pauses and correlation matrices are proposed. Speech duration and pauses take speech fluency into account, while correlation matrices represent the relationship between acoustic features and aim at capturing psychomotor retardation in depressed patients. Experimental results demonstrate that these proposed markers are effective at improving the performance in recognizing depressed speakers. In addition, such markers show statistically significant differences between depressed patients and non-depressed individuals, which explains the possibility of using these markers for depression detection and further confirms that depression leaves detectable traces in speech. In addition to the above, we propose an attention mechanism, Multi-local Attention (MLA), to emphasize depression-relevant information locally. Then we analyse the effectiveness of MLA on performance and efficiency. According to the experimental results, such a model can significantly improve performance and confidence in the detection while reducing the time required for recognition. Furthermore, we propose Cross-Data Multilevel Attention (CDMA) to emphasize different types of depression-relevant information, i.e., specific to each type of speech and common to both, by using multiple attention mechanisms. Experimental results demonstrate that the proposed model is effective to integrate different types of depression-relevant information in speech, improving the performance significantly for depression detection

    Similar works