689 research outputs found

    Objective methods for reliable detection of concealed depression

    Recent research has shown that it is possible to automatically detect clinical depression from audio-visual recordings. Before considering integration in a clinical pathway, a key question that must be asked is whether such systems can be easily fooled. This work explores the potential of acoustic features to detect clinical depression in adults both when acting normally and when asked to conceal their depression. Nine adults diagnosed with mild to moderate depression as per the Beck Depression Inventory (BDI-II) and Patient Health Questionnaire (PHQ-9) were asked a series of questions and to read an excerpt from a novel aloud under two different experimental conditions. In one, participants were asked to act naturally and in the other, to suppress anything that they felt would be indicative of their depression. Acoustic features were then extracted from these data and analysed using paired t-tests to determine any statistically significant differences between healthy and depressed participants. Most features that were found to be significantly different during normal behaviour remained so during concealed behaviour. In leave-one-subject-out automatic classification studies of the 9 depressed subjects and 8 matched healthy controls, 88% classification accuracy and 89% sensitivity were achieved. Results remained relatively robust during concealed behaviour, with classifiers trained on only non-concealed data achieving 81% detection accuracy and 75% sensitivity when tested on concealed data. These results indicate there is good potential to build deception-proof automatic depression monitoring systems.
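The leave-one-subject-out protocol used in the classification study above can be sketched as follows. The synthetic features, the two-recordings-per-subject layout and the nearest-centroid classifier are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

# Synthetic stand-in data: one row of acoustic features per recording,
# a binary depression label and a subject id per recording.
rng = np.random.default_rng(0)
n_subjects, n_feats = 17, 10            # 9 depressed + 8 healthy controls
X = rng.normal(size=(n_subjects * 2, n_feats))
y = np.repeat([1] * 9 + [0] * 8, 2)     # two recordings per subject
subjects = np.repeat(np.arange(n_subjects), 2)
X[y == 1] += 0.8                        # give the depressed class a shift

preds = np.empty_like(y)
for s in np.unique(subjects):           # leave one subject out per fold
    train, test = subjects != s, subjects == s
    mu_dep = X[train & (y == 1)].mean(axis=0)
    mu_con = X[train & (y == 0)].mean(axis=0)
    d_dep = np.linalg.norm(X[test] - mu_dep, axis=1)
    d_con = np.linalg.norm(X[test] - mu_con, axis=1)
    preds[test] = (d_dep < d_con).astype(int)

accuracy = (preds == y).mean()
sensitivity = preds[y == 1].mean()      # recall on the depressed class
print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}")
```

Holding out every recording of the test subject, rather than individual recordings, is what prevents the classifier from simply recognizing a speaker's voice.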


    Speech-based analysis of major depressive disorder: focusing on acoustic changes in continuous speech

    Doctoral dissertation (Ph.D.), Seoul National University, Graduate School of Convergence Science and Technology, February 2023 (advisor: ์ด๊ต๊ตฌ). Major depressive disorder (commonly referred to as depression) is a common disorder that affects 3.8% of the world's population. Depression stems from various causes, such as genetics, aging, social factors, and abnormalities in the neurotransmitter system; thus, early detection and monitoring are essential. The human voice is considered a representative biomarker for observing depression; accordingly, several studies have developed automatic speech-based depression diagnosis systems. However, constructing a speech corpus is challenging, most studies focus on adults under 60 years of age, and medical hypotheses grounded in the clinical findings of psychiatrists are insufficient, all of which limits the evolution of such systems into medical diagnostic tools. Moreover, the effect of taking antipsychotic drugs on speech characteristics during the treatment phase is overlooked. This thesis therefore studies a speech-based automatic depression diagnosis system at the semantic level (the sentence). First, to analyse depression among the elderly, whose emotional changes are not adequately reflected in speech characteristics, it developed mood-inducing sentences to build an elderly depression speech corpus and designed an automatic depression diagnosis system for the elderly; sentence-level observation confirmed the effect of emotional sentence reading and of emotion transfer in the elderly depressed group. Second, it constructed an extrapyramidal symptom speech corpus to investigate extrapyramidal symptoms, a typical side effect of antipsychotic drug overdose, and found a strong correlation between antipsychotic dose and speech characteristics, shedding light on how antipsychotic drugs may affect the voice during depression treatment. The study paved the way for a comprehensive examination of automatic diagnosis systems for depression.
Contents: Chapter 1, Introduction; Chapter 2, Theoretical Background; Chapter 3, Developing Sentences for a New Depressed Speech Corpus; Chapter 4, Screening Depression in the Elderly; Chapter 5, Correlation Analysis of Antipsychotic Dose and Speech Characteristics; Chapter 6, Conclusions and Future Work.

    Intelligent Advanced User Interfaces for Monitoring Mental Health Wellbeing

    It has become pressing to develop objective and automatic measurements, integrated in intelligent diagnostic tools, for detecting and monitoring depressive states and enabling increased precision of diagnoses and clinical decision-making. The challenge is to exploit behavioral and physiological biomarkers and develop Artificial Intelligence (AI) models able to extract information from a complex combination of signals considered key symptoms. The proposed AI models should help clinicians rapidly formulate accurate diagnoses and suggest personalized intervention plans, ranging from coaching activities (exploiting, for example, serious games) and support networks (via chats or social networks) to alerts to caregivers, doctors, and care control centers, reducing the considerable burden on national health care institutions in terms of the medical and social costs associated with depression care.

    Automatic Detection of Depression in Speech Using Ensemble Convolutional Neural Networks

    This paper proposes a speech-based method for automatic depression classification. The system is based on ensemble learning for Convolutional Neural Networks (CNNs) and is evaluated using the data and the experimental protocol provided in the Depression Classification Sub-Challenge (DCC) at the 2016 Audio-Visual Emotion Challenge (AVEC-2016). In the pre-processing phase, speech files are represented as sequences of log-spectrograms and randomly sampled to balance positive and negative samples. For the classification task itself, first, a more suitable architecture for this task, based on one-dimensional Convolutional Neural Networks, is built. Secondly, several of these CNN-based models are trained with different initializations, and the corresponding individual predictions are fused using an ensemble-averaging algorithm and combined per speaker to reach an appropriate final decision. The proposed ensemble system achieves satisfactory results on the DCC at AVEC-2016 in comparison with a reference system based on Support Vector Machines and hand-crafted features, with a CNN+LSTM-based system called DepAudioNet, and with a single CNN-based classifier. This research was partly funded by Spanish Government grant TEC2017-84395-P.
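The fusion step described above (averaging the predictions of several independently initialized models, then combining segment-level scores per speaker) can be sketched as follows. The probability values and the 0.5 decision threshold are illustrative assumptions.

```python
import numpy as np

def speaker_decision(segment_probs_by_model, threshold=0.5):
    """Fuse per-segment depression probabilities from several models.

    segment_probs_by_model: array-like of shape (n_models, n_segments)
    holding each model's probability that a segment comes from a
    depressed speaker. Returns the averaged speaker-level score and
    the binary decision.
    """
    probs = np.asarray(segment_probs_by_model, dtype=float)
    fused_per_segment = probs.mean(axis=0)    # ensemble averaging over models
    speaker_score = fused_per_segment.mean()  # combine segments per speaker
    return speaker_score, speaker_score >= threshold

# Three models, four segments from one speaker (illustrative numbers).
score, depressed = speaker_decision([[0.7, 0.6, 0.8, 0.5],
                                     [0.6, 0.7, 0.9, 0.4],
                                     [0.8, 0.5, 0.7, 0.6]])
print(score, depressed)  # 0.65 True
```

Averaging over differently initialized models reduces the variance of any single CNN, while averaging over segments stabilizes the final per-speaker decision.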

    Detection of clinical depression in adolescents using acoustic speech analysis

    Clinical depression is a major risk factor in suicides and is associated with high mortality rates, making it one of the leading causes of death worldwide every year. Symptoms of depression often first appear during adolescence, at a time when the voice is changing in both males and females, suggesting that specific studies of these phenomena in adolescent populations are warranted. The properties of acoustic speech have previously been investigated as possible cues for depression in adults. However, these studies were restricted to small populations of patients, and the speech recordings were made during patients' clinical interviews or fixed-text reading sessions. A collaborative effort with the Oregon Research Institute (ORI), USA allowed the development of a new speech corpus consisting of a large sample of 139 adolescents (46 males and 93 females) divided into two groups (68 clinically depressed and 71 controls). The speech recordings were made during naturalistic interactions between adolescents and parents. Instead of covering a plethora of acoustic features, this study draws on knowledge from speech science and groups the acoustic features into five categories that relate to the physiological and perceptual areas of the speech production mechanism: prosodic, cepstral, spectral, glottal and Teager energy operator (TEO) based features. The effectiveness of applying these acoustic feature categories in detecting adolescent depression was measured, and the salient feature categories were determined by testing the categories and their combinations within a binary classification framework.
Consistent with previous studies, it was observed that there are strong gender-related differences in classification accuracy, and that the glottal features provide an important enhancement of the classification accuracy when combined with other types of features. An important new contribution of this thesis was the observation that the TEO-based features significantly outperformed prosodic, cepstral, spectral and glottal features and their combinations. An investigation into the possible reasons for the strong performance of the TEO features pointed to the importance of nonlinear mechanisms associated with glottal flow formation as possible cues for depression.
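The Teager energy operator underlying the TEO-based features has a simple discrete form, psi[x](n) = x(n)^2 - x(n-1)*x(n+1). A minimal numpy sketch with a synthetic tone (the signal and its parameters are illustrative):

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a sampled pure tone A*cos(w*n), the operator equals A^2 * sin(w)^2
# at every sample, so it tracks amplitude and frequency jointly.
n = np.arange(1000)
tone = 0.5 * np.cos(0.3 * n)
psi = teager_energy(tone)
expected = 0.5 ** 2 * np.sin(0.3) ** 2
print(psi.mean(), expected)
```

This joint sensitivity to amplitude and frequency, and to deviations from a clean harmonic structure, is what makes the operator attractive for capturing nonlinear effects in the glottal flow.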

    Analysing Changes in the Acoustic Features of the Human Voice to Detect Depression amongst Biological Females in Higher Education

    Depression significantly affects a large percentage of the population, with young adult females being one of the most at-risk demographics. Concurrently, there is a growing demand on healthcare, and with sufficient resources often unavailable to diagnose depression, new diagnostic methods are needed that are both cost-effective and accurate. The presence of depression significantly affects certain acoustic features of the human voice. Acoustic features have been found to exhibit subtle changes, beyond the perception of the human auditory system, when an individual has depression. With advances in speech processing, these subtle changes can be observed by machines. By measuring these changes, the human voice can be analysed to identify acoustic features that correlate with depression. The implementation of voice diagnosis would both reduce the burden on healthcare and ensure those with depression are diagnosed in a timely fashion, allowing them quicker access to treatment. The research project presents an analysis of voice data from 17 biological females between 20 and 26 years old in higher education as a means to detect depression. Eight participants were considered healthy with no history of depression, whilst the other nine currently had depression. Participants performed two vocal tasks: extending sounds for a period of time and reading back a passage of speech. Six acoustic features were then measured from the voice data to determine whether these features can be utilised as diagnostic indicators of depression. The main finding of this study was that one of the measured acoustic features demonstrates significant differences between depressed and healthy individuals.
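A group comparison of the kind reported above is commonly run as a two-sample t-test per acoustic feature. The sketch below uses synthetic values and Welch's test; the feature, group means and spread are assumptions for illustration, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic stand-in values for one acoustic feature (e.g. mean F0 in Hz)
healthy = rng.normal(loc=210.0, scale=8.0, size=8)    # 8 healthy participants
depressed = rng.normal(loc=198.0, scale=8.0, size=9)  # 9 depressed participants

# Welch's t-test: does not assume equal variances between the two groups
t_stat, p_value = stats.ttest_ind(healthy, depressed, equal_var=False)
print(f"t={t_stat:.2f}, p={p_value:.4f}")
significant = p_value < 0.05
```

With samples this small, a significant result for one feature and null results for the others is a plausible outcome, which matches the pattern the study reports.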

    Stress and emotion recognition in natural speech in the work and family environments

    Speech-based stress and emotion recognition and classification technology has the potential to provide significant benefits to national and international industry and to society in general. The accuracy of automatic speech stress and emotion recognition relies heavily on the discriminative power of the characteristic features. This work introduced and examined a number of new linear and nonlinear feature extraction methods for automatic detection of stress and emotion in speech. The proposed linear feature extraction methods included features derived from the speech spectrograms (SS-CB/BARK/ERB-AE, SS-AF-CB/BARK/ERB-AE, SS-LGF-OFS, SS-ALGF-OFS, SS-SP-ALGF-OFS and SS-sigma-pi), wavelet packets (WP-ALGF-OFS) and the empirical mode decomposition (EMD-AER). The proposed nonlinear feature extraction methods were based on the results of recent laryngological studies and nonlinear modelling of the phonation process. The proposed nonlinear features included the area under the TEO autocorrelation envelope based on different spectral decompositions (TEO-DWT, TEO-WP, TEO-PWP-S and TEO-PWP-G), as well as features representing the spectral energy distribution of speech (AUSEES) and of the glottal waveform (AUSEEG). The proposed features were compared with features based on the classical linear model of speech production, including F0, formants, MFCC and glottal time/frequency parameters. Two classifiers, GMM and KNN, were tested for consistency. The experiments used speech under actual stress from the SUSAS database (7 speakers; 3 female and 4 male) and speech with five naturally expressed emotions (neutral, anger, anxious, dysphoric and happy) from the ORI corpora (71 speakers; 27 female and 44 male). The nonlinear features clearly outperformed all the linear features.
The classification results demonstrated consistency with the nonlinear model of the phonation process, indicating that the harmonic structure and the spectral distribution of the glottal energy provide the most important cues for stress and emotion recognition in speech. The study also investigated whether automatic emotion recognition can determine differences in emotion expression between parents of depressed adolescents and parents of non-depressed adolescents, and whether there are differences in emotion expression between mothers and fathers in general. The experimental results indicated that parents of depressed adolescents produce stronger, more exaggerated expressions of affect than parents of non-depressed children, and that females in general provide easier-to-discriminate (more exaggerated) expressions of affect than males.

    Detection of Verbal and Nonverbal speech features as markers of Depression: results of manual analysis and automatic classification

    The present PhD project was the result of multidisciplinary work involving psychiatrists, computing scientists, social signal processing experts and psychology students, with the aim of analysing verbal and nonverbal behaviour in patients affected by depression. Collaborations with several clinical health centres were established for the recruitment of a group of patients suffering from depressive disorders, and a group of healthy controls was collected as well. A collaboration with the School of Computing Science of Glasgow University was established with the aim of analysing the collected data. Depression was selected for this study because it is one of the most common mental disorders in the world (World Health Organization, 2017), associated with half of all suicides (Lecrubier, 2000). It requires prolonged and expensive medical treatments, resulting in a significant burden for both patients and society (Olesen et al., 2012). The use of objective and reliable measurements of depressive symptoms can support clinicians during diagnosis, reducing the risk of subjective biases and disorder misclassification (see discussion in Chapter 1) and allowing diagnosis in a quick and non-invasive way. Given this, the present PhD project investigates verbal (i.e. speech content) and nonverbal (i.e. paralinguistic) behaviour in depressed patients to find speech parameters that can serve as objective markers of depressive symptoms. The verbal and nonverbal behaviour are investigated through two kinds of speech tasks: reading and spontaneous speech. Both manual feature extraction and automatic classification approaches are used for this purpose. Differences between acute and remitted patients in prosodic and verbal features have been investigated as well.
In addition, unlike other studies in the literature, this project investigates differences, in both verbal and nonverbal behaviour, between subjects with and without Early Maladaptive Schemas (EMS: Young et al., 2003), independently of depressive symptoms. The proposed analysis shows that patients differ from healthy subjects in several verbal and nonverbal features. Moreover, using both reading and spontaneous speech, it is possible to automatically detect depression with a good accuracy level (from 68% to 76%). These results demonstrate that the investigation of speech features can be a useful instrument, in addition to the current self-reports and clinical interviews, for supporting the diagnosis of depressive disorders. Contrary to what was expected, patients in the acute and remitted phases do not differ in nonverbal features, and only a few differences emerge in verbal behaviour. Similarly, automatic classification using paralinguistic features does not work well for discriminating subjects with and without EMS, and only a few differences between them were found in verbal behaviour. Possible explanations and limitations of these results are discussed.

    Speech-based automatic depression detection via biomarkers identification and artificial intelligence approaches

    Depression has become one of the most prevalent mental health issues, affecting more than 300 million people all over the world. However, due to factors such as limited medical resources and accessibility of health care, a large number of patients remain undiagnosed. In addition, traditional approaches to depression diagnosis have limitations because they are usually time-consuming and depend on clinical experience that varies across clinicians. From this perspective, automatic depression detection can make the diagnosis process much faster and more accessible. In this thesis, we present the possibility of using speech for automatic depression detection. This is based on findings in neuroscience that depressed patients have abnormal cognition mechanisms, which cause their speech to differ from that of healthy people. Therefore, in this thesis, we show two ways of benefiting from automatic depression detection, i.e., identifying speech markers of depression and constructing novel deep learning models to improve detection accuracy. The identification of speech markers tries to capture measurable depression traces left in speech. From this perspective, speech markers such as speech duration, pauses and correlation matrices are proposed. Speech duration and pauses take speech fluency into account, while correlation matrices represent the relationships between acoustic features and aim at capturing psychomotor retardation in depressed patients. Experimental results demonstrate that these proposed markers are effective at improving the performance in recognizing depressed speakers. In addition, such markers show statistically significant differences between depressed patients and non-depressed individuals, which explains the possibility of using these markers for depression detection and further confirms that depression leaves detectable traces in speech.
In addition to the above, we propose an attention mechanism, Multi-local Attention (MLA), to emphasize depression-relevant information locally, and analyse its effect on performance and efficiency. According to the experimental results, such a model can significantly improve performance and confidence in the detection while reducing the time required for recognition. Furthermore, we propose Cross-Data Multilevel Attention (CDMA) to emphasize different types of depression-relevant information, i.e., information specific to each type of speech and information common to both, by using multiple attention mechanisms. Experimental results demonstrate that the proposed model is effective at integrating different types of depression-relevant information in speech, significantly improving the performance of depression detection.
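The correlation-matrix marker mentioned above (a recording-level descriptor built from pairwise correlations of frame-level acoustic features) can be sketched as follows. The frame matrix and the feature count are synthetic assumptions, not the thesis's actual feature set.

```python
import numpy as np

def feature_correlation_matrix(frames):
    """Correlation matrix over frame-level acoustic features.

    frames: array of shape (n_frames, n_features), one row per analysis
    frame. Returns the (n_features, n_features) Pearson correlation
    matrix and its flattened upper triangle, which can serve as a
    fixed-length recording-level descriptor.
    """
    frames = np.asarray(frames, dtype=float)
    corr = np.corrcoef(frames, rowvar=False)  # correlate feature columns
    iu = np.triu_indices(corr.shape[1], k=1)  # strict upper triangle
    return corr, corr[iu]

rng = np.random.default_rng(2)
frames = rng.normal(size=(200, 5))  # 200 frames, 5 features (synthetic)
corr, descriptor = feature_correlation_matrix(frames)
print(corr.shape, descriptor.shape)
```

Because the descriptor length depends only on the number of features, recordings of different durations map to vectors of the same size, which is convenient for downstream classifiers.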