
    Investigating temporal and prosodic markers in clinical high-risk for psychosis participants using automated acoustic analysis

    Introduction: Research into language abnormalities has gained attention given the role of language impairments as a plausible marker for early detection and diagnosis of psychosis. Semantic and syntactic aberrations have been widely observed in schizophrenia across illness stages. More recently, acoustic abnormalities in temporal and prosodic features of speech have been observed in schizophrenia patients. Yet, evidence is mixed on the presence of acoustic deficits in participants meeting clinical high-risk for psychosis (CHR-P) criteria. The present study aimed to clarify whether acoustic impairments could be used to identify CHR-P individuals when compared to participants with substance-use and affective disorders (clinical high-risk negative; CHR-N) and to healthy control (HC) participants. Crucially, methodological issues, including the duration of the speech samples, were addressed to determine their impact on the acoustic results. Methods: Data were available from the Youth mental health, risk and Resilience (YouR) study. Speech samples were recorded from the semi-structured clinical interviews of the Comprehensive Assessment of At-Risk Mental States (CAARMS) in 50 CHR-P participants, who were compared against a group of 17 HC and 23 CHR-N participants. Temporal and prosodic features were extracted from the recordings. Linear regression was used to determine the influence of interview duration on the acoustic estimates. After examining group differences for each of the acoustic features, temporal and prosodic indices were entered into binary logistic regressions to determine whether they could predict group status. Results: No deficits were observed in temporal or prosodic variables in the CHR-P group when compared to HCs. Instead, CHR-N individuals were characterized by a slower speech rate, more and longer pauses, and a higher percentage of unvoiced frames compared to CHR-P participants. Temporal features discriminated between groups better than prosodic features, with models explaining up to 47% of the variance between CHR-Ns and HCs and up to 28% of the variance between CHR-Ps and CHR-Ns. Yet, none of these models survived bootstrapping. Moreover, group differences for temporal and prosodic features were largely robust to interview-duration effects. Finally, no significant relationship was obtained between temporal or prosodic features and clinical or functional symptom severity. Discussion: These findings suggest that temporal and prosodic features of speech are not impaired in early-stage psychosis. The acoustic features examined indicated the presence of acoustic impairments in CHR-N participants, which proved spurious following bootstrapping, underscoring the importance of employing validation methods on acoustic signatures in psychosis. This is crucial given the small sample sizes across the literature and the heterogeneity of the clinical groups. Given the absence of acoustic disturbances of speech in CHR-P individuals observed in the present research, semantic and syntactic abnormalities may constitute a more promising biomarker of early psychosis. Further studies are required to clarify whether acoustic abnormalities are present in sub-groups of CHR-P participants with elevated psychosis risk.
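    A minimal sketch of the kind of analysis described above, assuming the recordings are available as WAV files: prosodic features (F0 statistics, percentage of unvoiced frames) and temporal features (pause time) are extracted with the Praat bindings in parselmouth, then fed to a binary logistic regression to predict group status. The feature set, the pause threshold, and the file names are illustrative assumptions, not the study's actual configuration.

    ```python
    # A sketch, not the study's pipeline: extract temporal and prosodic
    # features per recording with Praat (via parselmouth), then predict
    # group status with a binary logistic regression.
    import numpy as np
    import parselmouth  # pip install praat-parselmouth
    from sklearn.linear_model import LogisticRegression

    def acoustic_features(wav_path, pause_db=50.0):
        """Feature set and 50 dB pause threshold are illustrative choices."""
        snd = parselmouth.Sound(wav_path)
        f0 = snd.to_pitch().selected_array["frequency"]  # 0 Hz = unvoiced frame
        voiced = f0[f0 > 0]
        intensity = snd.to_intensity()
        silent = intensity.values[0] < pause_db          # crude pause detection
        pause_time = silent.sum() * intensity.dx         # seconds spent in pauses
        return [
            np.mean(voiced), np.std(voiced),             # prosodic: F0 level/variability
            100.0 * np.mean(f0 == 0),                    # prosodic: % unvoiced frames
            pause_time,                                  # temporal: total pause time
            pause_time / snd.duration,                   # temporal: pause proportion
        ]

    # Hypothetical usage: one feature row per participant, y = group label
    # (e.g., 0 = CHR-P, 1 = CHR-N); the fitted coefficients then indicate
    # which features carry the discrimination.
    # X = np.array([acoustic_features(p) for p in wav_paths])
    # model = LogisticRegression().fit(X, y)
    ```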

    Speech-based analysis of major depressive disorder: focusing on acoustic changes in continuous speech

    Doctoral thesis -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Convergence Science (Digital Information Convergence), February 2023. Advisor: Kyogu Lee. Major depressive disorder (commonly referred to as depression) is a common disorder that affects 3.8% of the world's population. Depression stems from various causes, such as genetics, aging, social factors, and abnormalities in the neurotransmitter system; thus, early detection and monitoring are essential. The human voice is considered a representative biomarker for observing depression; accordingly, several studies have developed automatic depression diagnosis systems based on speech. However, constructing a speech corpus is challenging, existing studies focus on adults under 60 years of age, and medical hypotheses grounded in psychiatrists' clinical findings are insufficient, limiting the evolution of such systems into medical diagnostic tools. Moreover, the effect of taking antipsychotic drugs on speech characteristics during the treatment phase is overlooked. This thesis therefore studies a speech-based automatic depression diagnosis system at the semantic (sentence) level. First, to analyze depression among the elderly, whose emotional changes are not adequately reflected in speech characteristics, it developed mood-inducing sentences to build an elderly depression speech corpus; sentence-level observation confirmed the effect of emotional sentence reading and of emotional transfer in the elderly depression group, and an automatic depression diagnosis system for the elderly was designed. Second, it constructed an extrapyramidal symptom speech corpus to investigate extrapyramidal symptoms, a typical side effect that can appear with antipsychotic drug overdose, and found a strong correlation between antipsychotic dose and speech characteristics. The study paved the way for a comprehensive examination of automatic diagnosis systems for depression.
    Contents:
    Chapter 1 Introduction: 1.1 Research Motivations (1.1.1 Bridging the Gap Between Clinical View and Engineering; 1.1.2 Limitations of Conventional Depressed Speech Corpora; 1.1.3 Lack of Studies on Depression Among the Elderly; 1.1.4 Depression Analysis on Semantic Level; 1.1.5 How Do Antipsychotic Drugs Affect the Human Voice?); 1.2 Thesis Objectives; 1.3 Outline of the Thesis
    Chapter 2 Theoretical Background: 2.1 Clinical View of Major Depressive Disorder (2.1.1 Types of Depression; 2.1.2 Major Causes of Depression; 2.1.3 Symptoms of Depression; 2.1.4 Diagnosis of Depression); 2.2 Objective Diagnostic Markers of Depression; 2.3 Speech in Mental Disorder; 2.4 Speech Production and Depression; 2.5 Automatic Depression Diagnostic System (2.5.1 Acoustic Feature Representation; 2.5.2 Classification / Prediction)
    Chapter 3 Developing Sentences for a New Depressed Speech Corpus: 3.1 Introduction; 3.2 Building a Depressed Speech Corpus (3.2.1 Elements of Speech Corpus Production; 3.2.2 Conventional Depressed Speech Corpora; 3.2.3 Factors Affecting Depressed Speech Characteristics); 3.3 Motivations (3.3.1 Limitations of Conventional Depressed Speech Corpora; 3.3.2 Attitude of Subjects to Depression: Masked Depression; 3.3.3 Emotions in Reading; 3.3.4 Objectives of this Chapter); 3.4 Proposed Methods (3.4.1 Selection of Words; 3.4.2 Structure of Sentence); 3.5 Results (3.5.1 Mood-Inducing Sentences (MIS); 3.5.2 Neutral Sentences for Extrapyramidal Symptom Analysis); 3.6 Summary
    Chapter 4 Screening Depression in the Elderly: 4.1 Introduction; 4.2 Korean Elderly Depressive Speech Corpus (4.2.1 Participants; 4.2.2 Recording Procedure; 4.2.3 Recording Specification); 4.3 Proposed Methods (4.3.1 Voice-based Screening Algorithm for Depression; 4.3.2 Extraction of Acoustic Features; 4.3.3 Feature Selection System and Distance Computation; 4.3.4 Classification and Statistical Analyses); 4.4 Results; 4.5 Discussion; 4.6 Summary
    Chapter 5 Correlation Analysis of Antipsychotic Dose and Speech Characteristics: 5.1 Introduction; 5.2 Korean Extrapyramidal Symptoms Speech Corpus (5.2.1 Participants; 5.2.2 Recording Process; 5.2.3 Extrapyramidal Symptoms Annotation and Equivalent Dose Calculations); 5.3 Proposed Methods (5.3.1 Acoustic Feature Extraction; 5.3.2 Speech Characteristics Analysis According to Equivalent Dose); 5.4 Results; 5.5 Discussion; 5.6 Summary
    Chapter 6 Conclusions and Future Work: 6.1 Conclusions; 6.2 Future Work
    Bibliography; Abstract (in Korean)
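    As a toy illustration of the Chapter 5 analysis, the sketch below computes rank correlations between antipsychotic equivalent dose and per-speaker acoustic features; it assumes the corpus has already been exported to a table, and the file name, column names, and the choice of Spearman correlation are hypothetical, not the thesis's actual setup.

    ```python
    # Toy sketch: rank correlation between antipsychotic equivalent dose
    # and per-speaker acoustic features. File and column names are hypothetical.
    import pandas as pd
    from scipy.stats import spearmanr

    df = pd.read_csv("eps_corpus_features.csv")  # hypothetical corpus export
    for feature in ["speech_rate", "f0_mean", "pause_ratio", "jitter"]:
        rho, p = spearmanr(df["equivalent_dose"], df[feature])
        print(f"{feature}: rho={rho:+.2f}, p={p:.3f}")
    ```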

    Behavior quantification as the missing link between fields: Tools for digital psychiatry and their role in the future of neurobiology

    The great behavioral heterogeneity observed between individuals with the same psychiatric disorder, and even within one individual over time, complicates both clinical practice and biomedical research. However, modern technologies present an exciting opportunity to improve behavioral characterization. Data from existing psychiatry methods that are qualitative or unscalable, such as patient surveys or clinical interviews, can now be collected at far greater capacity and analyzed to produce new quantitative measures. Furthermore, recent capabilities for continuous collection of passive sensor streams, such as phone GPS or smartwatch accelerometer data, open avenues of inquiry that were previously entirely unrealistic. Their temporally dense nature enables a cohesive study of real-time neural and behavioral signals. To develop comprehensive neurobiological models of psychiatric disease, it will be critical to first develop strong methods for behavioral quantification. There is huge potential in what can theoretically be captured by current technologies, but this in itself presents a large computational challenge -- one that will necessitate new data processing tools, new machine learning techniques, and ultimately a shift in how interdisciplinary work is conducted. In my thesis, I detail research projects that take different perspectives on digital psychiatry, subsequently tying the ideas together with a concluding discussion on the future of the field. I also provide software infrastructure where relevant, with extensive documentation. Major contributions include scientific arguments and proof-of-concept results for daily free-form audio journals as an underappreciated psychiatry research datatype, as well as novel stability theorems and pilot empirical success for a proposed multi-area recurrent neural network architecture.
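    To make the passive-sensing idea concrete, here is a toy sketch of reducing a dense phone-GPS stream to a single daily behavioral measure (distance traveled); the CSV schema is a hypothetical assumption, not a format used in the thesis.

    ```python
    # Toy reduction of a phone-GPS stream to distance traveled per day.
    # The CSV schema (timestamp, lat, lon) is a hypothetical assumption.
    import numpy as np
    import pandas as pd

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between consecutive GPS fixes, in km."""
        lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
        a = (np.sin((lat2 - lat1) / 2) ** 2
             + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
        return 6371.0 * 2 * np.arcsin(np.sqrt(a))

    gps = pd.read_csv("phone_gps.csv", parse_dates=["timestamp"]).sort_values("timestamp")
    step_km = haversine_km(gps.lat.shift(), gps.lon.shift(), gps.lat, gps.lon)
    daily_km = step_km.groupby(gps.timestamp.dt.date).sum()  # one value per day
    print(daily_km.head())
    ```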

    Brain electric fields, belief in the paranormal, and reading of emotion words

    The present work reports two experiments on brain electric correlates of cognitive and emotional functions. (1) Studying paranormal belief, 35-channel resting EEG from 10 believers and 13 skeptics was analyzed with Low Resolution Electromagnetic Tomography (LORETA) in seven frequency bands. The LORETA gravity centers of all bands were shifted to the left in believers versus skeptics, showing that believers had stronger left fronto-temporo-parietal activity than skeptics. Self-ratings of affective attitude showed believers to be less negative than skeptics. The observed EEG lateralization agreed with the 'valence hypothesis', which posits predominantly left-hemispheric processing of positive emotions. (2) Studying emotions, positive and negative emotion words were presented to 21 subjects while event-related potentials (ERPs) were recorded. During word presentation (450 ms), 13 microstates (steps of information processing) were identified. Three microstates showed different potential maps for positive versus negative words; LORETA functional imaging showed stronger activity in microstate #4 (106-122 ms) for positive words right anterior and for negative words left central; in microstate #6 (138-166 ms) for positive words left anterior and for negative words left posterior; and in microstate #7 (166-198 ms) for positive words right anterior and for negative words right central. In conclusion: during word processing, the extraction of emotion content starts as early as 106 ms after stimulus onset; the brain identifies emotion content repeatedly in three separate, brief microstate epochs; and this processing of emotion content in the three microstates involves different brain mechanisms to represent the distinction between positive and negative valence.
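    The microstate logic can be illustrated with a short sketch: scalp topographies at global field power (GFP) peaks are clustered into map classes, and runs of identical labels form quasi-stable microstate epochs. Synthetic data and four clusters stand in here for the real 35-channel ERPs and the 13 microstates reported above; this is an illustration of the general technique, not the study's segmentation procedure.

    ```python
    # Illustrative microstate segmentation on synthetic data: cluster scalp
    # topographies at global field power (GFP) peaks; runs of one label form
    # microstate epochs. Four clusters stand in for the 13 reported microstates.
    import numpy as np
    from scipy.signal import find_peaks
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    erp = rng.standard_normal((35, 450))   # 35 channels x 450 ms (synthetic ERP)

    gfp = erp.std(axis=0)                  # GFP = spatial SD at each time point
    peaks, _ = find_peaks(gfp)             # topography is most stable at GFP peaks
    maps = erp[:, peaks].T                 # one scalp map per GFP peak

    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(maps)
    # Comparing label sequences (or per-class mean maps) between conditions
    # mirrors the positive- vs. negative-word map differences reported above.
    ```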

    Detection of Verbal and Nonverbal speech features as markers of Depression: results of manual analysis and automatic classification

    The present PhD project was the result of multidisciplinary work involving psychiatrists, computing scientists, social signal processing experts and psychology students, with the aim of analysing verbal and nonverbal behaviour in patients affected by depression. Collaborations with several Clinical Health Centers were established for the recruitment of a group of patients suffering from depressive disorders, and a group of healthy controls was recruited as well. A collaboration with the School of Computing Science of Glasgow University was established with the aim of analysing the collected data. Depression was selected for this study because it is one of the most common mental disorders in the world (World Health Organization, 2017) and is associated with half of all suicides (Lecrubier, 2000). It requires prolonged and expensive medical treatment, resulting in a significant burden for both patients and society (Olesen et al., 2012). The use of objective and reliable measurements of depressive symptoms can support clinicians during diagnosis, reducing the risk of subjective biases and disorder misclassification (see discussion in Chapter 1) and enabling a quick and non-invasive diagnosis. Given this, the present PhD project investigates verbal (i.e. speech content) and nonverbal (i.e. paralinguistic) behaviour in depressed patients to find speech parameters that can serve as objective markers of depressive symptoms. Verbal and nonverbal behaviour are investigated through two kinds of speech task: reading and spontaneous speech. Both manual feature extraction and automatic classification approaches are used for this purpose. Differences between acute and remitted patients in prosodic and verbal features are investigated as well. In addition, unlike other studies in the literature, this project investigates differences in both verbal and nonverbal behaviour between subjects with and without Early Maladaptive Schemas (EMS; Young et al., 2003), independently of depressive symptoms. The proposed analysis shows that patients differ from healthy subjects on several verbal and nonverbal features. Moreover, using both reading and spontaneous speech, it is possible to automatically detect depression with a good level of accuracy (from 68 to 76%). These results demonstrate that the investigation of speech features can be a useful instrument, in addition to current self-reports and clinical interviews, for aiding the diagnosis of depressive disorders. Contrary to what was expected, patients in the acute and remitted phases do not differ in nonverbal features, and only a few differences emerge in verbal behaviour. In the same way, automatic classification using paralinguistic features does not work well for discriminating subjects with and without EMS, and only a few differences between them were found in verbal behaviour. Possible explanations and limitations of these results are discussed.
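    A schematic of the automatic classification step described above, assuming paralinguistic features have already been extracted per speaker; the classifier choice (a linear SVM) and the random placeholder data are illustrative assumptions, not the project's actual setup.

    ```python
    # Schematic classification step: cross-validated patient/control detection
    # from precomputed paralinguistic features. Linear SVM and random
    # placeholder data are illustrative assumptions.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    X = rng.standard_normal((60, 40))      # 60 speakers x 40 prosodic features
    y = rng.integers(0, 2, 60)             # 1 = patient, 0 = healthy control

    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"mean accuracy: {acc.mean():.2f}")  # 0.68-0.76 reported on real data
    ```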

    Virtual screening of DrugBank database for hERG blockers using topological Laplacian-assisted AI models

    The human ether-à-go-go-related gene (hERG) potassium channel (Kv11.1) plays a critical role in mediating the cardiac action potential. Blockade of this ion channel can potentially lead to fatal disorders and/or long QT syndrome. Many drugs have been withdrawn because of their serious hERG cardiotoxicity. It is therefore crucial to assess hERG blockade activity in the early stages of drug discovery. We are particularly interested in the hERG cardiotoxicity of compounds collected in the DrugBank database, considering that many DrugBank compounds have been approved for therapeutic treatments or have high potential to become drugs. Machine learning-based in silico tools offer a rapid and economical platform to virtually screen DrugBank compounds. We design accurate and robust classifiers for blockers/non-blockers and then build regressors to quantitatively analyze the binding potency of the DrugBank compounds on the hERG channel. Molecular sequences are embedded with two natural language processing (NLP) methods, namely an autoencoder and a transformer. Complementary three-dimensional (3D) molecular structures are embedded with two advanced mathematical approaches, i.e., topological Laplacians and algebraic graphs. With our state-of-the-art tools, we reveal that 227 out of the 8641 DrugBank compounds are potential hERG blockers, suggesting serious drug safety problems. Our predictions provide guidance for further experimental interrogation of the hERG cardiotoxicity of DrugBank compounds.
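    The classify-then-screen flow can be sketched as below. Note this is a deliberately simplified stand-in: Morgan fingerprints (via RDKit) replace the paper's transformer/autoencoder sequence embeddings and topological-Laplacian/algebraic-graph structure embeddings, and the SMILES strings and labels are placeholders.

    ```python
    # Simplified stand-in for the screening flow: train a blocker/non-blocker
    # classifier, then flag DrugBank compounds. Morgan fingerprints (RDKit)
    # replace the paper's NLP and topological embeddings; SMILES strings and
    # labels are placeholders.
    import numpy as np
    from rdkit import Chem
    from rdkit.Chem import AllChem
    from sklearn.ensemble import GradientBoostingClassifier

    def featurize(smiles):
        """2048-bit Morgan fingerprint of one molecule."""
        mol = Chem.MolFromSmiles(smiles)
        return np.array(AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048))

    train_smiles = ["CCO", "c1ccccc1", "CCN(CC)CC"]   # placeholder training set
    train_labels = [0, 1, 0]                          # 1 = hERG blocker

    clf = GradientBoostingClassifier().fit(
        np.stack([featurize(s) for s in train_smiles]), train_labels)

    drugbank = ["CC(=O)Oc1ccccc1C(=O)O"]              # placeholder compound list
    flags = clf.predict(np.stack([featurize(s) for s in drugbank]))
    print(flags)                                      # 1 marks a predicted blocker
    ```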

    Automated screening methods for mental and neuro-developmental disorders

    Mental and neuro-developmental disorders such as depression, bipolar disorder, and autism spectrum disorder (ASD) are critical healthcare issues which affect a large number of people. Depression, according to the World Health Organisation, is the largest cause of disability worldwide and affects more than 300 million people. Bipolar disorder affects more than 60 million individuals worldwide. ASD, meanwhile, affects more than 1 in 100 people in the UK. Not only do these disorders adversely affect the quality of life of affected individuals, they also have a significant economic impact. While brute-force approaches are potentially useful for learning new features which could be representative of these disorders, such approaches may not be best suited for developing robust screening methods. This is due to a myriad of confounding factors, such as age, gender, cultural background, and socio-economic status, which can affect the social signals of individuals in much the same way as the symptoms of these disorders. Brute-force approaches may learn to exploit the effects of these confounding factors on social signals in place of effects due to mental and neuro-developmental disorders. The main objective of this thesis is to develop, investigate, and propose computational methods to screen for mental and neuro-developmental disorders in accordance with the descriptions given in the Diagnostic and Statistical Manual (DSM). The DSM is a guidebook published by the American Psychiatric Association which offers a common language on mental disorders. Our motivation is to alleviate, to an extent, the possibility of machine learning algorithms picking up one of the confounding factors to optimise performance for a dataset, something we do not find uncommon in the research literature. To this end, we introduce three new methods for automated screening for depression from audio/visual recordings, namely: turbulence features, craniofacial movement features, and a Fisher Vector based representation of speech spectra. We surmise that psychomotor changes due to depression lead to uniqueness in an individual's speech pattern which manifests as sudden and erratic changes in speech feature contours. The efficacy of these features is demonstrated as part of our solution to the Audio/Visual Emotion Challenge 2017 (AVEC 2017) on depression severity prediction. We also detail a methodology to quantify specific craniofacial movements, which we hypothesised could be indicative of psychomotor retardation, and hence depression. The efficacy of craniofacial movement features is demonstrated using datasets from the 2014 and 2017 editions of the AVEC depression severity prediction challenges. Finally, using the dataset provided as part of the AVEC 2016 depression classification challenge, we demonstrate that differences between the speech of individuals with and without depression can be quantified effectively using the Fisher Vector representation of speech spectra. For our work on automated screening of bipolar disorder, we propose methods to classify individuals with bipolar disorder into states of remission, hypo-mania, and mania. Here, we surmise that, as with depression, individuals with different levels of mania have a certain uniqueness to their social signals. Based on this understanding, we propose the use of turbulence features for audio/visual social signals (i.e. speech and facial expressions). We also propose the use of Fisher Vectors to create a unified representation of speech in terms of prosody, voice quality, and speech spectra. These methods were proposed as part of our solution to the AVEC 2018 Bipolar disorder challenge. In addition, we find that the task of automated screening for ASD is much more complicated. Here, confounding factors can easily overwhelm the social signals which are affected by ASD. We argue, in light of the research literature and our experimental analysis, that significant collaborative work is required between computer scientists and clinicians to discern social signals which are robust to common confounding factors.
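    As an illustration of the Fisher Vector encoding mentioned above, the sketch below fits a diagonal-covariance GMM on background spectral frames and encodes an utterance by the normalized gradients of its log-likelihood with respect to the component means (first-order statistics only). Real systems typically also include second-order terms and power/L2 normalization; all data here are placeholders.

    ```python
    # Minimal first-order Fisher Vector encoding of speech spectra: fit a
    # diagonal-covariance GMM on background frames, then encode an utterance
    # by normalized gradients w.r.t. the component means.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fisher_vector_means(frames, gmm):
        T = len(frames)
        gamma = gmm.predict_proba(frames)            # (T, K) soft assignments
        sigma = np.sqrt(gmm.covariances_)            # (K, dim) diagonal std devs
        fv = [gamma[:, k] @ ((frames - gmm.means_[k]) / sigma[k])
              / (T * np.sqrt(gmm.weights_[k]))
              for k in range(gmm.n_components)]
        return np.concatenate(fv)                    # (K * dim,) utterance encoding

    rng = np.random.default_rng(2)
    background = rng.standard_normal((1000, 13))     # e.g., MFCC-like frames
    gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(background)
    utterance = rng.standard_normal((200, 13))       # frames of one recording
    print(fisher_vector_means(utterance, gmm).shape) # -> (104,)
    ```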