498 research outputs found

    A machine learning algorithm to differentiate bipolar disorder from major depressive disorder using an online mental health questionnaire and blood biomarker data

    The vast personal and economic burden of mood disorders is largely caused by their under- and misdiagnosis, which is associated with ineffective treatment and worsening of outcomes. Here, we aimed to develop a diagnostic algorithm, based on an online questionnaire and blood biomarker data, to reduce the misdiagnosis of bipolar disorder (BD) as major depressive disorder (MDD). Individuals with depressive symptoms (Patient Health Questionnaire-9 score >= 5) aged 18-45 years were recruited online. After completing a purpose-built online mental health questionnaire, eligible participants provided dried blood spot samples for biomarker analysis and underwent the World Health Organization World Mental Health Composite International Diagnostic Interview via telephone, to establish their mental health diagnosis. Extreme Gradient Boosting and nested cross-validation were used to train and validate diagnostic models differentiating BD from MDD in participants who self-reported a current MDD diagnosis. Mean test area under the receiver operating characteristic curve (AUROC) for separating participants with BD diagnosed as MDD (N = 126) from those with correct MDD diagnosis (N = 187) was 0.92 (95% CI: 0.86-0.97). Core predictors included elevated mood, grandiosity, talkativeness, recklessness and risky behaviour. Additional validation in participants with no previous mood disorder diagnosis showed AUROCs of 0.89 (0.86-0.91) and 0.90 (0.87-0.91) for separating newly diagnosed BD (N = 98) from MDD (N = 112) and subclinical low mood (N = 120), respectively. Validation in participants with a previous diagnosis of BD (N = 45) demonstrated sensitivity of 0.86 (0.57-0.96). The diagnostic algorithm accurately identified patients with BD in various clinical scenarios, and could help expedite accurate clinical diagnosis and treatment of BD.
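
The abstract above names nested cross-validation as its validation scheme. As a rough illustration of that scheme only, the sketch below tunes a setting in an inner loop and scores the chosen setting on outer folds that were never used for tuning. It is not the authors' pipeline: the study used Extreme Gradient Boosting, whereas this sketch swaps in a toy one-feature threshold model, and every name here (`k_folds`, `fit`, `nested_cv`, `shifts`) and the synthetic data are hypothetical.

```python
import random

def k_folds(indices, k, seed):
    """Shuffle the indices and deal them into k roughly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def fit(xs, ys, idx, shift):
    """Toy model: threshold = midpoint of the two class means, plus a tunable shift.

    Assumes both classes occur among the rows in idx.
    """
    pos = [xs[i] for i in idx if ys[i] == 1]
    neg = [xs[i] for i in idx if ys[i] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2 + shift

def accuracy(threshold, xs, ys, idx):
    """Fraction of rows in idx where 'x > threshold' matches the label."""
    return sum((xs[i] > threshold) == (ys[i] == 1) for i in idx) / len(idx)

def nested_cv(xs, ys, shifts, outer_k=5, inner_k=3):
    """Return one held-out accuracy per outer fold."""
    outer = k_folds(range(len(xs)), outer_k, seed=0)
    scores = []
    for o, test_idx in enumerate(outer):
        train_idx = [i for f, fold in enumerate(outer) if f != o for i in fold]
        inner = k_folds(train_idx, inner_k, seed=1)

        def inner_score(shift):
            # Inner loop: fit on the inner-training folds, score on the inner-validation fold.
            total = 0.0
            for v, val_idx in enumerate(inner):
                fit_idx = [i for f, fold in enumerate(inner) if f != v for i in fold]
                total += accuracy(fit(xs, ys, fit_idx, shift), xs, ys, val_idx)
            return total / inner_k

        best_shift = max(shifts, key=inner_score)
        # Outer loop: refit on all outer-training rows, score on the untouched test rows.
        scores.append(accuracy(fit(xs, ys, train_idx, best_shift), xs, ys, test_idx))
    return scores

# Synthetic, well-separated data (not from the study).
ys = [i % 2 for i in range(30)]
xs = [10.0 * y + 0.1 * (i % 3) for i, y in enumerate(ys)]
fold_scores = nested_cv(xs, ys, shifts=[-1.0, 0.0, 1.0])
print(sum(fold_scores) / len(fold_scores))
```

The point of the two loops is that the reported score comes only from rows that played no part in choosing the setting, which is why a mean test metric from nested cross-validation is less optimistic than one tuned and scored on the same folds.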

    Deep learning and embeddings for problems of computational biology

    The development of Next Generation Sequencing has brought biology into the Big Data era. The ever-increasing gap between proteins with known sequences and those with a complete functional annotation requires computational methods for automatic structural and functional annotation. My research has focused on proteins and has so far led to the development of three novel tools, DeepREx, E-SNPs&GO and ISPRED-SEQ, based on Machine and Deep Learning approaches. DeepREx computes the solvent exposure of residues in a protein chain. This problem is relevant for the definition of structural constraints regarding the possible folding of the protein. DeepREx exploits Long Short-Term Memory layers to capture residue-level interactions between positions distant in the sequence, achieving state-of-the-art performance. With DeepREx, I conducted a large-scale analysis investigating the relationship between the solvent exposure of a residue and its probability of being pathogenic upon mutation. E-SNPs&GO predicts the pathogenicity of a Single Residue Variation. Variations occurring in a protein sequence can have different effects, possibly leading to the onset of diseases. E-SNPs&GO exploits protein embeddings generated by two novel Protein Language Models (PLMs), as well as a new way of representing functional information coming from the Gene Ontology. The method achieves state-of-the-art performance and is extremely time-efficient when compared to traditional approaches. ISPRED-SEQ predicts the presence of Protein-Protein Interaction sites in a protein sequence. Knowing how a protein interacts with other molecules is crucial for accurate functional characterization. ISPRED-SEQ exploits a convolutional layer to parse local context after embedding the protein sequence with two novel PLMs, greatly surpassing the current state of the art. All methods are published in international journals and are available as user-friendly web servers. They have been developed keeping in mind standard guidelines for FAIRness (FAIR: Findable, Accessible, Interoperable, Reusable) and are integrated into the public collection of tools provided by ELIXIR, the European infrastructure for Bioinformatics.

    Identifying key multi-modal predictors of incipient dementia in Parkinson’s disease: a machine learning analysis and Tree SHAP interpretation

    Background: Persons with Parkinson’s disease (PD) differentially progress to cognitive impairment and dementia. With a 3-year longitudinal sample of initially non-demented PD patients measured on multiple dementia risk factors, we demonstrate that machine learning classifier algorithms can be combined with explainable artificial intelligence methods to identify and interpret leading predictors that discriminate those who later converted to dementia from those who did not. Method: Participants were 48 well-characterized PD patients (mean baseline age = 71.6; SD = 4.8; 44% female). We tested 38 multi-modal predictors from 10 domains (e.g., motor, cognitive) in a computationally competitive context to identify those that best discriminated two unobserved baseline groups, PD No Dementia (PDND), and PD Incipient Dementia (PDID). We used Random Forest (RF) classifier models for the discrimination goal and Tree SHapley Additive exPlanation (Tree SHAP) values for deep interpretation. Results: An excellent RF model discriminated baseline PDID from PDND (AUC = 0.84; normalized Matthews Correlation Coefficient = 0.76). Tree SHAP showed that ten leading predictors of PDID accounted for 62.5% of the model, as well as their relative importance, direction, and magnitude (risk threshold). These predictors represented the motor (e.g., poorer gait), cognitive (e.g., slower Trail A), molecular (up-regulated metabolite panel), demographic (age), imaging (ventricular volume), and lifestyle (activities of daily living) domains. Conclusion: Our data-driven protocol integrated RF classifier models and Tree SHAP applications to selectively identify and interpret early dementia risk factors in a well-characterized sample of initially non-demented persons with PD. Results indicate that leading dementia predictors derive from multiple complementary risk domains.
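
The abstract above reports a normalized Matthews Correlation Coefficient without defining it. The sketch below shows one common convention, which is an assumption here and not confirmed by the abstract: compute the MCC from the confusion-matrix counts and rescale it from [-1, 1] to [0, 1] as (MCC + 1) / 2. The function names are hypothetical and the counts in the usage lines are made up for illustration.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

def normalized_mcc(tp, tn, fp, fn):
    """Assumed rescaling of MCC from [-1, 1] to [0, 1]."""
    return (mcc(tp, tn, fp, fn) + 1) / 2

# Illustrative counts only (not taken from the study).
print(normalized_mcc(tp=5, tn=5, fp=0, fn=0))  # perfect agreement
print(normalized_mcc(tp=5, tn=5, fp=5, fn=5))  # chance-level agreement
```

If the study used this rescaling, a normalized value of 0.76 would correspond to a raw MCC of about 0.52.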