28 research outputs found

    Rare tandem repeat expansions associate with genes involved in synaptic and neuronal signaling functions in schizophrenia

    Get PDF
    Tandem repeat expansions (TREs) are associated with over 60 monogenic disorders and have recently been implicated in complex disorders such as cancer and autism spectrum disorder. The role of TREs in schizophrenia is now emerging. In this study, we have performed a genome-wide investigation of TREs in schizophrenia. Using genome sequence data from 1154 Swedish schizophrenia cases and 934 ancestry-matched population controls, we have detected genome-wide rare (<0.1% population frequency) TREs that have motifs with a length of 2–20 base pairs. We find that the proportion of individuals carrying rare TREs is significantly higher in the schizophrenia group. There is a significantly higher burden of rare TREs in schizophrenia cases than in controls in genic regions, particularly in postsynaptic genes, in genes overlapping brain expression quantitative trait loci, and in brain-expressed genes that are differentially expressed between schizophrenia cases and controls. We demonstrate that TRE-associated genes are more constrained and primarily impact synaptic and neuronal signaling functions. These results have been replicated in an independent Canadian sample that consisted of 252 schizophrenia cases of European ancestry and 222 ancestry-matched controls. Our results support the involvement of rare TREs in schizophrenia etiology

    Machine learning for genetic prediction of psychiatric disorders: a systematic review

    Get PDF
    Machine learning methods have been employed to make predictions in psychiatry from genotypes, with the potential to bring improved prediction of outcomes in psychiatric genetics; however, their current performance is unclear. We aim to systematically review machine learning methods for predicting psychiatric disorders from genetics alone and evaluate their discrimination, bias and implementation. Medline, PsycInfo, Web of Science and Scopus were searched for terms relating to genetics, psychiatric disorders and machine learning, including neural networks, random forests, support vector machines and boosting, on 10 September 2019. Following PRISMA guidelines, articles were screened for inclusion independently by two authors, extracted, and assessed for risk of bias. Overall, 63 full texts were assessed from a pool of 652 abstracts. Data were extracted for 77 models of schizophrenia, bipolar, autism or anorexia across 13 studies. Performance of machine learning methods was highly varied (0.48–0.95 AUC) and differed between schizophrenia (0.54–0.95 AUC), bipolar (0.48–0.65 AUC), autism (0.52–0.81 AUC) and anorexia (0.62–0.69 AUC). This is likely due to the high risk of bias identified in the study designs and analysis for reported results. Choices for predictor selection, hyperparameter search and validation methodology, and viewing of the test set during training were common causes of high risk of bias in analysis. Key steps in model development and validation were frequently not performed or unreported. Comparison of discrimination across studies was constrained by heterogeneity of predictors, outcome and measurement, in addition to sample overlap within and across studies. Given widespread high risk of bias and the small number of studies identified, it is important to ensure established analysis methods are adopted. We emphasise best practices in methodology and reporting for improving future studies

    Rare copy number variation in posttraumatic stress disorder

    Get PDF
    Posttraumatic stress disorder (PTSD) is a heritable (h2 = 24-71%) psychiatric illness. Copy number variation (CNV) is a form of rare genetic variation that has been implicated in the etiology of psychiatric disorders, but no large-scale investigation of CNV in PTSD has been performed. We present an association study of CNV burden and PTSD symptoms in a sample of 114,383 participants (13,036 cases and 101,347 controls) of European ancestry. CNVs were called using two calling algorithms and intersected to a consensus set. Quality control was performed to remove strong outlier samples. CNVs were examined for association with PTSD within each cohort using linear or logistic regression analysis adjusted for population structure and CNV quality metrics, then inverse variance weighted meta-analyzed across cohorts. We examined the genome-wide total span of CNVs, enrichment of CNVs within specified gene-sets, and CNVs overlapping individual genes and implicated neurodevelopmental regions. The total distance covered by deletions crossing over known neurodevelopmental CNV regions was significant (beta = 0.029, SE = 0.005, P = 6.3 × 10-8). The genome-wide neurodevelopmental CNV burden identified explains 0.034% of the variation in PTSD symptoms. The 15q11.2 BP1-BP2 microdeletion region was significantly associated with PTSD (beta = 0.0206, SE = 0.0056, P = 0.0002). No individual significant genes interrupted by CNV were identified. 22 gene pathways related to the function of the nervous system and brain were significant in pathway analysis (FDR q < 0.05), but these associations were not significant once NDD regions were removed. A larger sample size, better detection methods, and annotated resources of CNV are needed to explore this relationship further

    Performance of case-control rare copy number variation annotation in classification of autism

    No full text
    Abstract Background A substantial proportion of Autism Spectrum Disorder (ASD) risk resides in de novo germline and rare inherited genetic variation. In particular, rare copy number variation (CNV) contributes to ASD risk in up to 10% of ASD subjects. Despite the striking degree of genetic heterogeneity, case-control studies have detected specific burden of rare disruptive CNV for neuronal and neurodevelopmental pathways. Here, we used machine learning methods to classify ASD subjects and controls, based on rare CNV data and comprehensive gene annotations. We investigated performance of different methods and estimated the percentage of ASD subjects that could be reliably classified based on presumed etiologic CNV they carry. Results We analyzed 1,892 Caucasian ASD subjects and 2,342 matched controls. Rare CNVs (frequency 1% or less) were detected using Illumina 1M and 1M-Duo BeadChips. Conditional Inference Forest (CF) typically performed as well as or better than other classification methods. We found a maximum AUC (area under the ROC curve) of 0.533 when considering all ASD subjects with rare genic CNVs, corresponding to 7.9% correctly classified ASD subjects and less than 3% incorrectly classified controls; performance was significantly higher when considering only subjects harboring de novo or pathogenic CNVs. We also found rare losses to be more predictive than gains and that curated neurally-relevant annotations (brain expression, synaptic components and neurodevelopmental phenotypes) outperform Gene Ontology and pathway-based annotations. Conclusions CF is an optimal classification approach for case-control rare CNV data and it can be used to prioritize subjects with variants potentially contributing to ASD risk not yet recognized. The neurally-relevant annotations used in this study could be successfully applied to rare CNV case-control data-sets for other neuropsychiatric disorders

    Foreword: CSBio 2017

    Full text link

    Sociodemographic indicators of health status using a machine learning approach and data from the English Longitudinal Study of Aging (ELSA)

    No full text
    202312 bckwVersion of RecordOthersEuropean Union Horizon 2020 Research and Innovation ProgramPublishedC

    Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk

    No full text
    Background: The use of Cardiovascular Disease (CVD) risk estimation scores in primary prevention has long been established. However, their performance still remains a matter of concern. The aim of this study was to explore the potential of using ML methodologies on CVD prediction, especially compared to established risk tool, the HellenicSCORE. Methods: Data from the ATTICA prospective study (n = 2020 adults), enrolled during 2001-02 and followed-up in 2011-12 were used. Three different machine-learning classifiers (k-NN, random forest, and decision tree) were trained and evaluated against 10-year CVD incidence, in comparison with the HellenicSCORE tool (a calibration of the ESC SCORE). Training datasets, consisting from 16 variables to only 5 variables, were chosen, with or without bootstrapping, in an attempt to achieve the best overall performance for the machine learning classifiers. Results: Depending on the classifier and the training dataset the outcome varied in efficiency but was comparable between the two methodological approaches. In particular, the HellenicSCORE showed accuracy 85%, specificity 20%, sensitivity 97%, positive predictive value 87%, and negative predictive value 58%, whereas for the machine learning methodologies, accuracy ranged from 65 to 84%, specificity from 46 to 56%, sensitivity from 67 to 89%, positive predictive value from 89 to 91%, and negative predictive value from 24 to 45%; random forest gave the best results, while the k-NN gave the poorest results. Conclusions: The alternative approach of machine learning classification produced results comparable to that of risk prediction scores and, thus, it can be used as a method of CVD prediction, taking into consideration the advantages that machine learning methodologies may offer. © 2018 The Author(s)
    corecore