13 research outputs found

    Multi-modality machine learning predicting Parkinson's disease

    Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key moving forward. We build upon previous work to deliver multimodal predictions of Parkinson's disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make improved multi-omic predictions of PD, validated in an external cohort. We investigated top features, constructed hypothesis-free disease-relevant networks, and investigated drug-gene interactions. We performed automated ML on multimodal data from the Parkinson's Progression Markers Initiative (PPMI). After selecting the best performing algorithm, all PPMI data were used to tune the selected model. The model was validated in the Parkinson's Disease Biomarker Program (PDBP) dataset. Our initial model showed an area under the curve (AUC) of 89.72% for the diagnosis of PD. The tuned model was then tested for validation on external data (PDBP, AUC 85.03%). Optimizing thresholds for classification increased the diagnosis prediction accuracy and other metrics. Finally, networks were built to identify gene communities specific to PD. Combining data modalities outperforms the single biomarker paradigm. UPSIT and PRS contributed most to the predictive power of the model, but the accuracy of these is supplemented by many smaller effect transcripts and risk SNPs. Our model is best suited to identifying large groups of individuals to monitor within a health registry or biobank to prioritize for further testing. This approach allows complex predictive models to be reproducible and accessible to the community, with the package, code, and results publicly available.

    Identification of novel risk loci, causal insights, and heritable risk for Parkinson's disease: a meta-analysis of genome-wide association studies

    Background Genome-wide association studies (GWAS) in Parkinson's disease have increased the scope of biological knowledge about the disease over the past decade. We aimed to use the largest aggregate of GWAS data to identify novel risk loci and gain further insight into the causes of Parkinson's disease. Methods We did a meta-analysis of 17 datasets from Parkinson's disease GWAS available from European ancestry samples to nominate novel loci for disease risk. These datasets incorporated all available data. We then used these data to estimate heritable risk and develop predictive models of this heritability. We also used large gene expression and methylation resources to examine possible functional consequences as well as tissue, cell type, and biological pathway enrichments for the identified risk factors. Additionally, we examined shared genetic risk between Parkinson's disease and other phenotypes of interest via genetic correlations followed by Mendelian randomisation. Findings Between Oct 1, 2017, and Aug 9, 2018, we analysed 7·8 million single nucleotide polymorphisms in 37 688 cases, 18 618 UK Biobank proxy-cases (ie, individuals who do not have Parkinson's disease but have a first degree relative that does), and 1·4 million controls. We identified 90 independent genome-wide significant risk signals across 78 genomic regions, including 38 novel independent risk signals in 37 loci. These 90 variants explained 16–36% of the heritable risk of Parkinson's disease depending on prevalence. Integrating methylation and expression data within a Mendelian randomisation framework identified putatively associated genes at 70 risk signals underlying GWAS loci for follow-up functional studies. Tissue-specific expression enrichment analyses suggested Parkinson's disease loci were heavily brain-enriched, with specific neuronal cell types being implicated from single cell data. 
We found significant genetic correlations with brain volumes (false discovery rate-adjusted p=0·0035 for intracranial volume, p=0·024 for putamen volume), smoking status (p=0·024), and educational attainment (p=0·038). Mendelian randomisation between cognitive performance and Parkinson's disease risk showed a robust association (p=8·00 × 10−7). Interpretation These data provide the most comprehensive survey of genetic risk within Parkinson's disease to date, to the best of our knowledge, by revealing many additional Parkinson's disease risk loci, providing a biological context for these risk factors, and showing that a considerable genetic component of this disease remains unidentified. These associations derived from European ancestry datasets will need to be followed-up with more diverse data. Funding The National Institute on Aging at the National Institutes of Health (USA), The Michael J Fox Foundation, and The Parkinson's Foundation (see appendix for full list of funding sources)

    Genome-wide Analyses Identify KIF5A as a Novel ALS Gene

    To identify novel genes associated with ALS, we undertook two lines of investigation. We carried out a genome-wide association study comparing 20,806 ALS cases and 59,804 controls. Independently, we performed a rare variant burden analysis comparing 1,138 index familial ALS cases and 19,494 controls. Through both approaches, we identified kinesin family member 5A (KIF5A) as a novel gene associated with ALS. Interestingly, mutations predominantly in the N-terminal motor domain of KIF5A are causative for two neurodegenerative diseases: hereditary spastic paraplegia (SPG10) and Charcot-Marie-Tooth type 2 (CMT2). In contrast, ALS-associated mutations are primarily located at the C-terminal cargo-binding tail domain and patients harboring loss-of-function mutations displayed an extended survival relative to typical ALS cases. Taken together, these results broaden the phenotype spectrum resulting from mutations in KIF5A and strengthen the role of cytoskeletal defects in the pathogenesis of ALS.

    Defining the causes of sporadic Parkinson's disease in the global Parkinson's genetics program (GP2)

    The Global Parkinson’s Genetics Program (GP2) will genotype over 150,000 participants from around the world, and integrate genetic and clinical data for use in large-scale analyses to dramatically expand our understanding of the genetic architecture of PD. This report details the workflow for cohort integration into the complex arm of GP2, and together with our outline of the monogenic hub in a companion paper, provides a generalizable blueprint for establishing large scale collaborative research consortia

    Multi-ancestry genome-wide association meta-analysis of Parkinson's disease

    Although over 90 independent risk variants have been identified for Parkinson’s disease using genome-wide association studies, most studies have been performed in just one population at a time. Here we performed a large-scale multi-ancestry meta-analysis of Parkinson’s disease with 49,049 cases, 18,785 proxy cases and 2,458,063 controls including individuals of European, East Asian, Latin American and African ancestry. In a meta-analysis, we identified 78 independent genome-wide significant loci, including 12 potentially novel loci (MTF2, PIK3CA, ADD1, SYBU, IRS2, USP8, PIGL, FASN, MYLK2, USP25, EP300 and PPP6R2) and fine-mapped 6 putative causal variants at 6 known PD loci. By combining our results with publicly available eQTL data, we identified 25 putative risk genes in these novel loci whose expression is associated with PD risk. This work lays the groundwork for future efforts aimed at identifying PD loci in non-European populations

    The Parkinson's Disease Genome‐Wide Association Study Locus Browser

    International Parkinson's Disease Genomics Consortium (IPDGC). [Background] Parkinson's disease (PD) is a neurodegenerative disease with an often complex component identifiable by genome‐wide association studies. The most recent large‐scale PD genome‐wide association studies have identified more than 90 independent risk variants for PD risk and progression across more than 80 genomic regions. One major challenge in current genomics is the identification of the causal gene(s) and variant(s) at each genome‐wide association study locus. The objective of the current study was to create a tool that would display data for relevant PD risk loci and provide guidance with the prioritization of causal genes and potential mechanisms at each locus. [Methods] We included all significant genome‐wide signals from multiple recent PD genome‐wide association studies including the most recent PD risk genome‐wide association study, age‐at‐onset genome‐wide association study, progression genome‐wide association study, and Asian population PD risk genome‐wide association study. We gathered data for all genes 1 Mb up and downstream of each variant to allow users to assess which gene(s) are most associated with the variant of interest based on a set of self‐ranked criteria. Multiple databases were queried for each gene to collect additional causal data. [Results] We created a PD genome‐wide association study browser tool (https://pdgenetics.shinyapps.io/GWASBrowser/) to assist the PD research community with the prioritization of genes for follow‐up functional studies to identify potential therapeutic targets. [Conclusions] Our PD genome‐wide association study browser tool provides users with a useful method of identifying potential causal genes at all known PD risk loci from large‐scale PD genome‐wide association studies. We plan to update this tool with new relevant data as sample sizes increase and new PD risk loci are discovered. © 2020 The Authors. 
Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society. This article has been contributed to by US Government employees and their work is in the public domain in the USA.This work was supported in part by the Intramural Research Programs of the National Institute of Neurological Disorders and Stroke (NINDS), the National Institute on Aging (NIA), and the National Institute of Environmental Health Sciences, both part of the National Institutes of Health, Department of Health and Human Services (project numbers 1ZIA‐NS003154, Z01‐AG000949‐02, and Z01‐ES101986). We thank the research participants and employees of 23andMe for making this work possible. C.W. is supported by the UK Dementia Research Institute funded by the Medical Research Council (MRC), Alzheimer's Society and Alzheimer's Research UK. C.S. is supported by the Ser Cymru II program, which is partly funded by Cardiff University and the European Regional Development Fund through the Welsh Government. Data were generated as part of the PsychENCODE Consortium supported by: U01MH103339, U01MH103365, U01MH103392, U01MH103340, U01MH103346, R01MH105472, R01MH094714, R01MH105898, R21MH102791, R21MH105881, R21MH103877, and P50MH106934 awarded to Schahram Akbarian (Icahn School of Medicine at Mount Sinai), Gregory Crawford (Duke), Stella Dracheva (Icahn School of Medicine at Mount Sinai), Peggy Farnham (USC), Mark Gerstein (Yale), Daniel Geschwind (UCLA), Thomas M. Hyde (LIBD), Andrew Jaffe (LIBD), James A. Knowles (USC), Chunyu Liu (UIC), Dalila Pinto (Icahn School of Medicine at Mount Sinai), Nenad Sestan (Yale), Pamela Sklar (Icahn School of Medicine at Mount Sinai), Matthew State (UCSF), Patrick Sullivan (UNC), Flora Vaccarino (Yale), Sherman Weissman (Yale), Kevin White (UChicago), and Peter Zandi (JHU). 
The Genotype‐Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this article were obtained from the GTEx Portal on February 12, 2020. Molecular data for the Trans‐Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung, and Blood Institute (NHLBI). Genome sequencing for “NHLBI TOPMed: Atherosclerosis Risk in Communities (ARIC)” (phs001211.v2.p2) was performed at the Broad Institute of MIT and Harvard (3R01HL092577‐06S1)and at the Baylor Human Genome Sequencing Center (3U54HG003273‐12S2, HHSN268201500015C). Genome sequencing for the “NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation (CCAF) Study” (phs001189.v1.p1) was performed at the Broad Institute of MIT and Harvard (3R01HL092577‐06S1). Genome sequencing for “NHLBI TOPMed: Trans‐Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study (phs001368.v1.p1) was performed at the Baylor Human Genome Sequencing Center (3U54HG003273‐12S2, HHSN268201500015C). Genome sequencing for “NHLBI TOPMed: Partners HealthCare Biobank” (phs001024.v3.p1) was performed at the Broad Institute of MIT and Harvard (3R01HL092577‐06S1). Genome sequencing for “NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism (WGS of VTE)” (phs001402.v1.p1) was performed at the Baylor Human Genome Sequencing Center (3U54HG003273‐12S2, HHSN268201500015C). Genome sequencing for “NHLBI TOPMed: Novel Risk Factors for the Development of Atrial Fibrillation in Women” (phs001040.v3.p1) was performed at the Broad Institute of MIT and Harvard (3R01HL092577‐06S1). Genome sequencing for “NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados” (phs001143.v2.p1) was performed by Illumina Genomic Services (3R01HL104608‐04S1). 
Genome sequencing for “NHLBI TOPMed: The Vanderbilt Genetic Basis of Atrial Fibrillation” (phs001032.v4.p2) was performed at the Broad Institute of MIT and Harvard (3R01HL092577‐06S1). Genome sequencing for “NHLBI TOPMed: Heart and Vascular Health Study (HVH)” (phs000993.v3.p2) was performed at the Broad Institute of MIT and Harvard (3R01HL092577‐06S1) and at the Baylor Human Genome Sequencing Center (3U54HG003273‐12S2, HHSN268201500015C). Genome sequencing for “NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene)” (phs000951.v3.p3) was performed at the University of Washington Northwest Genomics Center (3R01HL089856‐08S1) and at the Broad Institute of MIT and Harvard (HHSN268201500014C). Genome sequencing for “NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Ablation Registry” (phs000997.v3.p2) was performed at the Broad Institute of MIT and Harvard (3U54HG003067‐12S2, 3U54HG003067‐13S1). Genome sequencing for “NHLBI TOPMed: The Jackson Heart Study” (phs000964.v3.p1) was performed at the University of Washington Northwest Genomics Center (HHSN268201100037C). Genome sequencing for “NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish” (phs000956.v3.p1) was performed at the Broad Institute of MIT and Harvard (3R01HL121007‐01S1). Genome sequencing for “NHLBI TOPMed: Massachusetts General Hospital Atrial Fibrillation (MGH AF) Study” (phs001062.v3.p2) was performed at the Broad Institute of MIT and Harvard (3R01HL092577‐06S1, 3U54HG003067‐12S2, 3U54HG003067‐13S1, 3UM1HG008895‐01S2). Genome sequencing for “NHLBI TOPMed: The Framingham Heart Study” (phs000974.v3.p2) was performed at the Broad Institute of MIT and Harvard (3U54HG003067‐12S2). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering, were provided by the TOPMed Informatics Research Center (3R01HL‐117626‐02S1; contract HHSN268201800002I). 
Core support including phenotype harmonization, data management, sample‐identity QC, and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL‐120393; U01HL‐120393; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. The Atherosclerosis Risk in Communities study has been funded in whole or in part with federal funds from the National Heart, Lung, and Blood Institute, National Institute of Health, Department of Health and Human Services, under contract numbers (HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700004I, and HHSN268201700005I). The authors thank the staff and participants of the ARIC study for their important contributions. The research reported in this article was supported by grants from the National Institutes of Health (NIH) National Heart, Lung, and Blood Institute grants R01 HL090620 and R01 HL111314, the NIH National Center for Research Resources for Case Western Reserve University and Cleveland Clinic Clinical and Translational Science Award (CTSA) UL1‐RR024989, the Department of Cardiovascular Medicine philanthropic research fund, Heart and Vascular Institute, Cleveland Clinic, the Fondation Leducq grant 07‐CVD 03, and the Atrial Fibrillation Innovation Center, state of Ohio. This research was supported by contracts HHSN268201200036C, HHSN268200800007C, N01‐HC85079, N01‐HC‐85080, N01‐HC‐85081, N01‐HC‐85082, N01‐HC‐85083, N01‐HC‐85084, N01‐HC‐85085, N01‐HC‐85086, N01‐HC‐35129, N01‐HC‐15103, N01‐HC‐55222, N01‐HC‐75150, N01‐HC‐45133, and N01‐ HC‐85239; grant numbers U01 HL080295 and U01 HL130014 from the National Heart, Lung, and Blood Institute, and R01 AG023629 from the National Institute on Aging, with additional contribution from the National Institute of Neurological Disorders and Stroke. A full list of principal CHS investigators and institutions can be found at https://chs-nhlbi.org/pi. 
This article was not prepared in collaboration with CHS investigators and does not necessarily reflect the opinions or views of CHS or the NHLBI. We thank the Broad Institute for generating high‐quality sequence data supported by NHLBI grant 3R01HL092577‐06S1 to Dr. Patrick Ellinor. Funded in part by grants from the National Institutes of Health, National Heart, Lung, and Blood Institute (HL66216 and HL83141), and the National Human Genome Research Institute (HG04735). The Women's Genome Health Study (WGHS) is supported by HL 043851 and HL099355 from the National Heart, Lung, and Blood Institute and CA 047988 from the National Cancer Institute, the Donald W. Reynolds Foundation with collaborative scientific support and funding for genotyping provided by Amgen. AF end‐point confirmation was supported by HL‐093613 and a grant from the Harris Family Foundation and Watkin's Foundation. The Genetics and Epidemiology of Asthma in Barbados is supported by National Institutes of Health (NIH) National Heart, Lung, and Blood Institute TOPMed (R01 HL104608‐S1), and R01 AI20059, K23 HL076322, and RC2 HL101651. The research reported in this article was supported by grants from the American Heart Association to Dr. Darbar (EIA 0940116N), and grants from the National Institutes of Health (NIH) to Dr. Darbar (HL092217), and Dr. Roden (U19 HL65962, and UL1 RR024975). This project was also supported by a CTSA award (UL1TR000445) from the National Center for Advancing Translational Sciences. Its contents are solely the responsibility of the authors and do not necessarily represent official views of the National Center for Advancing Translational Sciences of the NIH. The research reported in this article was supported by grants HL068986, HL085251, HL095080, and HL073410 from the National Heart, Lung, and Blood Institute. 
This article was not prepared in collaboration with Heart and Vascular Health (HVH) Study investigators and does not necessarily reflect the opinions or views of the HVH Study or the NHLBI. This research used data generated by the COPDGene study, which was supported by NIH grants U01 HL089856 and U01 HL089897. The COPDGene project is also supported by the COPD Foundation through contributions made by an Industry Advisory Board composed of Pfizer, AstraZeneca, Boehringer Ingelheim, Novartis, and Sunovion. Centralized read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL‐117626‐02S1; contract HHSN268201800002I). Phenotype harmonization, data management, sample‐identity QC, and general study coordination were provided by the TOPMed Data Coordinating Center (3R01HL‐120393‐02S1; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. This study is part of the Centers for Common Disease Genomics (CCDG) program, a large‐scale genome sequencing effort to identify rare risk and protective alleles that contribute to a range of common disease phenotypes. The CCDG program is funded by the National Human Genome Research Institute (NHGRI) and the National Heart, Lung, and Blood Institute (NHLBI). Sequencing was completed at the Human Genome Sequencing Center at Baylor College of Medicine under NHGRI grant UM1 HG008898. The research reported in this article was supported by grants from the American Heart Association to Dr. Shoemaker (11CRP742009) and Dr. Darbar (EIA 0940116N), and grants from the National Institutes of Health (NIH) to Dr. Darbar (R01 HL092217) and Dr. Roden (U19 HL65962 and UL1 RR024975). The project was also supported by a CTSA award (UL1 TR00045) from the National Center for Advancing Translational Sciences. 
Its contents are solely the responsibility of the authors and do not necessarily represent official views of the National Center for Advancing Translational Sciences or the NIH. The Jackson Heart Study (JHS) is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I/HHSN26800001), and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I, and HHSN268201800012I) contracts from the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute for Minority Health and Health Disparities (NIMHD). The authors also thank the staffs and participants of the JHS. The Amish studies on which these data are based were supported by NIH grants R01 AG18728, U01 HL072515, R01 HL088119, R01 HL121007, and P30 DK072488. See publication PMID: 18440328. The research reported in this article was supported by NIH grants K23HL071632, K23HL114724, R21DA027021, R01HL092577, R01HL092577S1, R01HL104156, K24HL105780, and U01HL65962. The research has also been supported by an Established Investigator Award from the American Heart Association (13EIA14220013) and by support from the Fondation Leducq (14CVD01). This article was not prepared in collaboration with MGH AF Study investigators and does not necessarily reflect the opinions or views of the MGH AF Study investigators or the NHLBI. The Framingham Heart Study is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University (contract nos. N01‐HC‐25195, HHSN268201500001I, and 75N92019D00031). This article was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or NHLBI

    The Parkinson's Disease Genome-Wide Association Study Locus Browser

    Background: Parkinson's disease (PD) is a neurodegenerative disease with an often complex component identifiable by genome-wide association studies. The most recent large-scale PD genome-wide association studies have identified more than 90 independent risk variants for PD risk and progression across more than 80 genomic regions. One major challenge in current genomics is the identification of the causal gene(s) and variant(s) at each genome-wide association study locus. The objective of the current study was to create a tool that would display data for relevant PD risk loci and provide guidance with the prioritization of causal genes and potential mechanisms at each locus. Methods: We included all significant genome-wide signals from multiple recent PD genome-wide association studies including the most recent PD risk genome-wide association study, age-at-onset genome-wide association study, progression genome-wide association study, and Asian population PD risk genome-wide association study. We gathered data for all genes 1 Mb up and downstream of each variant to allow users to assess which gene(s) are most associated with the variant of interest based on a set of self-ranked criteria. Multiple databases were queried for each gene to collect additional causal data. Results: We created a PD genome-wide association study browser tool (https://pdgenetics.shinyapps.io/GWASBrowser/) to assist the PD research community with the prioritization of genes for follow-up functional studies to identify potential therapeutic targets. Conclusions: Our PD genome-wide association study browser tool provides users with a useful method of identifying potential causal genes at all known PD risk loci from large-scale PD genome-wide association studies. We plan to update this tool with new relevant data as sample sizes increase and new PD risk loci are discovered.

    A reference human induced pluripotent stem cell line for large-scale collaborative studies

    Human induced pluripotent stem cell (iPSC) lines are a powerful tool for studying development and disease, but the considerable phenotypic variation between lines makes it challenging to replicate key findings and integrate data across research groups. To address this issue, we sub-cloned candidate human iPSC lines and deeply characterized their genetic properties using whole genome sequencing, their genomic stability upon CRISPR-Cas9-based gene editing, and their phenotypic properties including differentiation to commonly used cell types. These studies identified KOLF2.1J as an all-around well-performing iPSC line. We then shared KOLF2.1J with groups around the world who tested its performance in head-to-head comparisons with their own preferred iPSC lines across a diverse range of differentiation protocols and functional assays. On the strength of these findings, we have made KOLF2.1J and its gene-edited derivative clones readily accessible to promote the standardization required for large-scale collaborative science in the stem cell field

    Moving beyond neurons: the role of cell type-specific gene regulation in Parkinson’s disease heritability

    Parkinson’s disease (PD), with its characteristic loss of nigrostriatal dopaminergic neurons and deposition of α-synuclein in neurons, is often considered a neuronal disorder. However, in recent years substantial evidence has emerged to implicate glial cell types, such as astrocytes and microglia. In this study, we used stratified LD score regression and expression-weighted cell-type enrichment together with several brain-related and cell-type-specific genomic annotations to connect human genomic PD findings to specific brain cell types. We found that PD heritability attributable to common variation does not enrich in global and regional brain annotations or brain-related cell-type-specific annotations. Likewise, we found no enrichment of PD susceptibility genes in brain-related cell types. In contrast, we demonstrated a significant enrichment of PD heritability in a curated lysosomal gene set highly expressed in astrocytic, microglial, and oligodendrocyte subtypes, and in LoF-intolerant genes, which were found highly expressed in almost all tested cellular subtypes. Our results suggest that PD risk loci do not lie in specific cell types or individual brain regions, but rather in global cellular processes detectable across several cell types.