
    Centers for Mendelian Genomics: A Decade of Facilitating Gene Discovery

    PURPOSE: Mendelian disease genomic research has undergone a massive transformation over the past decade. With the increasing availability of exome and genome sequencing, the role of Mendelian research has expanded beyond data collection, sequencing, and analysis to worldwide data sharing and collaboration. METHODS: Over the past 10 years, the National Institutes of Health-supported Centers for Mendelian Genomics (CMGs) have played a major role in this research and clinical evolution. RESULTS: We highlight the cumulative gene discoveries facilitated by the program, the biomedical research leveraged by the approach, and the larger impact on the research community. Beyond generating a list of gene-phenotype relationships and participating in widespread data sharing, the CMGs have created resources, tools, and training for the larger community to foster understanding of genes and genome variation. The CMGs have participated in a wide range of data sharing activities, including deposition of all eligible CMG data into the Analysis, Visualization, and Informatics Lab-space (AnVIL), sharing candidate genes through the Matchmaker Exchange and the CMG website, and sharing variants in Genotypes to Mendelian Phenotypes (Geno2MP) and VariantMatcher. CONCLUSION: The work is far from complete; strengthening communication between research and clinical realms, continued development and sharing of knowledge and tools, and improving access to richly characterized data sets are all required to diagnose the remaining molecularly undiagnosed patients.

    Familial thrombocytopenia due to a complex structural variant resulting in a WAC-ANKRD26 fusion transcript

    Advances in genome sequencing have led to the identification of the causes of numerous rare diseases. However, many cases remain unsolved with standard molecular analyses. We describe a family presenting with a phenotype resembling inherited thrombocytopenia 2 (THC2). THC2 is generally caused by single nucleotide variants that prevent silencing of ANKRD26 expression during hematopoietic differentiation. Short-read whole-exome and genome sequencing approaches were unable to identify a causal variant in this family. Using long-read whole-genome sequencing, we identified a large complex structural variant involving a paired-duplication inversion. Through functional studies, we show that this structural variant results in a pathogenic gain-of-function WAC-ANKRD26 fusion transcript. Our findings illustrate how complex structural variants that may be missed by conventional genome sequencing approaches can cause human disease.

    Recommendations for clinical interpretation of variants found in non-coding regions of the genome

    Background: The majority of clinical genetic testing focuses almost exclusively on regions of the genome that directly encode proteins. The important role of variants in non-coding regions in penetrant disease is, however, increasingly being demonstrated, and the use of whole genome sequencing in clinical diagnostic settings is rising across a large range of genetic disorders. Despite this, there is no existing guidance on how current guidelines designed primarily for variants in protein-coding regions should be adapted for variants identified in other genomic contexts. Methods: We convened a panel of nine clinical and research scientists with wide-ranging expertise in clinical variant interpretation and specific experience with variants in non-coding regions. This panel discussed and refined an initial draft of the guidelines, which were then extensively tested and reviewed by external groups. Results: We discuss considerations specific to variants in non-coding regions of the genome. We outline how to define candidate regulatory elements, highlight examples of mechanisms through which non-coding region variants can lead to penetrant monogenic disease, and outline how existing guidelines can be adapted for the interpretation of these variants. Conclusions: These recommendations aim to increase the number and range of non-coding region variants that can be clinically interpreted, which, together with a compatible phenotype, can lead to new diagnoses and catalyse the discovery of novel disease mechanisms.

    Phenotypic spectrum and transcriptomic profile associated with germline variants in TRAF7

    PURPOSE: Somatic variants in tumor necrosis factor receptor-associated factor 7 (TRAF7) cause meningioma, while germline variants have recently been identified in seven patients with developmental delay and cardiac, facial, and digital anomalies. We aimed to define the clinical and mutational spectrum associated with TRAF7 germline variants in a large series of patients, and to determine the molecular effects of the variants through transcriptomic analysis of patient fibroblasts. METHODS: We performed exome, targeted capture, and Sanger sequencing of patients with undiagnosed developmental disorders in multiple independent diagnostic or research centers. Phenotypic and mutational comparisons were facilitated through data exchange platforms. Whole-transcriptome sequencing was performed on RNA from patient- and control-derived fibroblasts. RESULTS: We identified heterozygous missense variants in TRAF7 as the cause of a developmental delay-malformation syndrome in 45 patients. Major features include a recognizable facial gestalt (characterized in particular by blepharophimosis), short neck, pectus carinatum, digital deviations, and patent ductus arteriosus. Almost all variants occur in the WD40 repeats, and most are recurrent. Several differentially expressed genes were identified in patient fibroblasts. CONCLUSION: We provide the first large-scale analysis of the clinical and mutational spectrum associated with the TRAF7 developmental syndrome, and we shed light on its molecular etiology through transcriptome studies.

    Determinants of penetrance and variable expressivity in monogenic metabolic conditions across 77,184 exomes

    The penetrance of variants in monogenic disease and the clinical utility of common polygenic variation have not been well explored on a large scale. Here, the authors use exome sequencing data from 77,184 individuals to generate penetrance estimates and assess the utility of polygenic variation in risk prediction of monogenic variants.

    Size Doesn't Matter: Towards a More Inclusive Philosophy of Biology

    Notes: As the primary author, O’Malley drafted the paper and gathered and analysed data (scientific papers and talks). Conceptual analysis was conducted by both authors. Publication status: Published. Type: Article.

    Philosophers of biology, along with everyone else, generally perceive life to fall into two broad categories, the microbes and macrobes, and then pay most of their attention to the latter. ‘Macrobe’ is the word we propose for larger life forms, and we use it as part of an argument for microbial equality. We suggest that taking more notice of microbes – the dominant life form on the planet, both now and throughout evolutionary history – will transform some of the philosophy of biology’s standard ideas on ontology, evolution, taxonomy and biodiversity. We set out a number of recent developments in microbiology – including biofilm formation, chemotaxis, quorum sensing and gene transfer – that highlight microbial capacities for cooperation and communication and break down conventional thinking that microbes are solely or primarily single-celled organisms. These insights also bring new perspectives to the levels of selection debate, as well as to discussions of the evolution and nature of multicellularity, and to neo-Darwinian understandings of evolutionary mechanisms. We show how these revisions lead to further complications for microbial classification and the philosophies of systematics and biodiversity. Incorporating microbial insights into the philosophy of biology will challenge many of its assumptions, but also give greater scope and depth to its investigations.

    De novo variants in the RNU4-2 snRNA cause a frequent neurodevelopmental syndrome

    Around 60% of individuals with neurodevelopmental disorders (NDD) remain undiagnosed after comprehensive genetic testing, primarily of protein-coding genes[1]. Large genome-sequenced cohorts are improving our ability to discover new diagnoses in the non-coding genome. Here, we identify the non-coding RNA RNU4-2 as a syndromic NDD gene. RNU4-2 encodes the U4 small nuclear RNA (snRNA), which is a critical component of the U4/U6.U5 tri-snRNP complex of the major spliceosome[2]. We identify an 18 bp region of RNU4-2, mapping to two structural elements in the U4/U6 snRNA duplex (the T-loop and Stem III), that is severely depleted of variation in the general population but in which we identify heterozygous variants in 115 individuals with NDD. Most individuals (77.4%) have the same highly recurrent single base insertion (n.64_65insT). In the 54 individuals for whom it could be determined, the de novo variants were all on the maternal allele. We demonstrate that RNU4-2 is highly expressed in the developing human brain, in contrast to RNU4-1 and other U4 homologs. Using RNA-sequencing, we show how 5’ splice site usage is systematically disrupted in individuals with RNU4-2 variants, consistent with the known role of this region during spliceosome activation. Finally, we estimate that variants in this 18 bp region explain 0.4% of individuals with NDD. This work underscores the importance of non-coding genes in rare disorders and will provide a diagnosis to thousands of individuals with NDD worldwide.

    Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space

    The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement, adds security measures for active threat detection and monitoring, and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently consists of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, and describe several flagship genomics datasets available within the AnVIL. We continue to extend and innovate the AnVIL ecosystem by implementing new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.

    Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity

    A major goal of biomedicine is to understand the function of every gene in the human genome. Loss-of-function mutations can disrupt both copies of a given gene in humans, and phenotypic analysis of such 'human knockouts' can provide insight into gene function. Consanguineous unions are more likely to result in offspring carrying homozygous loss-of-function mutations. In Pakistan, consanguinity rates are notably high. Here we sequence the protein-coding regions of 10,503 adult participants in the Pakistan Risk of Myocardial Infarction Study (PROMIS), designed to understand the determinants of cardiometabolic diseases in individuals from South Asia. We identified individuals carrying homozygous predicted loss-of-function (pLoF) mutations and performed phenotypic analysis involving more than 200 biochemical and disease traits. We enumerated 49,138 rare (<1% minor allele frequency) pLoF mutations. These pLoF mutations are estimated to knock out 1,317 genes, each in at least one participant. Homozygosity for pLoF mutations at PLA2G7 was associated with absent enzymatic activity of soluble lipoprotein-associated phospholipase A2; at CYP2F1, with higher plasma interleukin-8 concentrations; at TREH, with lower concentrations of apoB-containing lipoprotein subfractions; at either A3GALT2 or NRG4, with markedly reduced plasma insulin C-peptide concentrations; and at SLC9A3R1, with mediators of calcium and phosphate signalling. Heterozygous deficiency of APOC3 has been shown to protect against coronary heart disease; we identified APOC3 homozygous pLoF carriers in our cohort. We recruited these human knockouts and challenged them with an oral fat load. Compared with family members lacking the mutation, individuals with APOC3 knocked out displayed marked blunting of the usual post-prandial rise in plasma triglycerides. Overall, these observations provide a roadmap for a 'human knockout project', a systematic effort to understand the phenotypic consequences of complete disruption of genes in humans.

    D.S. is supported by grants from the National Institutes of Health, the Fogarty International, the Wellcome Trust, the British Heart Foundation, and Pfizer. P.N. is supported by the John S. LaDue Memorial Fellowship in Cardiology from Harvard Medical School. H.-H.W. is supported by a grant from the Samsung Medical Center, Korea (SMO116163). S.K. is supported by the Ofer and Shelly Nemirovsky MGH Research Scholar Award and by grants from the National Institutes of Health (R01HL107816), the Donovan Family Foundation, and Fondation Leducq. Exome sequencing was supported by a grant from the NHGRI (5U54HG003067-11) to S.G. and E.S.L. D.G.M. is supported by a grant from the National Institutes of Health (R01GM104371). J.D. holds a British Heart Foundation Chair, European Research Council Senior Investigator Award, and NIHR Senior Investigator Award. The Cardiovascular Epidemiology Unit at the University of Cambridge, which supported the field work and genotyping of PROMIS, is funded by the UK Medical Research Council, British Heart Foundation, and NIHR Cambridge Biomedical Research Centre ... Fieldwork in the PROMIS study has been supported through funds available to investigators at the Center for Non-Communicable Diseases, Pakistan and the University of Cambridge, UK.

    Analysis of protein-coding genetic variation in 60,706 humans

    Get PDF
    Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants and to identify genes subject to strong selection against various classes of mutation, identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, 72% of which have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants and for the discovery of human 'knockout' variants in protein-coding genes.

    Peer reviewed