16 research outputs found

    Blood RNA analysis can increase clinical diagnostic rate and resolve variants of uncertain significance

    Purpose: Diagnosis of genetic disorders is hampered by large numbers of variants of uncertain significance (VUSs) identified through next-generation sequencing. Many such variants may disrupt normal RNA splicing. We examined effects on splicing of a large cohort of clinically identified variants and compared performance of bioinformatic splicing prediction tools commonly used in diagnostic laboratories. Methods: Two hundred fifty-seven variants (coding and noncoding) were referred for analysis across three laboratories. Blood RNA samples underwent targeted reverse transcription polymerase chain reaction (RT-PCR) analysis with Sanger sequencing of PCR products and agarose gel electrophoresis. Seventeen samples also underwent transcriptome-wide RNA sequencing with targeted splicing analysis based on Sashimi plot visualization. Bioinformatic splicing predictions were obtained using Alamut, HSF 3.1, and SpliceAI software. Results: Eighty-five variants (33%) were associated with abnormal splicing. The most frequent abnormality was upstream exon skipping (39/85 variants), which was most often associated with splice donor region variants. SpliceAI had the greatest accuracy in predicting splicing abnormalities (0.91) and outperformed other tools in sensitivity and specificity. Conclusion: Splicing analysis of blood RNA identifies diagnostically important splicing abnormalities and clarifies functional effects of a significant proportion of VUSs. Bioinformatic predictions are improving but still make significant errors. RNA analysis should therefore be routinely considered in genetic disease diagnostics. This article is freely available via Open Access. This research was funded by the National Institute for Health Research (NIHR) and the NewLife Foundation. The Baralle lab is supported by an NIHR Research Professorship to D.B. (RP-2016-07-011).
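
The abstract above compares prediction tools by accuracy, sensitivity and specificity against RT-PCR results. As a rough illustration of how those three figures relate, here is a minimal sketch. The `ConfusionCounts` class, the function names and the example counts are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Tool predictions tallied against the RNA assay result (hypothetical layout)."""
    tp: int  # tool predicted abnormal splicing, assay confirmed it
    fp: int  # tool predicted abnormal splicing, assay showed normal splicing
    tn: int  # tool predicted normal splicing, assay showed normal splicing
    fn: int  # tool predicted normal splicing, assay showed abnormal splicing


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all variants where the prediction matched the assay."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of assay-confirmed abnormal variants that the tool flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of assay-confirmed normal variants that the tool left unflagged."""
    return c.tn / (c.tn + c.fp)


# Illustrative counts only; these are NOT the study's data.
example = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)
print(accuracy(example), sensitivity(example), specificity(example))
```

A tool can score well on accuracy while still missing abnormal variants, which is why the three measures are reported separately.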

    Methods to identify novel disease genes and uplift diagnosis rates in rare diseases

    Since the advent of next generation sequencing technologies, the ability to diagnose rare diseases has improved considerably. Yet despite advances, most rare diseases remain undiagnosed. In part, this is due to a demand for more efficient methods to interpret genomic sequencing data, in addition to the need to establish the phenotypic consequence of variants in genes not yet associated with disease. This thesis describes the development and testing of novel methods to improve diagnostic efficiency in patients with rare diseases, in addition to the discovery of novel disease-gene relationships. It describes the DeNovoLOEUF method, which identifies putative pathogenic de novo loss-of-function variants in both known and putative disease genes. The gene-agnostic HiPPo protocol, which prioritises variants identified in sequencing data, is also described. Finally, application of the GenePy dimensionality reduction algorithm to identify missed biallelic diagnoses is discussed. DeNovoLOEUF was applied in established disease genes to ~14,000 trios recruited to the 100,000 Genomes Project (100KGP). In total, 98% of all variants identified were proven diagnostic, including 39 new diagnoses missed by 100KGP. DeNovoLOEUF was then applied to novel genes in the same 100KGP cohort. A total of 18 putative disease genes were identified, of which 12/18 (67%) have since been functionally validated. For the remaining 6 genes, case series are underway, and two of these with supportive functional evidence are presented in this thesis: DDX17 (comprising 11 patients with de novo monoallelic variants and neurodevelopmental phenotypes, named Seaby-Ennis Syndrome); and HDLBP (comprising 7 patients with de novo monoallelic variants and neurodevelopmental phenotypes). Application of the HiPPo protocol was demonstrated to be an effective, efficient, alternative method to interpret genomic data, capable of outperforming strategies used by the NHS Genomic Medicine Service (GMS). The GMS utilises gene panels to analyse sequence data, whereas HiPPo is a panel-agnostic method that prioritises variants using in silico metrics. HiPPo had a superior diagnostic rate per number of variants assessed when compared with the GMS (20% vs 3% respectively). HiPPo further identified all pathogenic variants reported by the GMS and identified an additional missed pathogenic variant. Data presented in this thesis demonstrate how novel methods applied to genomic sequencing data can efficiently enhance diagnosis rates for patients with rare diseases and identify new disease-gene relationships. In turn, these can improve patient outcomes by better elucidating mechanistic understanding of disease, identifying novel therapeutic targets, and tailoring treatments to specific diseases and individuals. To fully realise the potential of novel methods, additional research is needed. Future plans will involve the use of artificial intelligence to refine methods and models for improved clinical outcomes.
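
The comparison of HiPPo with the GMS rests on a "diagnostic rate per number of variants assessed". A minimal sketch of that ratio follows. The function name and the counts are hypothetical, and the thesis may define the measure differently.

```python
def diagnostic_rate_per_variant(n_diagnostic: int, n_assessed: int) -> float:
    """Share of manually assessed variants that turned out to be diagnostic."""
    if n_assessed <= 0:
        raise ValueError("n_assessed must be positive")
    if not 0 <= n_diagnostic <= n_assessed:
        raise ValueError("n_diagnostic must lie between 0 and n_assessed")
    return n_diagnostic / n_assessed


# Hypothetical counts for two prioritisation strategies; NOT the thesis data.
narrow_shortlist = diagnostic_rate_per_variant(n_diagnostic=5, n_assessed=25)
broad_shortlist = diagnostic_rate_per_variant(n_diagnostic=6, n_assessed=200)
print(f"{narrow_shortlist:.0%} vs {broad_shortlist:.0%}")
```

The ratio rewards a strategy that reaches its diagnoses with fewer variants to review, so a shorter shortlist can score higher even when the absolute number of diagnoses is similar.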

    Supplementary materials in support of the thesis "Methods to identify novel disease genes and uplift diagnosis rates in rare diseases"

    This dataset supports the thesis entitled "Methods to identify novel disease genes and uplift diagnosis rates in rare diseases"
    AWARDED BY: University of Southampton
    DATE OF AWARD: 2023

    This dataset contains:

    1. Folder called 'Appendix papers'
    This contains 15 published articles in peer-reviewed journals or preprint archives which represent work from my thesis.
    Appendix Paper 1 | Strategies to Uplift Novel Mendelian Gene Discovery for Improved Clinical Outcomes
    Appendix Paper 2 | Challenges in the diagnosis and discovery of rare genetic disorders using contemporary sequencing technologies
    Appendix Paper 3 | The mutational constraint spectrum quantified from variation in 141,456 humans
    Appendix Paper 4 | Transcript expression-aware annotation improves rare variant interpretation
    Appendix Paper 5 | Addendum: The mutational constraint spectrum quantified from variation in 141,456 humans
    Appendix Paper 6 | Advanced variant classification framework reduces the false positive rate of predicted loss of function (pLoF) variants in population sequencing data
    Appendix Paper 7 | A gene-to-patient approach uplifts novel disease gene discovery and identifies 18 putative novel disease genes
    Appendix Paper 8 | Response to Ramos et al.
    Appendix Paper 9 | 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report
    Appendix Paper 10 | Loss-of-function variants in TAF4 are associated with a neurodevelopmental disorder. Human Mutation
    Appendix Paper 11 | Monogenic de novo variants in DDX17 cause a novel neurodevelopmental disorder
    Appendix Paper 12 | Targeting de novo loss of function variants in constrained disease genes improves diagnostic rates in the 100,000 Genomes Project
    Appendix Paper 13 | A gene pathogenicity tool 'GenePy' identifies missed biallelic diagnoses in the 100,000 Genomes Project
    Appendix Paper 14 | A panel-agnostic strategy 'HiPPo' improves diagnostic efficiency in the UK Genome Medicine Service
    Appendix Paper 15 | A novel variant in GATM causes idiopathic renal Fanconi syndrome and predicts progression to end-stage kidney disease

    2. Folder called 'Supplementary Datasets'
    All data can be opened using Microsoft Excel.
    Supplementary Dataset SD1 | Enriched biological processes in DDX17 RNA-seq data [Co-author Cyril F. Bourgeois; University of Lyon]
    Supplementary Dataset SD2 | Curation of pLoF variants in haploinsufficient genes
    Supplementary Dataset SD3 | Curation of 3362 homozygous pLoF variants [Co-authors Moriel Singer-Berk, Eleina England; Broad Institute of MIT and Harvard]
    Supplementary Dataset SD4 | Detailed phenotype table of patients with DDX17 variants
    Supplementary Dataset SD5 | Differentially expressed genes in DDX17-KD cells compared to control cells [Co-author Cyril F. Bourgeois; University of Lyon]
    Supplementary Dataset SD6 | Detailed phenotype table of patients with HDLBP variants
    Supplementary Dataset SD7 | Manual curation of 45 remaining variants [Co-author N. Simon Thomas, University of Southampton]
    Supplementary Dataset SD8 | Re-analysis of DeNovoLOEUF on 100,000 Genomes Project data
    Supplementary Dataset SD9 | 36 possible missed diagnoses in patients with a cardiomyopathy phenotype
    Supplementary Dataset SD10 | Genes associated with cardiomyopathies
    Supplementary Dataset SD11 | Autosomal recessive disease genes
    Supplementary Dataset SD12 | 682 participants with a potential missed diagnosis
    Supplementary Dataset SD13 | Variants identified using the HiPPo protocol

    3. Folder called 'Supplementary Tables'
    All data can be opened using Microsoft Excel.
    Supplementary Table S1 | Environmental tools in GEL
    Supplementary Table S2 | List of 1,815 genes tolerant of homozygous loss-of-function variation [Co-author Moriel Singer-Berk; Broad Institute of MIT and Harvard]
    Supplementary Table S3 | Genes tolerant of homozygous loss-of-function variation with an OMIM dominant association
    Supplementary Table S4 | 27 genes with more than one Genomics England kindred affected
    Supplementary Table S5 | 99 Class 2 and Class 3 genes
    Supplementary Table S6 | Sequences of siRNAs against DDX17 [Co-author Cyril F. Bourgeois; University of Lyon]
    Supplementary Table S7 | A summary of high-level phenotypes of the 100,000 Genomes Project patient population
    Supplementary Table S8 | All human genes curated with a LOEUF score
    Supplementary Table S9 | 182 participants without a listed cardiomyopathy phenotype that had a pathogenic variant returned by 100KGP in a cardiomyopathy-related gene
    Supplementary Table S10 | Quality control of 24 samples from 8 families undergoing parallel research exome and clinical genome [Co-author Nichola Grahame; University of Southampton]

    4. Folder called 'Supplementary Figures'
    Contains a single Word document with the following figures:
    Supplementary Figure S1 | Crispr/Cas9 microinjection into X. tropicalis eggs produces mosaic homozygous crispant tadpoles encoding truncated Ddx17 which is inherited in the F1 generation [Co-authors Annie Godwin, Matt Guille; University of Portsmouth]
    Supplementary Figure S2 | The amino acid alignment between the H. sapiens and X. tropicalis Ddx17 proteins [Co-authors Annie Godwin, Matt Guille; University of Portsmouth]
    Supplementary Figure S3 | F0 mosaic homozygous X. tropicalis display reduced axon outgrowth, and working memory like F1 models, but also gastrulation defects and short term microcephaly [Co-authors Annie Godwin, Matt Guille]
    Supplementary Figure S4 | Results of dark-light transitions assay and neuronal outgrowth [Co-authors Annie Godwin, Matt Guille; University of Portsmouth]
    Supplementary Figure S5 | Compound heterozygous ddx17-/- tadpoles are morphologically normal but show working memory deficits [Co-authors Annie Godwin, Matt Guille; University of Portsmouth]
    Supplementary Figure S6 | Network representation of the top 40 enriched biological processes [Co-author Cyril F. Bourgeois; University of Lyon]
    Supplementary Figure S7 | Enriched biological processes for down-regulated and up-regulated genes [Co-author Cyril F. Bourgeois; University of Lyon]

    Challenges in the diagnosis and discovery of rare genetic disorders using contemporary sequencing technologies

    Next generation sequencing has revolutionised rare disease diagnostics. Concomitant with advancing technologies has been a rise in the number of new gene disorders discovered and diagnoses made for patients and their families. However, despite the trend towards whole exome and whole genome sequencing, diagnostic rates remain suboptimal. On average, only ~30% of patients receive a molecular diagnosis. National sequencing projects launched in the last five years are integrating clinical diagnostic testing with research avenues to widen the spectrum of known genetic disorders. Consequently, efforts to diagnose genetic disorders in a clinical setting are now often shared with efforts to prioritise candidate variants for the detection of new disease genes. Herein we discuss some of the biggest obstacles precluding molecular diagnosis and discovery of new gene disorders. We consider bioinformatic and analytical challenges faced when interpreting next generation sequencing data and showcase some of the newest tools available to mitigate these issues. We consider how incomplete penetrance, non-coding variation and structural variants are likely to impact diagnostic rates and we further discuss methods for uplifting novel gene discovery by adopting a gene-to-patient based approach.

    The mutational constraint spectrum quantified from variation in 141,456 humans

    Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

    A palindrome-like structure on 16p13.3 is associated with the formation of complex structural variations and SRRM2 haploinsufficiency

    SRRM2 encodes a splicing factor recently implicated in developmental disorders due to a statistical enrichment of de novo mutations. Using data from the 100,000 Genomes Project, four unrelated individuals with intellectual disability (ID) were identified, each harbouring de novo whole gene deletions of SRRM2. Deletions ranged from 248 to 482 kb in size and all distal breakpoints clustered within a complex 144 kb palindrome situated 75 kb upstream of SRRM2. Strikingly, three of the deletions were complex, with inverted internal segments of 45-94 kb. In one proband-mother duo, de novo status was inferred by haplotype analysis. Together with two additional patients who harboured smaller predicted protein truncating variants (p.Arg632* and p.Ala2223Leufs*13), we estimate the prevalence of this condition in cohorts of patients with unexplained ID to be ~1/1300. Phenotypic blending, present for two cases with additional pathogenic variants in CASR/PKD1 and SLC17A5, hampered phenotypic delineation of this recently described condition. Our data highlight the benefits of genome sequencing for resolving structural complexity and inferring de novo status. The genomic architecture of 16p13.3 may give rise to relatively high rates of complex rearrangements, adding to the list of loci associated with recurrent genomic disorders.

    De novo putative loss-of-function variants in TAF4 are associated with a neuro-developmental disorder.

    TATA-binding protein associated factor 4 (TAF4) is a subunit of the Transcription Factor IID (TFIID) complex, a central player in transcription initiation. Other members of this multimeric complex have been implicated previously as monogenic disease genes in human developmental disorders. TAF4 has not been described to date as a monogenic disease gene. We here present a cohort of eight individuals, each carrying de novo putative loss-of-function (pLoF) variants in TAF4 and expressing phenotypes consistent with a neuro-developmental disorder (NDD). Common features include intellectual disability, abnormal behavior, and facial dysmorphisms. We propose TAF4 as a novel dominant disease gene for NDD, and coin this novel disorder “TAF4-related NDD” (T4NDD). We place T4NDD in the context of other disorders related to TFIID subunits, revealing shared features of T4NDD with other TAF-opathies.
