Using empirical evidence to predict if and how a DNA variant will disrupt RNA splicing in rare disorders

Abstract

Background The diagnostic rate in Mendelian disorders continues to hover around 50% after genomic testing, meaning that around half of families and clinicians are left with no actionable answer. Variants affecting splicing motifs are particularly challenging to interpret. To conclusively link a splicing variant to disease, it is necessary to determine the consequences of altered splicing for the final mRNA transcript and the resulting protein. Consequently, most probable splicing variants are classified as variants of uncertain significance (VUS) and are not clinically actionable. A range of powerful but opaque algorithms for predicting whether a variant alters splicing has proliferated. Many are based on machine learning and deep learning, and the data and features behind a specific prediction are usually unavailable for clinicians to verify and weigh. Because they do not detail the nature and source(s) of the evidence behind each prediction, these algorithms are relegated to the lowest evidence weighting under the globally accepted, gold-standard variant classification rules established by the ACMG-AMP. In addition, most algorithms currently make no attempt to predict which mis-splicing outcomes a variant will cause, so bespoke functional testing is still required to discover the variant's impact on pre-mRNA splicing and to allow ACMG-AMP-guided variant reclassification for a definitive molecular diagnosis. There is an urgent need for evidence-based, clinically validated tools for pathology interpretation of splicing variants.

Aims To bridge the gap between data science and genetic pathology by developing methods based on empirical evidence to predict if and how a DNA variant will disrupt RNA splicing in rare disease. Specifically, to determine empirical features that accurately inform: 1) spliceosomal selection of a cryptic-donor in preference to the ‘authentic-donor’ (positioned at the exon-intron junction) and to other nearby decoy-donors (any GT or GC) that are not used by the spliceosome, and 2) the mis-splicing events that will occur when a variant precludes use of the authentic-donor or authentic-acceptor.

Methods We use empirical and clinically relevant data to define and evaluate measurable features enriched in 1) cryptic-donors selected by the spliceosome versus decoy-donors (any GT/GC motif) that were not selected by the spliceosome, and 2) mis-splicing events (exon skipping or cryptic activation) that occurred because of a splicing variant.

Results For 1), we evaluated current algorithms and show that while intrinsic splice-site strength and proximity to the authentic-donor strongly influence spliceosomal selection of a cryptic-donor, these factors alone are not sufficient for accurate prediction. For 2), we find that natural, stochastic mis-splicing events seen in population-based RNA-Seq are remarkably prescient of the mis-splicing events that will predominantly occur after the inactivation of an authentic splice site.

Conclusions We have created an accurate, evidence-based method to predict the nature of variant-induced mis-splicing. The ability to confidently predict the outcome of a splicing variant is a major step forward that will greatly aid genetic diagnosis of families with Mendelian disorders.
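As an illustration of the candidate set described in the Methods (any GT or GC motif near the authentic-donor, each with a measurable distance to it), the following is a minimal sketch in Python. It is not the method developed in this work. The function name `find_candidate_donors`, the 0-based coordinate convention, and the default `window` of 250 nucleotides are all assumptions made for the example, and the sketch does not score splice-site strength.

```python
def find_candidate_donors(seq, authentic_pos, window=250):
    """List every GT/GC dinucleotide within `window` nt of the authentic donor.

    `seq` is a DNA string on the coding strand; `authentic_pos` is the
    0-based index of the G in the authentic donor's GT/GC (assumed convention).
    The window size is a hypothetical default, not a value from the source.
    """
    seq = seq.upper()
    # Clamp the search window to the sequence; stop one short of the end
    # so that seq[i:i + 2] always holds a full dinucleotide.
    start = max(0, authentic_pos - window)
    end = min(len(seq) - 1, authentic_pos + window + 1)

    candidates = []
    for i in range(start, end):
        motif = seq[i:i + 2]
        if motif in ("GT", "GC") and i != authentic_pos:
            candidates.append({
                "pos": i,
                "motif": motif,
                # Negative = exonic side (upstream), positive = intronic side.
                "distance": i - authentic_pos,
            })
    return candidates


if __name__ == "__main__":
    # Toy sequence: exon "CAG", authentic donor GT at index 3.
    for c in find_candidate_donors("CAGGTAAGTGCAGGTTT", authentic_pos=3):
        print(c)
```

Each returned candidate is only a potential cryptic- or decoy-donor. Which one the spliceosome actually selects is the question the abstract addresses, and, as the Results note, distance and intrinsic strength alone do not answer it.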
