Hereditary cancer genes are highly susceptible to splicing mutations
<div><p>Substitutions that disrupt pre-mRNA splicing are a common cause of genetic disease. On average, 13.4% of all hereditary disease alleles are classified as splicing mutations mapping to the canonical 5′ and 3′ splice sites. However, splicing mutations present in exons and deeper intronic positions are vastly underreported. A recent re-analysis of coding mutations in exon 10 of the Lynch Syndrome gene, <i>MLH1</i>, revealed an extremely high rate (77%) of mutations that lead to defective splicing. This finding is confirmed by extending the sampling to five other exons in the <i>MLH1</i> gene. Further analysis suggests a more general phenomenon of defective splicing driving Lynch Syndrome. Of the 36 mutations tested, 11 disrupted splicing. Furthermore, analysis of past reports suggests that <i>MLH1</i> mutations in canonical splice sites also occupy a much higher fraction (36%) of total mutations than expected. When performing a comprehensive analysis of splicing mutations in human disease genes, we found that three main causal genes of Lynch Syndrome, <i>MLH1</i>, <i>MSH2</i>, and <i>PMS2</i>, belonged to a class of 86 disease genes that are enriched for splicing mutations. Other cancer genes were also enriched among the 86 susceptible genes. The enrichment of splicing mutations in hereditary cancers strongly argues for additional priority in interpreting clinical sequencing data in relation to cancer and splicing.</p></div>
Enrichment of cancer genes in SSM-prone genes.
<p><b>A.</b> SSM versus all exonic mutations in the HGMD, with the 99.9% confidence interval region shown in gray. COSMIC cancer genes are highlighted in red. <i>MLH1</i>, <i>BRCA1</i>, <i>BRCA2</i>, and <i>NF1</i> are highlighted and labeled. <b>B-C.</b> Average percent of SSM or ESM in cancer genes versus non-cancer genes reported in HGMD. <b>D.</b> Average HI score of cancer genes in the Upper, Expected, and Lower categories of genes.</p>
Non-uniform distribution of splicing mutations across disease genes.
<p><b>A.</b> SSM versus all exonic mutations in the HGMD, with the 99.9% confidence interval region shown in gray. Genes with more SSM than expected, the expected number, and fewer than expected are shown in red (Upper), blue (Expected), and green (Lower), respectively. The locations of <i>MLH1</i>, <i>MSH2</i>, and <i>PMS2</i> are highlighted and labeled. <b>B.</b> Percent ESM of total mutations tested using MaPSy in each category. <b>C.</b> Because MaPSy cannot observe mutant-specific exon skipping events (a result of the identical flanking exons), ESMs found in <i>MLH1</i>, <i>BRCA1</i>, and <i>OPA1</i> were validated as individual wildtype and mutant minigene constructs. All three mutant constructs showed exon skipping events, which were not seen in the wildtype constructs.</p>
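The Upper, Expected, and Lower labels in panel A can be illustrated with a minimal sketch. The caption does not say how the 99.9% band was computed, so the normal approximation to a binomial, the use of the abstract's 13.4% average as the expected rate, the function name `classify_gene`, and the counts in the example are all assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def classify_gene(n_ssm, n_total, expected_rate=0.134, confidence=0.999):
    """Label a gene by comparing its SSM count with a two-sided band.

    Assumption: the band is a normal approximation to a binomial with
    success probability `expected_rate`; the original analysis may differ.
    """
    # Two-sided critical value (about 3.29 for a 99.9% band).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    mean = n_total * expected_rate
    sd = sqrt(n_total * expected_rate * (1 - expected_rate))
    lower, upper = mean - z * sd, mean + z * sd
    if n_ssm > upper:
        return "Upper"
    if n_ssm < lower:
        return "Lower"
    return "Expected"


# Hypothetical counts: 36 SSM out of 100 reported mutations.
print(classify_gene(36, 100))
```

With these hypothetical counts, a gene whose SSM fraction is 36% falls above the band, while a gene close to the 13.4% average stays inside it.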
Random forest classification and prediction of SSM-prone genes.
<p><b>A.</b> The order of variable importance by mean decrease in accuracy for SSM-prone genes versus genes with an expected number of SSM. The directions that associate with SSM-prone genes are indicated: positive directions are green and negative directions are red. <b>B.</b> Classification performance of the random forest models and the logistic regression models was calculated as the area under the curve (AUC) in receiver operating characteristic (ROC) analysis. <b>C.</b> Scheme of random forest classification on all genomic genes. <b>D.</b> Average proportion of low frequency ExAC splice-site variants per splice-site in predicted SSM-prone genes (probability: 0.60–0.86) versus genes not predicted to be SSM-prone (<i>P</i> = 6.1043e-18, Mann-Whitney). <b>E.</b> Common variants are depleted from the category of variants that cause loss of splice-site signal at the 5′ splice-site (upper plot). Rare variants are enriched in the range of the splice site signal scores that abolish 5′ splice-site recognition (lower plot).</p>
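As a rough illustration of the AUC metric named in panel B, the sketch below computes the area under the ROC curve directly from classifier scores, using its rank interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc`, the labels, and the scores are hypothetical; this is not the authors' pipeline.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC as the fraction of positive/negative pairs
    in which the positive example has the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = SSM-prone gene, 0 = gene with the expected SSM count.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.86, 0.72, 0.40, 0.55, 0.30, 0.10]
print(roc_auc(labels, scores))
```

The pairwise count is the Mann-Whitney U statistic divided by the number of positive/negative pairs, so the same ranking logic underlies both the AUC in panel B and the test reported in panel D.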
<i>MLH1</i> ESM affect different stages of spliceosome assembly.
<p>The percentages of mutant mRNA retained in each stage of the assembly relative to wildtype mRNA are shown for all ESM that were identified in <i>MLH1</i> exons 8 and 15. The majority of ESM were blocked in the transition from A to B complex. Two of the ESM (CM082944 and CM04546) in exon 8 also slowed down the final transesterification reactions that yield spliced mRNA and the lariat.</p>
<i>MLH1</i> is frequently disrupted by splicing mutations.
<p><b>A.</b> Disease coding mutations in exons 4, 5, 7, 8 and 15 of <i>MLH1</i> were analyzed with MaPSy. While none of the mutations in exons 4, 5 and 7 (blue bars) were found to disrupt splicing, almost all of the mutations tested in exons 8 and 15 (red bars) significantly altered splicing (100% and 71%, respectively). <b>B.</b> Splicing efficiency of wildtype (blue) and mutant (red) alleles that were tested with MaPSy in exons 8 and 15 of <i>MLH1</i>.</p>
Additional file 1 of The unusual gene architecture of polyubiquitin is created by dual-specific splice sites
Additional file 1: Figure S1. Differences between recursive splice sites (RSSs) and dual-specific splice sites (DSSs). Figure S2. Dual-specific splice sites support splicing activity with weaker substrate sequences. Figure S3. The effect of overexpressing HA-tagged ubiquitin in HEK293 cells. Figure S4. RT-PCR analysis of the splicing pattern of mouse UBC genes from brain, liver, and muscle tissues. Figure S5. Evolution of polyubiquitin gene family. Figure S6. Sequence of insert variant X-6533768-T-TCTCGCTCTCCTGACTCAGTGGTTCCTCCACCTGGCTCTCCTGACTCAGTGGTTCTTCCAC in VCX3A, which matches intron 3 of ENST00000398729. Figure S7. Distribution of splice site sequences of introns whose length is a multiple of a surrounding tandem repeat’s unit length. Table S1. Lariat reads recovered from the 5′ splice sites of unannotated dual splice sites. Table S2. Two isoform sequences in Fig. 4. Table S3. Ubiquitin subunit counts of UBC orthologs. Table S4. Introns in tandem repeats