7 research outputs found

    Identification of cis-regulatory sequence variations in individual genome sequences

    Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. For instance, disrupting variations in an HNF4A transcription factor binding site upstream of the Factor IX gene contribute causally to hemophilia B Leyden. Although clinical genome sequence analysis currently focuses on the identification of protein-altering variation, the impact of cis-regulatory mutations can be similarly strong. New technologies are now enabling genome sequencing beyond exomes, revealing variation across the non-coding 98% of the genome responsible for developmental and physiological patterns of gene activity. The capacity to identify causal regulatory mutations is improving, but predicting functional changes in regulatory DNA sequences remains a great challenge. Here we explore the existing methods and software for prediction of functional variation situated in the cis-regulatory sequences governing gene transcription and RNA processing.

    Validation of predicted mRNA splicing mutations using high-throughput transcriptome data

    Interpretation of variants present in complete genomes or exomes reveals numerous sequence changes, only a fraction of which are likely to be pathogenic. Mutations have been traditionally inferred from allele frequencies and inheritance patterns in such data. Variants predicted to alter mRNA splicing can be validated by manual inspection of transcriptome sequencing data; however, this approach is intractable for large datasets. These abnormal mRNA splicing patterns are characterized by reads demonstrating exon skipping, cryptic splice site use, high levels of intron inclusion, or combinations of these properties. We present Veridical, an in silico method for the automatic validation of DNA sequencing variants that alter mRNA splicing. Veridical performs statistically valid comparisons of the normalized read counts of abnormal RNA species in mutant versus non-mutant tissues. This leverages large numbers of control samples to corroborate the consequences of predicted splicing variants in complete genomes and exomes.
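The comparison of normalized abnormal read counts in a mutant sample against many controls can be illustrated with a minimal sketch. This is not Veridical's actual procedure, which the abstract does not specify. The normalization (abnormal reads divided by total reads at the locus), the add-one empirical p-value, and all read counts below are assumptions chosen for illustration.

```python
def normalize(abnormal_reads: int, total_reads: int) -> float:
    """Fraction of reads at a locus that support an abnormal splicing event."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return abnormal_reads / total_reads


def empirical_p_value(mutant: float, controls: list[float]) -> float:
    """One-sided empirical p-value with add-one smoothing.

    Counts how many control samples show a normalized abnormal read
    count at least as large as the mutant sample.
    """
    at_least = sum(1 for c in controls if c >= mutant)
    return (at_least + 1) / (len(controls) + 1)


# Hypothetical (abnormal, total) read counts; not taken from any real dataset.
mutant_norm = normalize(42, 120)
control_counts = [(1, 100), (0, 90), (3, 150), (2, 110), (0, 95),
                  (1, 130), (4, 140), (0, 105), (2, 125)]
control_norms = [normalize(a, t) for a, t in control_counts]

p = empirical_p_value(mutant_norm, control_norms)
print(f"mutant fraction = {mutant_norm:.3f}, empirical p = {p:.3f}")
```

With only nine controls the smallest attainable p-value is 0.1, which shows why large numbers of control samples matter for this kind of validation.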

    A method of predicting changes in human gene splicing induced by genetic variants in context of cis-acting elements

    Background: polymorphic variants and mutations disrupting canonical splicing isoforms are among the leading causes of human hereditary disorders. While there is substantial evidence of aberrant splicing causing Mendelian diseases, the implication of such events in multi-genic disorders is yet to be well understood. We have developed a new tool (SpliceScan II) for predicting the effects of genetic variants on splicing and cis-regulatory elements. The novel Bayesian non-canonical 5’GC splice site (SS) sensor used in our tool allows inference on non-canonical exons. Results: our tool performed favorably when compared with the existing methods in the context of genes linked to the Autism Spectrum Disorder (ASD). SpliceScan II was able to predict more aberrant splicing isoforms triggered by the mutations, as documented in the DBASS5 and DBASS3 aberrant splicing databases, than other existing methods. Detrimental effects behind some of the polymorphic variations previously associated with Alzheimer’s and breast cancer could be explained by changes in predicted splicing patterns. Conclusions: we have developed SpliceScan II, an effective and sensitive tool for predicting the detrimental effects of genomic variants on splicing leading to Mendelian and complex hereditary disorders. The method could potentially be used to screen resequenced patient DNA to identify de novo mutations and polymorphic variants that could contribute to a genetic disorder.

    Interpretation, Stratification and Validation of Sequence Variants Affecting mRNA Splicing in Complete Human Genome Sequences

    The Shannon Human Splicing Pipeline software has been developed to analyze variants on a genome-scale. Evidence is provided that this software predicts variants affecting mRNA splicing. Variants are examined through information-based analysis, and the context of novel mutations as well as common and rare SNPs with splicing effects is displayed. Potential natural and cryptic mRNA splicing variants are identified, and inactivating mutations are distinguished from leaky mutations. Mutations and rare SNPs were predicted in genomes of three cancer cell lines (U2OS, U251 and A431), supported by expression analyses. After filtering, tractable numbers of potentially deleterious variants are predicted by the software, suitable for further laboratory investigation. In these cell lines, novel functional variants comprised 6–17 inactivating mutations, 1–5 leaky mutations and 6–13 cryptic splicing mutations. Predicted effects were validated by RNA-seq data of the three cell lines, and expression microarray analysis of SNPs in HapMap cell lines.
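The abstract does not spell out what "information-based analysis" computes. One common reading, assumed here, is a per-site information score in bits built from position-specific base frequencies, with a variant's effect taken as the change in that score. The sketch below uses the simplified weight 2 + log2 f(b, l) and omits any small-sample correction. The three-position frequency table is a hypothetical toy, not a real splice-site model, and the function names are illustrative rather than part of the pipeline.

```python
import math

# Hypothetical base frequencies f(b, l) for a 3-nt toy site (each row sums to 1).
TOY_FREQS = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.25, "C": 0.125, "G": 0.5, "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.25, "T": 0.5},
]


def individual_information(seq: str, freqs: list[dict[str, float]]) -> float:
    """Sum of per-position weights 2 + log2 f(b, l), in bits.

    Frequencies must be non-zero; a real model would need pseudocounts.
    """
    if len(seq) != len(freqs):
        raise ValueError("sequence length must match the model length")
    return sum(2.0 + math.log2(freqs[i][base]) for i, base in enumerate(seq))


def information_change(ref: str, alt: str, freqs: list[dict[str, float]]) -> float:
    """Change in information (alt minus ref); negative values weaken the site."""
    return individual_information(alt, freqs) - individual_information(ref, freqs)


ref_site, alt_site = "AGT", "ACT"  # hypothetical G>C variant at position 2
print(individual_information(ref_site, TOY_FREQS))        # 2.0
print(individual_information(alt_site, TOY_FREQS))        # 0.0
print(information_change(ref_site, alt_site, TOY_FREQS))  # -2.0
```

Under this reading, a variant that drops a site to around zero bits or below would be a candidate inactivating mutation, while a smaller reduction would be a candidate leaky one. The thresholds that separate the two are not given in the abstract.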

    Interpretation of Mutations, Expression, Copy Number in Somatic Breast Cancer: Implications for Metastasis and Chemotherapy

    Breast cancer (BC) patient management has been transformed over the last two decades due to the development and application of genome-wide technologies. The vast amounts of data generated by these assays, however, create new challenges for accurate and comprehensive analysis and interpretation. This thesis describes novel methods for fluorescence in-situ hybridization (FISH), array comparative genomic hybridization (aCGH), and next-generation DNA- and RNA-sequencing, to improve upon current approaches used for these technologies. An ab initio algorithm was implemented to identify genomic intervals of single copy and highly divergent repetitive sequences that were applied to FISH and aCGH probe design. FISH probes with higher resolution than commercially available reagents were developed and validated on metaphase chromosomes. An aCGH microarray was developed that had improved reproducibility compared to the standard Agilent 44K array, which was achieved by placing oligonucleotide probes distant from conserved repetitive sequences. Splicing mutations are currently underrepresented in genome-wide sequencing analyses, and there are limited methods to validate genome-wide mutation predictions. This thesis describes Veridical, a program developed to statistically validate aberrant splicing caused by a predicted mutation. Splicing mutation analysis was performed on a large subset of BC patients previously analyzed by the Cancer Genome Atlas. This analysis revealed an elevated number of splicing mutations in genes involved in NCAM pathways in basal-like and HER2-enriched lymph node positive tumours. Genome-wide technologies were leveraged further to develop chemosensitivity models that predict BC response to paclitaxel and gemcitabine. A type of machine learning, called support vector machines (SVM), was used to create predictive models from small sets of genes biologically relevant to drug disposition or resistance.
The SVM models generated were able to predict sensitivity in two groups of independent patient data. High variability between individuals requires more accurate and higher resolution genomic data. However, the data themselves are insufficient; also needed are more insightful analytical methods to fully exploit these data. This dissertation presents both improvements in data quality and accuracy as well as analytical procedures, with the aim of detecting and interpreting critical genomic abnormalities that are hallmarks of BC subtypes, metastasis and therapy response.

    Genetic variation and mRNA splicing in familial breast cancer genes

    Germline variants in high-penetrance breast cancer susceptibility genes BRCA1 and BRCA2 are often identified through routine diagnostic gene screening, typically performed for individuals from high-risk breast-ovarian families. Many BRCA1 and BRCA2 variants are known to increase breast cancer risk by disrupting mRNA splicing and compromising the tumour suppressor function of these genes, which work to repair single and double stranded breaks in DNA. A significant number of BRCA1/2 sequence variants are located in or near splice sites and splicing regulatory regions and may disrupt mRNA splicing; however, the relative level to which mRNA splicing is modulated by these common or rare DNA sequence variants has not been ascertained. Splicing assays undertaken to assess the clinical relevance of rare sequence variants in BRCA1 and BRCA2 typically utilise a PCR-based approach and are able to detect the presence of aberrant isoforms and/or the absence of naturally occurring isoforms. The isoforms expressed are useful for determining the disruptive potential of each variant using the classification guidelines recommended by the ENIGMA (Evidence-based Network for the Interpretation of Germline Mutant Alleles) consortium. However, to date PCR-based assays used to evaluate potential BRCA1 and BRCA2 spliceogenic variants have typically only provided qualitative expression profiles. Thus, detection of quantitative and allele-specific expression changes, which may be associated with variants conferring disease risk, has been limited. Advances in sequencing platforms over the last decade have produced technologies that generate quantitative and high throughput expression information, simultaneously overcoming many of the limitations previously reported for PCR-based approaches in splicing assays. 
The work presented here exploits the capabilities of a targeted RNA-seq platform to generate the first comprehensive expression profile of normally expressed BRCA1/BRCA2 mRNA isoforms from lymphoblastoid cell lines (LCLs). Although a high degree of mRNA expression variability was identified across samples, the calculated expression levels made it possible to highlight changes present in rare variant samples outside the expected natural range. Additionally, results from this work identified instances where PCR-based assay primer design prevented isoform detection in variant samples. Targeted RNA-seq was coupled with allele-specific expression (ASE) analysis to further explore the potential spliceogenic impact of genetic variation in BRCA1 and BRCA2, highlighting ASE changes to natural mRNA isoforms for carriers of a rare variant. While the common variants included in this work were not found to have an obvious impact on splicing, observed allelic imbalances indicate that additional factors are likely to be influencing the mRNA expression variation seen. Exploration of this hypothesis found that common culturing practices, including liquid N2 storage and treatment with a nonsense mediated decay inhibitor, did not impact the mRNA isoforms expressed in a single LCL over time. However, the technology used for mRNA detection was found to play a significant role, with a direct relationship between the number of alternative events detected and the read depth in each sample. Further work into the extent to which cellular heterogeneity contributes to the observed mRNA variability was undertaken with a novel in situ hybridisation platform (RNAscope) to establish the level of variability in mRNA expression between individual cells. The results from this work highlighted how the inter-cell variation in BRCA1 and BRCA2 expression patterns is considerable, potentially explaining why variability is commonly observed when studying mRNA expression at a cell population level. 
Candidate gene analysis was completed for four patients who have a history of breast cancer but do not carry any disease-associated variants in BRCA1/2. This work did not identify any variants in other known susceptibility genes that are likely to have contributed towards their disease and further investigation into unexplored regions of the genome would be required to identify an underpinning genetic cause. Many women predicted to be high risk for breast cancer have yet to have their genetic basis successfully identified through genetic testing. The work undertaken here has established a technique to quantitatively assess BRCA1 and BRCA2 mRNA isoforms, while identifying technical and biological factors that influence the observed variability. This comprehensive analysis would benefit patient management as it provides a more informative base for variant classification from a better understanding of how disruptive any given genetic variant is likely to be. Clinicians and genetic counsellors will have the capacity to counsel patients more effectively as more variants of unknown clinical significance will be able to be given a classification that more informatively highlights their associated genetic risk. These more conclusive genetic test results will mean that patients are also less likely to be subjected to the stress and uncertainty that would otherwise be present with reported variants that remain unclassified. This work provides the basis for further studies to extend this work to other known breast cancer susceptibility genes, providing a more comprehensive assessment to identify variants that are likely to be influencing disease risk in high risk women. Such data will be critical for the future interpretation of splicing analyses in a diagnostic setting.
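The allele-specific expression (ASE) analysis mentioned in this abstract is not described in detail. A common way to flag allelic imbalance, assumed here purely for illustration, is an exact two-sided binomial test of reference-allele read counts against an expected 50:50 ratio at a heterozygous site. The read counts are hypothetical, and the function name is not taken from the thesis.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k out of n trials with success probability p.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance guards against floating-point ties.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical read counts at one heterozygous site.
ref_reads, alt_reads = 18, 42
p_value = binomial_two_sided_p(ref_reads, ref_reads + alt_reads)
print(f"ref fraction = {ref_reads / (ref_reads + alt_reads):.2f}, p = {p_value:.4g}")
```

A test like this ignores reference-mapping bias and overdispersion between replicates, and it loses power at low read depth, so a low p-value would be a prompt for follow-up rather than evidence of a spliceogenic effect.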