268 research outputs found

    Identification, improved modeling and integration of signals to predict constitutive and alternative splicing

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2004, by Gene W. Yeo. Includes bibliographical references. The regulation of pre-messenger RNA splicing by the spliceosomal machinery, via interactions between cis-regulatory elements and splicing trans-factors, to generate a specific mRNA (i.e., constitutive splicing) or sometimes many distinct mRNA isoforms (i.e., alternative splicing) is still a poorly understood process. Progress in illuminating this process is further hampered by the variation of splicing in the multitude of tissues and cell types present, by the variation of cis and trans elements in different organisms, and by the possibility that some alternative splicing events present in expressed sequence tag (EST) databases may constitute biochemical 'noise' or transient evolutionary fluctuations.
Several studies, mainly computational in nature, addressing different questions regarding constitutive and alternative splicing are described here. They range from improved modeling of splicing signals, through studying the variation of alternative splicing in various tissues and analyzing evolutionary differences of cis and trans elements of splicing in various vertebrates, to utilizing attributes indicative of alternative splicing events conserved in human and mouse to identify novel alternatively spliced exons. In particular: (i) A general approach for improved modeling of short sequence motifs, based on the Maximum Entropy principle, that incorporates local adjacent and non-adjacent position dependencies is introduced and applied to understanding splice site signals. The splice site recognition algorithm, MaxENTScan, performs better than previous models that take sequences of similar length as input; (ii) The first large-scale bioinformatics study is conducted that identifies similarities and differences in candidate cis-regulatory elements and trans-acting splicing [...] manipulation of intronic elements that enables fish genes to be spliced properly in mammalian cells; (iii) A computational analysis of tissue-specific alternative splicing (AS) using EST data, genome sequence data, and microarray expression data is conducted, which distinguishes human brain, testis and liver as having unusually high levels of AS, highlights differences in the types of AS occurring commonly in different tissues, and identifies candidate cis-regulatory elements and trans-factors likely to play important roles in tissue-specific AS in human cells; (iv) The identification of a set of discriminatory sequence features and their integration into a statistical machine-learning algorithm, ACEScan, which distinguishes exons subject to evolutionarily conserved alternative splicing from constitutively spliced or lineage-specifically spliced exons, is described; (v) The genome-wide search for and experimental validation of exon-skipping events using the combination of two silencing cis-elements, UAGG and GGGG.

    Characterizing Stargardt disease-causing mutations to identify ABCA4 gene lesions amenable to splice intervention therapeutics

    Stargardt disease (STGD1, OMIM: 248200) is an autosomal recessive retinal dystrophy, characterized by bilateral progressive central vision loss and subretinal deposition of lipofuscin-like substances. The wide spectrum of clinical phenotypes, ranging from childhood-onset cone-rod dystrophy to late-onset macular pattern dystrophy-like disease, indicates a more complex genotype-phenotype correlation than previously believed. The association of mutations in the ATP-binding cassette transporter gene, ABCA4, with STGD1 was first reported in two families in 1997. The ABCA4 protein encoded by ABCA4 is predominantly expressed in the outer segments of photoreceptors and in retinal pigment epithelial (RPE) cells in the retina.

    Alternative pre-mRNA Splicing: Signals and Evolution

    Alternative pre-mRNA splicing is a major source of transcriptome and proteome diversity. In humans, aberrant splicing is a cause for genetic disease and cancer. Until recently it was believed that almost 95% of all genes undergo constitutive splicing, where introns are always excised and exons are always included into the mature mRNA transcript. It is now widely accepted that alternative splicing is the rule rather than the exception and that perhaps more than 75% of all human genes are alternatively spliced. Despite its importance and its potential role in causing disease, the molecular basis of alternative splicing is still not fully understood. The incompleteness of our knowledge about the human transcriptome makes ab initio predictions of alternative splicing a recent, but important research area. This thesis investigates different aspects of alternative splicing in humans, based upon computational large-scale analyses. We introduce a genetic programming approach to predict alternative splicing events without using expressed sequence tags (ESTs). In contrast to existing methods, our approach relies on sequence information only, and is therefore independent of the existence of orthologous sequences. We analyzed 27,519 constitutively spliced and 9,641 cassette exons (SCE) together with their neighboring introns; in addition we analyzed 33,316 constitutively spliced introns and 2,712 retained introns (SIR). We find that our tool for classifying yields highly accurate predictions on the SIR data, with a sensitivity of 92.1% and a specificity of 79.2%. Prediction accuracies on the SCE data are lower: 47.3% (sensitivity) and 70.9% (specificity), indicating that alternative splicing of introns can be better captured by sequence properties than that of exons. We critically question these findings and in particular discuss the huge impact of the feature "length" on predictions in retained introns. 
We find that the number of adenosines in an exon, called "feature A", is a highly prominent feature for classification of exons. Adenosines are especially overrepresented in the most abundant exonic splicing enhancers, found in constitutive exons. Furthermore, we comment on inconsistencies of the nomenclature and on problems of handling the splicing data. We make suggestions to improve the terminology. For further in silico exploration of sequence properties of exons, we generated a dataset of synthetic exons. We describe a general rule for creating sequences with exonic splicing enhancer and silencer densities similar to those of real exons, as well as similar exonic splicing enhancer networks. We find that exonic splicing enhancer densities are well suited for differentiating real and randomized exons, whereas the densities of SR protein binding sites are largely uninformative. Generally, we find that features described on small-scale experimental data are not transferable to computational large-scale analyses, which makes creating rules for alternative splicing prediction based only upon DNA/RNA sequence an extraordinarily difficult task. According to our findings, we suggest that in the case of the SCE only 20%, and in the case of SIR only 30%, of the whole splicing information is encoded at the sequence level. In the last chapter we investigated whether alternative splicing may be connected to adaptive evolutionary processes in a species or population. Unfortunately, the currently available population genetic tools are not sensitive enough to identify traces of positive or balancing selection on the scale of a few hundred bp. Additional problems are the incomplete SNP databases and SNP ascertainment bias. The evolutionary role of alternative splicing remains, at least for the moment, speculative.

    COMPUTATIONAL INVESTIGATION OF TRANSCRIPTOMIC AND GENETIC UNDERPINNINGS OF AGING AND HGPS

    Normal aging is a complex process affecting everyone, and also a major risk factor for many complex diseases. Hutchinson-Gilford progeria syndrome (HGPS) is a rare genetic disease with symptoms of aging at a very early age. There are some known and other presumed overlaps between HGPS and the normal aging process. My goal in this dissertation is to perform computational investigation at both the transcriptomic and genomic levels to uncover potential underpinnings of these two models using high-throughput genomic data. First, to detect the common and distinct gene expression patterns between HGPS and normal aging, which might suggest their potential molecular links, I developed a novel approach that leverages co-expressed gene clusters to identify gene clusters whose expression co-varies with age and/or HGPS despite limited sample size. Our results recapitulate previously known processes underlying aging as well as suggest numerous unique processes underlying aging and HGPS. Moreover, it is known that alternative splicing contributes to phenotypic diversity at multiple biological scales, and its dysregulation is implicated in both aging and age-associated diseases in humans. We aim to provide more insight into aging and age-related diseases by studying splicing regulation. Second, we performed the first comparative investigation of how well genomic and epigenomic features predict splicing, using a deep neural network (DNN) model. We showed that genomic features are the primary driver of splicing, and that epigenomics does not contribute extra regulatory information independent of genomics. In addition, cross-tissue variability in splicing further complicates its links to age-associated phenotypes, and elucidating these links requires a comprehensive map of age-associated splicing changes across multiple tissues. Third, we generate such a map by analyzing ~8500 RNA-seq samples across 48 tissues in 544 individuals.
Employing a stringent model controlling for multiple confounders, we identify 49,869 tissue-specific age-associated splicing events of 7 distinct types. We find that the genome-wide splicing profile is a better predictor of biological age than the gene and transcript expression profiles, and furthermore, that age-associated splicing provides an additional independent contribution to age-associated complex diseases. This study presents the first systematic investigation of age-associated splicing changes across tissues and further strengthens the links between age-associated splicing and age-associated diseases. Besides aging itself, genetic variation also potentially contributes to age-related disease, as shown by GWAS. However, potential interactions between aging and genomic variation have not been fully elucidated. It is highly likely that the phenotypic effect of systemic molecular changes through aging depends on the genotype of the individual. Lastly, we approximate the environmental changes by age-associated changes in the levels of regulatory proteins and, exploiting the known mechanisms of transcriptional regulation, explore potential causal interactions between genotype and aging toward explaining age-related transcriptional changes and, ultimately, age-related diseases. We detected numerous interactions across 25 tissues and showed that they could potentially be associated with hypertension. In summary, the investigations in this dissertation provide predictive hallmarks, along with insight into the implied molecular basis, of normal aging and HGPS at the transcriptomic and genetic levels.

    Deciphering the Plant Splicing Code: Experimental and Computational Approaches for Predicting Alternative Splicing and Splicing Regulatory Elements

    Extensive alternative splicing (AS) of precursor mRNAs (pre-mRNAs) in multicellular eukaryotes increases the protein-coding capacity of a genome and allows novel ways to regulate gene expression. In flowering plants, up to 48% of intron-containing genes exhibit AS. However, the full extent of AS in plants is not yet known, as only a few high-throughput RNA-Seq studies have been performed. As the cost of obtaining RNA-Seq reads continues to fall, it is anticipated that huge amounts of plant sequence data will accumulate and help in obtaining a more complete picture of AS in plants. Although it is not an onerous task to obtain hundreds of millions of reads using high-throughput sequencing technologies, computational tools to accurately predict and visualize AS are still being developed and refined. This review will discuss the tools to predict and visualize transcriptome-wide AS in plants using short-reads and highlight their limitations. Comparative studies of AS events between plants and animals have revealed that there are major differences in the most prevalent types of AS events, suggesting that plants and animals differ in the way they recognize exons and introns. Extensive studies have been performed in animals to identify cis-elements involved in regulating AS, especially in exon skipping. However, few such studies have been carried out in plants. Here, we review the current state of research on splicing regulatory elements (SREs) and briefly discuss emerging experimental and computational tools to identify cis-elements involved in regulation of AS in plants. The availability of curated alternative splice forms in plants makes it possible to use computational tools to predict SREs involved in AS regulation, which can then be verified experimentally. Such studies will permit identification of plant-specific features involved in AS regulation and contribute to deciphering the splicing code in plants

    Deep sequencing of pre-translational mRNPs reveals hidden flux through evolutionarily conserved AS-NMD pathways

    Deep sequencing of mRNAs (RNA-Seq) is now the preferred method for transcriptome-wide quantification of gene expression. Yet many mRNA isoforms, such as those eliminated by nonsense-mediated decay (NMD), are inherently unstable. Thus a significant drawback of steady-state RNA-Seq is that it provides marginal information on the flux through alternative splicing pathways. Measurement of such flux necessitates capture of newly made species prior to mRNA decay. One means to capture nascent mRNAs is affinity purifying either the exon junction complex (EJC) or activated spliceosomes. Late-stage spliceosomes deposit the EJC upstream of exon-exon junctions, where it remains associated until the first round of translation. As most mRNA decay pathways are translation-dependent, these EJC- or spliceosome-associated, pre-translational mRNAs should provide an accurate record of the initial population of alternate mRNA isoforms. Previous work has analyzed the protein composition and structure of pre-translational mRNPs in detail. While in the Moore lab, my project has focused on exploring the diversity of mRNA isoforms contained within these complexes. As expected, known NMD isoforms are more highly represented in pre-translational mRNPs than in RNA-Seq libraries. To investigate whether pre-translational mRNPs contain novel mRNA isoforms, we created a bioinformatics pipeline that identified thousands of previously unannotated splicing events. Though many can be attributed to “splicing noise”, others are evolutionarily conserved events that produce new AS-NMD isoforms likely involved in maintenance of protein homeostasis. Several of these occur in genes whose overexpression has been linked to poor cancer prognosis.

    Yy1 Gene Dosage Effect and Allele-Specific Expression Analysis of Peg3

    Genomic imprinting is a mechanism that targets epigenetic modifications to regulate gene transcription so that a gene is expressed from only one of its two parental alleles. Imprinted genes are typically clustered together and are involved in developmental regulation of the fetus. The paternally expressed gene 3 (Peg3) domain represents one such imprinted gene cluster involved in fetal growth regulation and maternal caring behavior. The transcription and imprinting control of the Peg3 domain requires the transcription factor Yin-Yang 1 (YY1), a protein that plays important roles throughout development. The first part of this work explores evidence for the hypothesis that half a dosage of YY1 may be involved in controlling the transcription and imprinting of Peg3 in vivo. The results reveal that Yy1 most likely functions as a transcriptional repressor in this domain. The results also provide new evidence for bi-allelic expression of Peg3 in the mouse brain. Altogether, this indicates that the maternal allele of Peg3 is expressed and functional in specific areas of the brain, including the choroid plexus, the paraventricular nucleus (PVN), and the supraoptic nucleus (SON). The observed bi-allelic expression pattern indicates either de-repression of the maternal allele of the known promoter or the presence of alternative promoters for the Peg3 locus. Therefore, the second part of this work demonstrates that several alternative promoters exist for Peg3. The results reveal that these alternative promoters display allele-, tissue-, and developmental stage-specific expression patterns. This suggests that the activity of these alternative promoters has been a functionally selected feature of the Peg3 imprinted domain during mammalian evolution. The third part of this work develops a novel methodology that detects alternative promoters for Peg3 by incorporating both 5' rapid amplification of cDNA ends (5'RACE) and next-generation sequencing (NGS) techniques.
The results indicate that this NGS-based 5'RACE protocol is a sensitive and reliable method for detecting low-abundance transcripts and promoters. Overall, the research presented in this dissertation advances our understanding of how the YY1 transcription factor is involved in controlling the Peg3 imprinted domain and how alternative promoters may contribute to the allele-, tissue- and developmental stage-specific Peg3 expression patterns observed in the mouse.

    RNA dysregulation in models of frontotemporal dementia and amyotrophic lateral sclerosis

    Amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) are two rare but devastating neurodegenerative diseases that share pathological features and genetic factors. A central question in both diseases is the role of the RNA-binding proteins transactive response DNA-binding protein 43kDa (TDP-43) and fused in sarcoma (FUS). These proteins play a vital role in RNA regulation in all cells but in diseased neurons they alter their cellular localisation to form potentially pathogenic aggregates. This process can be linked to rare genetic mutations in the TARDBP and FUS genes, although most cases of ALS and FTD have no known genetic cause. My work uses the revolutionary technology of RNA sequencing to measure and compare gene expression and RNA splicing in different cellular and animal models of sporadic and genetic disease. Here I present the results of four studies that investigate the biology of TDP-43 and FUS, assessing both their normal cellular roles and the impact of rare disease-causing mutations. In these projects I analyse RNA sequencing data to discover novel gene expression and RNA splicing phenomena. This includes the repression of cryptic splicing by TDP-43 but not FUS, the progressive downregulation of mitochondrial and ribosomal transcripts in a mouse model of FUS ALS, a gain of splicing function by TDP-43 mutations affecting constitutive exon splicing, and widespread changes in intron retention caused by FUS knockout or aggressive FUS mutations. I also discover a novel mechanism for how FUS might regulate its own translation. This work expands on what is currently known about the roles in RNA regulation for TDP-43 and FUS and provides new avenues for understanding both the causes and progression of ALS and FTD.