1,273 research outputs found

    Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation

    MOTIVATION: Pre-mRNA cleavage and polyadenylation are essential steps for 3'-end maturation and subsequent stability and degradation of mRNAs. This process is highly controlled by cis-regulatory elements surrounding the cleavage/polyadenylation sites (polyA sites), which are frequently constrained by sequence content and position. More than 50% of human transcripts have multiple functional polyA sites, and the specific use of alternative polyA sites (APA) results in isoforms with variable 3'-untranslated regions, thus potentially affecting gene regulation. Elucidating the regulatory mechanisms underlying differential polyA preferences in multiple cell types has been hindered both by the lack of suitable data on the precise location of cleavage sites and by the lack of appropriate tests for determining APAs with significant differences across multiple libraries. RESULTS: We applied a tailored paired-end RNA-seq protocol to specifically probe the position of polyA sites in three human adult tissue types. We specified a linear-effects regression model to identify tissue-specific biases indicating regulated APA; the significance of differences between tissue types was assessed by an appropriately designed permutation test. This combination allowed us to identify highly specific subsets of APA events in the individual tissue types. Predictive models successfully distinguished constitutive polyA sites from a biologically relevant background (auROC = 99.6%), as well as tissue-specific regulated sets from each other. We found that the main cis-regulatory elements described for polyadenylation are a strong, and highly informative, hallmark for constitutive sites only. Tissue-specific regulated sites were found to contain other regulatory motifs, with the canonical polyadenylation signal being nearly absent at brain-specific polyA sites. Together, our results contribute to the understanding of the diversity of post-transcriptional gene regulation.
AVAILABILITY: Raw data are deposited on SRA, accession numbers: brain SRX208132, kidney SRX208087 and liver SRX208134. Processed datasets as well as model code are published on our website: http://www.genome.duke.edu/labs/ohler/research/UTR/
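
The abstract above mentions a permutation test for assessing differences between tissue types but does not describe its design. As a rough illustration of the general idea only, the following sketch runs a generic two-group label-permutation test on a difference in means. The function name, the choice of statistic, the number of permutations, and the example values are all assumptions made for illustration; this is not the authors' test.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=1000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Generic sketch: labels are shuffled across the pooled values and the
    absolute difference in means is recomputed for each shuffle.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical per-replicate read counts at one polyA site in two tissues.
tissue_1 = [9, 8, 9, 10]
tissue_2 = [1, 2, 1, 0]
p_value = permutation_p_value(tissue_1, tissue_2)
```

A small p-value here would only say that the observed difference is rarely matched when tissue labels are shuffled; a real analysis would also need to account for library size and multiple testing.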

    Mammalian Cis-Acting RNA Sequence Elements

    Cis-acting regulatory sequence elements are sequences contained in the 3′ and 5′ untranslated regions, introns, or coding regions of precursor RNAs and mature mRNAs that are selectively recognized by a complementary set of one or more trans-acting factors to regulate posttranscriptional gene expression. This chapter focuses on mammalian cis-acting regulatory elements that have recently been discovered in different regions of pre-processed and mature RNAs. The chapter begins with an overview of two large networks of mRNAs that contain conserved AU-rich elements (AREs) or GU-rich elements (GREs), and their role in mammalian cell physiology. Other, less conserved, cis-acting elements and their functional roles in different steps of RNA maturation and metabolism are then discussed. The molecular characteristics of pathological cis-acting sequences that arose from gene mutations or transcriptional aberrations are briefly outlined, along with proposed approaches to restore normal gene expression. Concise models of the function of posttranscriptional regulatory networks within different cellular compartments conclude this chapter.

    Learning the Regulatory Code of Gene Expression

    Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology

    Mechanisms of MiRNA-based Gene Regulation in C. elegans and Human Cells

    Multicellular organisms use precise gene regulation, executed throughout development, to build and sustain various cell and tissue types. Post-transcriptional gene regulation is essential for metazoan development and acts on mRNA to determine its localization, stability, and translation. MicroRNAs (miRNAs) and RNA binding proteins (RBPs) are the principal effectors of post-transcriptional gene regulation and act by targeting the 3' untranslated regions (3'UTRs) of mRNA. MiRNAs are small non-coding RNAs that have the potential to regulate hundreds to thousands of genes and are dysregulated in many prevalent human diseases such as diabetes, Alzheimer's disease, Duchenne muscular dystrophy, and cancer. However, the precise contribution of miRNAs to the pathology of these diseases is not known. MiRNA-based gene regulation occurs in a tissue-specific manner and is implemented by an interplay of poorly understood and complex mechanisms, which control both the presence of the miRNAs and their targets. As a consequence, the precise contributions of miRNAs to gene regulation are not well known. The research presented in this thesis systematically explores the targets and effects of miRNA-based gene regulation in cell lines and tissues. I hypothesize that miRNAs have distinct tissue-specific roles that contribute to the gene expression differences seen across tissues. To address this hypothesis and expand our understanding of miRNA-based gene regulation: 1) I developed the human 3'UTRome v1, a resource for studying post-transcriptional gene regulation. Using this resource, I explored the targets of two cancer-associated miRNAs, miR-221 and let-7c. I identified novel targets of both these miRNAs, which present potential mechanisms by which they contribute to cancer. 2) I identified in vivo, tissue-specific targets in the intestine and body muscle of the model organism Caenorhabditis elegans.
The results from this study revealed that miRNAs regulate tissue homeostasis, and that alternative polyadenylation and miRNA expression patterns modulate miRNA targeting at the tissue-specific level. 3) I explored the functional relevance of miRNA targeting to tissue-specific gene expression, and found that miRNAs contribute to the biogenesis of mRNAs, through alternative splicing, by regulating tissue-specific expression of splicing factors. These results expand our understanding of the mechanisms that guide miRNA targeting and its effects on tissue-specific gene expression. Dissertation/Thesis. Doctoral Dissertation, Molecular and Cellular Biology 201

    Identification and Functional Annotation of Alternatively Spliced Isoforms

    Alternative splicing is a key mechanism for increasing the complexity of the transcriptome and proteome in eukaryotic cells. A large portion of multi-exon genes in humans undergo alternative splicing, and this can have significant functional consequences, as the proteins translated from alternatively spliced mRNA might have different amino acid sequences and structures. The study of alternative splicing events has been accelerated by next-generation sequencing technology. However, reconstruction of transcripts from short-read RNA sequencing is not sufficiently accurate. Recent progress in single-molecule long-read sequencing has provided researchers with alternative ways to help solve this problem. With the help of both short and long RNA sequencing technologies, tens of thousands of splice isoforms have been catalogued in humans and other species, but relatively few of the protein products of splice isoforms have been characterized functionally, structurally and biochemically. The scope of this dissertation includes using short and long RNA sequencing reads together for the purpose of transcript reconstruction, and using high-throughput RNA-sequencing data and gene-level gene ontology functional annotations to predict functions for alternatively spliced isoforms in mouse and human. In the first chapter, I introduce alternative splicing and discuss the existing studies where next-generation sequencing is used for transcript identification. Then, I define the isoform function prediction problem, and explain how it differs from the better-known gene function prediction problem. In the second chapter of this dissertation, I describe our study where the overall transcriptome of the kidney is studied using both long reads from the PacBio platform and RNA-seq short reads from the Illumina platform.
We used short reads to validate full-length transcripts found by long PacBio reads, and generated two high-quality sets of transcript isoforms that are expressed in glomerular and tubulointerstitial compartments. In the third chapter, I describe our generic framework, where we implemented and evaluated several related algorithms for isoform function prediction for mouse isoforms. We tested these algorithms through both computational evaluation and experimental validation of the predicted ‘responsible’ isoform(s) and the predicted disparate functions of the isoforms of Cdkn2a and of Anxa6. Our algorithm is the first effort to predict and differentiate isoform functions through large-scale genomic data integration. In the fourth chapter, I present the extension of the isoform function prediction study to the protein-coding isoforms in human. We used a similar multiple instance learning (MIL)-based approach for predicting the function of protein-coding splice variants in human. We evaluated our predictions using literature evidence for the ADAM15, LMNA/C, and DMXL2 genes. In the fifth and final chapter, I give a summary of previous chapters and outline the future directions for alternatively spliced isoform reconstruction and function prediction studies. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/144017/1/ridvan_1.pd

    Application of a Naïve Bayes Classifier to Assign Polyadenylation Sites from 3' End Deep Sequencing Data: A Dissertation

    Cleavage and polyadenylation of a precursor mRNA is important for transcription termination, mRNA stability, and regulation of gene expression. This process is directed by a multitude of protein factors and cis elements in the pre-mRNA sequence surrounding the cleavage and polyadenylation site. Importantly, the location of the cleavage and polyadenylation site helps define the 3’ untranslated region of a transcript, which is important for regulation by microRNAs and RNA binding proteins. Additionally, these sites have generally been poorly annotated. To identify 3’ ends, many techniques utilize an oligo-dT primer to construct deep sequencing libraries. However, this approach can lead to identification of artifactual polyadenylation sites due to internal priming in homopolymeric stretches of adenines. Previously, simple heuristic filters relying on the number of adenines in the genomic sequence downstream of a putative polyadenylation site have been used to remove these sites of internal priming. However, these simple filters may not remove all sites of internal priming and may also exclude true polyadenylation sites. Therefore, I developed a naïve Bayes classifier to identify putative sites from oligo-dT primed 3’ end deep sequencing as true or false/internally primed. Notably, this algorithm uses a combination of sequence elements to distinguish between true and false sites. Finally, the resulting algorithm is highly accurate in multiple model systems and facilitates identification of novel polyadenylation sites
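
The abstract above contrasts its classifier with simple heuristic filters that count adenines in the genomic sequence downstream of a putative polyadenylation site. A minimal sketch of such a heuristic follows. The window length, the adenine threshold, the function name, and the example sequences are assumptions chosen for illustration; this is not the naïve Bayes classifier developed in the dissertation.

```python
def is_internal_priming_candidate(downstream_seq, window=10, min_adenines=7):
    """Flag a putative polyA site whose downstream genomic sequence is A-rich.

    An A-rich stretch just downstream suggests that an oligo-dT primer could
    have annealed to the transcript body rather than to a true poly(A) tail.
    """
    window_seq = downstream_seq[:window].upper()
    return window_seq.count("A") >= min_adenines


# Hypothetical downstream genomic sequences for two putative sites.
sites = {
    "site_1": "AAAAAAAAGA",  # A-rich: likely internal priming under this rule
    "site_2": "GTCTGTTGCA",  # not A-rich: retained
}
flagged = {name: is_internal_priming_candidate(seq) for name, seq in sites.items()}
```

As the abstract notes, a fixed threshold of this kind can both miss internally primed sites and discard true sites, which is the motivation for combining several sequence features instead.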

    FilTar: Using RNA-Seq data to improve microRNA target prediction accuracy in animals

    MOTIVATION: MicroRNA (miRNA) target prediction algorithms do not generally consider biological context and therefore generic target prediction based on seed binding can lead to a high level of false-positive predictions. Here, we present FilTar, a method that incorporates RNA-Seq data to make miRNA target prediction specific to a given cell type or tissue of interest. RESULTS: We demonstrate that FilTar can be used to: (i) provide sample specific 3'-UTR reannotation; extending or truncating default annotations based on RNA-Seq read evidence and (ii) filter putative miRNA target predictions by transcript expression level, thus removing putative interactions where the target transcript is not expressed in the tissue or cell line of interest. We test the method on a variety of miRNA transfection datasets and demonstrate increased accuracy versus generic miRNA target prediction methods. AVAILABILITY AND IMPLEMENTATION: FilTar is freely available and can be downloaded from https://github.com/TBradley27/FilTar. The tool is implemented using the Python and R programming languages, and is supported on GNU/Linux operating systems. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online
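
Step (ii) in the abstract above, filtering putative miRNA target predictions by transcript expression level, can be sketched as below. This is an illustrative sketch only and does not use FilTar's code or interface: the function name, the data layout, the TPM unit, the 1.0 threshold, and the identifiers are all hypothetical.

```python
def filter_targets_by_expression(predictions, expression_tpm, min_tpm=1.0):
    """Keep predicted (miRNA, transcript) pairs whose transcript is expressed.

    Transcripts missing from the expression table are treated as unexpressed.
    """
    return [
        (mirna, transcript)
        for mirna, transcript in predictions
        if expression_tpm.get(transcript, 0.0) >= min_tpm
    ]


# Hypothetical predictions and expression values for one tissue.
predictions = [("miR-A", "tx1"), ("miR-A", "tx2"), ("miR-B", "tx3")]
expression_tpm = {"tx1": 12.5, "tx2": 0.0}
kept = filter_targets_by_expression(predictions, expression_tpm)
```

With these inputs only the pair involving `tx1` survives, because `tx2` is not expressed and `tx3` has no expression record.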

    Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana

    Background: Messenger RNA polyadenylation is an essential step for the maturation of most eukaryotic mRNAs. Accurate determination of poly(A) sites helps define the 3’-ends of genes, which is important for genome annotation and gene function research. Genomic studies have revealed the presence of poly(A) sites in intergenic regions, which may be attributed to 3’-UTR extensions and novel transcript units. However, there has been no systematic evaluation of intergenic poly(A) sites in plants. Results: Approximately 16,000 intergenic poly(A) site clusters (IPACs) in Arabidopsis thaliana were discovered and evaluated at the whole-genome level. Based on the distributions of distance from IPACs to nearby sense and antisense genes, these IPACs were classified into three categories. About 70% of them were from previously unannotated 3’-UTR extensions to known genes, which would extend 6985 transcripts of the TAIR10 genome annotation beyond their 3’-ends, with a mean extension of 134 nucleotides. 1317 IPACs originated from novel intergenic transcripts, 37 of which were likely to be associated with protein-coding transcripts. 2957 IPACs corresponded to antisense transcripts for genes on the reverse strand, which might affect 2265 protein-coding genes and 39 non-protein-coding genes, including long non-coding RNA genes. The rest of the IPACs could originate from transcriptional read-through or gene mis-annotations. Conclusions: The identified IPACs corresponding to novel transcripts, 3’-UTR extensions, and antisense transcription should be incorporated into the current Arabidopsis genome annotation. Comprehensive characterization of IPACs from this study provides insights into alternative polyadenylation and antisense transcription in plants. Funding support was in part from the US National Science Foundation (No. 1541737 to QQL) and the Hundred Talent Plans of Fujian Province and Xiamen City (to QQL). This project was also funded by the National Natural Science Foundation of China (Nos. 61201358 and 61174161), the Natural Science Foundation of Fujian Province of China (No. 2012J01154), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Nos. 20120121120038 and 20130121130004), and the Fundamental Research Funds for the Central Universities in China (Xiamen University: Nos. 2013121025, 201412G009, and 2014X0234).

    Global Analyses of the Effect of Different Cellular Contexts on MicroRNA Targeting

    MicroRNA (miRNA) regulation clearly impacts animal development, but the extent to which development, with its resulting diversity of cellular contexts, impacts miRNA regulation is unclear. Here, we compared cohorts of genes repressed by the same miRNAs in different cell lines and tissues and found that target repertoires were largely unaffected, with secondary effects explaining most of the differential responses detected. Outliers resulting from differential direct targeting were often attributable to alternative 3′ UTR isoform usage that modulated the presence of miRNA sites. More inclusive examination of alternative 3′ UTR isoforms revealed that they influence ~10% of predicted targets when comparing any two cell types. Indeed, considering alternative 3′ UTR isoform usage improved prediction of targeting efficacy significantly beyond the improvements observed when considering constitutive isoform usage. Thus, although miRNA targeting is remarkably consistent in different cell types, considering the 3′ UTR landscape helps predict targeting efficacy and explain differential regulation that is observed. Funding: Korea (South). Ministry of Education, Science and Technology (MEST) (National Research Foundation of Korea. NRF-2013R1A1A1010185); National Institutes of Health (U.S.) (Grant RO1 GM067031); National Institutes of Health (U.S.) (Grant K99 GM102319); National Science Foundation (U.S.). Graduate Research Fellowship Program

    Defining the 5′ and 3′ landscape of the Drosophila transcriptome with Exo-seq and RNaseH-seq

    Cells regulate biological responses in part through changes in transcription start sites (TSS) or cleavage and polyadenylation sites (PAS). To fully understand gene regulatory networks, it is therefore critical to accurately annotate cell type-specific TSS and PAS. Here we present a simple and straightforward approach for genome-wide annotation of 5′- and 3′-RNA ends. Our approach reliably discerns bona fide PAS from false PAS that arise due to internal poly(A) tracts, a common problem with current PAS annotation methods. We applied our methodology to study the impact of temperature on the Drosophila melanogaster head transcriptome. We found hundreds of previously unidentified TSS and PAS, which revealed two interesting phenomena. First, genes with multiple PASs tend to harbor a motif near the most proximal PAS, which likely represents a new cleavage and polyadenylation signal. Second, motif analysis of promoters of genes affected by temperature suggested that boundary element association factor of 32 kDa (BEAF-32) and DREF mediate a transcriptional program at warm temperatures, a result we validated in a fly line where beaf-32 is downregulated. These results demonstrate the utility of a high-throughput platform for complete experimental and computational analysis of mRNA ends to improve gene annotation.