1,040 research outputs found

    Identification of candidate regulatory sequences in mammalian 3' UTRs by statistical analysis of oligonucleotide distributions

    Get PDF
    3' untranslated regions (3' UTRs) contain binding sites for many regulatory elements, and in particular for microRNAs (miRNAs). The importance of miRNA-mediated post-transcriptional regulation has become increasingly clear in the last few years. We propose two complementary approaches to the statistical analysis of oligonucleotide frequencies in mammalian 3' UTRs aimed at the identification of candidate binding sites for regulatory elements. The first method is based on the identification of sets of genes characterized by evolutionarily conserved overrepresentation of an oligonucleotide. The second method is based on the identification of oligonucleotides showing statistically significant strand asymmetry in their distribution in 3' UTRs. Both methods are able to identify many previously known binding sites located in 3'UTRs, and in particular seed regions of known miRNAs. Many new candidates are proposed for experimental verification.Comment: Added two reference

    Target prediction and a statistical sampling algorithm for RNA-RNA interaction

    Get PDF
    It has been proven that the accessibility of the target sites has a critical influence for miRNA and siRNA. In this paper, we present a program, rip2.0, not only the energetically most favorable targets site based on the hybrid-probability, but also a statistical sampling structure to illustrate the statistical characterization and representation of the Boltzmann ensemble of RNA-RNA interaction structures. The outputs are retrieved via backtracing an improved dynamic programming solution for the partition function based on the approach of Huang et al. (Bioinformatics). The O(N6)O(N^6) time and O(N4)O(N^4) space algorithm is implemented in C (available from \url{http://www.combinatorics.cn/cbpc/rip2.html})Comment: 7 pages, 10 figure

    Role of 3′UTRs in the Translation of mRNAs Regulated by Oncogenic eIF4E—A Computational Inference

    Get PDF
    Eukaryotic cap-dependent mRNA translation is mediated by the initiation factor eIF4E, which binds mRNAs and stimulates efficient translation initiation. eIF4E is often overexpressed in human cancers. To elucidate the molecular signature of eIF4E target mRNAs, we analyzed sequence and structural properties of two independently derived polyribosome recruited mRNA datasets. These datasets originate from studies of mRNAs that are actively being translated in response to cells over-expressing eIF4E or cells with an activated oncogenic AKT: eIF4E signaling pathway, respectively. Comparison of eIF4E target mRNAs to mRNAs insensitive to eIF4E-regulation has revealed surprising features in mRNA secondary structure, length and microRNA-binding properties. Fold-changes (the relative change in recruitment of an mRNA to actively translating polyribosomal complexes in response to eIF4E overexpression or AKT upregulation) are positively correlated with mRNA G+C content and negatively correlated with total and 3′UTR length of the mRNAs. A machine learning approach for predicting the fold change was created. Interesting tendencies of secondary structure stability are found near the start codon and at the beginning of the 3′UTR region. Highly upregulated mRNAs show negative selection (site avoidance) for binding sites of several microRNAs. These results are consistent with the emerging model of regulation of mRNA translation through a dynamic balance between translation initiation at the 5′UTR and microRNA binding at the 3′UTR

    Multiple Sequence Alignments Enhance Boundary Definition of RNA Structures

    Get PDF
    Self-contained structured domains of RNA sequences have often distinct molecular functions. Determining the boundaries of structured domains of a non-coding RNA (ncRNA) is needed for many ncRNA gene finder programs that predict RNA secondary structures in aligned genomes because these methods do not necessarily provide precise information about the boundaries or the location of the RNA structure inside the predicted ncRNA. Even without having a structure prediction, it is of interest to search for structured domains, such as for finding common RNA motifs in RNA-protein binding assays. The precise definition of the boundaries are essential for downstream analyses such as RNA structure modelling, e.g., through covariance models, and RNA structure clustering for the search of common motifs. Such efforts have so far been focused on single sequences, thus here we present a comparison for boundary definition between single sequence and multiple sequence alignments. We also present a novel approach, named RNAbound, for finding the boundaries that are based on probabilities of evolutionarily conserved base pairings. We tested the performance of two different methods on a limited number of Rfam families using the annotated structured RNA regions in the human genome and their multiple sequence alignments created from 14 species. The results show that multiple sequence alignments improve the boundary prediction for branched structures compared to single sequences independent of the chosen method. The actual performance of the two methods differs on single hairpin structures and branched structures. For the RNA families with branched structures, including transfer RNA (tRNA) and small nucleolar RNAs (snoRNAs), RNAbound improves the boundary predictions using multiple sequence alignments to median differences of −6 and −11.5 nucleotides (nts) for left and right boundary, respectively (window size of 200 nts)

    miRvestigator: web application to identify miRNAs responsible for co-regulated gene expression patterns discovered through transcriptome profiling

    Get PDF
    Transcriptome profiling studies have produced staggering numbers of gene co-expression signatures for a variety of biological systems. A significant fraction of these signatures will be partially or fully explained by miRNA-mediated targeted transcript degradation. miRvestigator takes as input lists of co-expressed genes from Caenorhabditis elegans, Drosophila melanogaster, G. gallus, Homo sapiens, Mus musculus or Rattus norvegicus and identifies the specific miRNAs that are likely to bind to 3′ un-translated region (UTR) sequences to mediate the observed co-regulation. The novelty of our approach is the miRvestigator hidden Markov model (HMM) algorithm which systematically computes a similarity P-value for each unique miRNA seed sequence from the miRNA database miRBase to an overrepresented sequence motif identified within the 3′-UTR of the query genes. We have made this miRNA discovery tool accessible to the community by integrating our HMM algorithm with a proven algorithm for de novo discovery of miRNA seed sequences and wrapping these algorithms into a user-friendly interface. Additionally, the miRvestigator web server also produces a list of putative miRNA binding sites within 3′-UTRs of the query transcripts to facilitate the design of validation experiments. The miRvestigator is freely available at http://mirvestigator.systemsbiology.net

    Inference of biomolecular interactions from sequence data

    Get PDF
    This thesis describes our work on the inference of biomolecular interactions from sequence data. In particular, the first part of the thesis focuses on proteins and describes computational methods that we have developed for the inference of both intra- and inter-protein interactions from genomic data. The second part of the thesis centers around protein-RNA interactions and describes a method for the inference of binding motifs of RNA-binding proteins from high-throughput sequencing data. The thesis is organized as follows. In the first part, we start by introducing a novel mathematical model for the characterization of protein sequences (chapter 1). We then show how, using genomic data, this model can be successfully applied to two different problems, namely to the inference of interacting amino acid residues in the tertiary structure of protein domains (chapter 2) and to the prediction of protein-protein interactions in large paralogous protein families (chapters 3 and 4). We conclude the first part by a discussion of potential extensions and generalizations of the methods presented (chapter 5). In the second part of this thesis, we first give a general introduction about RNA- binding proteins (chapter 6). We then describe a novel experimental method for the genome-wide identification of target RNAs of RNA-binding proteins and show how this method can be used to infer the binding motifs of RNA-binding proteins (chapter 7). Finally, we discuss a potential mechanism by which KH domain-containing RNA- binding proteins could achieve the specificity of interaction with their target RNAs and conclude the second part of the thesis by proposing a novel type of motif finding algorithm tailored for the inference of their recognition elements (chapter 8)
    corecore