1,040 research outputs found
Identification of candidate regulatory sequences in mammalian 3' UTRs by statistical analysis of oligonucleotide distributions
3' untranslated regions (3' UTRs) contain binding sites for many regulatory
elements, and in particular for microRNAs (miRNAs). The importance of
miRNA-mediated post-transcriptional regulation has become increasingly clear in
the last few years.
We propose two complementary approaches to the statistical analysis of
oligonucleotide frequencies in mammalian 3' UTRs aimed at the identification of
candidate binding sites for regulatory elements. The first method is based on
the identification of sets of genes characterized by evolutionarily conserved
overrepresentation of an oligonucleotide. The second method is based on the
identification of oligonucleotides showing statistically significant strand
asymmetry in their distribution in 3' UTRs.
Both methods are able to identify many previously known binding sites located
in 3' UTRs, and in particular seed regions of known miRNAs. Many new candidates
are proposed for experimental verification.
Comment: Added two references
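The strand-asymmetry idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say which statistical test the authors use; the exact two-sided binomial test below (null hypothesis: an oligonucleotide and its reverse complement are equally frequent) is an assumption chosen for illustration, and all function names are hypothetical.

```python
from collections import Counter
from math import comb

def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_kmers(seqs, k):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def binom_two_sided_p(n_fwd, n_rev):
    """Exact two-sided binomial p-value under p = 0.5 (assumed null model)."""
    n = n_fwd + n_rev
    if n == 0:
        return 1.0
    extreme = max(n_fwd, n_rev)
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def strand_asymmetry(seqs, k):
    """Map each non-palindromic k-mer to (forward count, reverse count, p-value)."""
    counts = count_kmers(seqs, k)
    result = {}
    for kmer, n_fwd in counts.items():
        rc = reverse_complement(kmer)
        if rc == kmer:
            continue  # a palindromic k-mer cannot show strand asymmetry
        n_rev = counts.get(rc, 0)
        result[kmer] = (n_fwd, n_rev, binom_two_sided_p(n_fwd, n_rev))
    return result
```

In a realistic analysis the p-values would need correction for multiple testing across all oligonucleotides of length k, and a background model accounting for base composition may be more appropriate than a plain 50/50 null.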
Target prediction and a statistical sampling algorithm for RNA-RNA interaction
The accessibility of target sites has been shown to have a critical influence
on miRNA and siRNA activity. In this paper, we present a program, rip2.0, that
computes not only the energetically most favorable target site based on the
hybrid probability, but also statistically sampled structures that illustrate
the statistical characterization and representation of the Boltzmann ensemble of
RNA-RNA interaction structures. The outputs are retrieved by backtracing an
improved dynamic programming solution for the partition function based on the
approach of Huang et al. (Bioinformatics). The algorithm is implemented in C
(available from http://www.combinatorics.cn/cbpc/rip2.html).
Comment: 7 pages, 10 figures
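To make the notion of sampling from a Boltzmann ensemble concrete, here is a minimal sketch. It is not the rip2.0 algorithm, which samples by backtracing a dynamic programming partition function; instead it assumes a small, explicitly enumerated list of candidate structures with known free energies (in kcal/mol). The function names and the temperature of 37 °C are assumptions for illustration.

```python
import math
import random

# Gas constant in kcal/(mol*K) times an assumed temperature of 37 degrees C.
RT = 0.0019872 * 310.15

def boltzmann_probabilities(energies, rt=RT):
    """Convert free energies into Boltzmann probabilities exp(-E/RT) / Z."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)  # partition function of the shifted ensemble
    return [w / z for w in weights]

def sample_structures(structures, energies, n, seed=0):
    """Draw n structures with probability proportional to their Boltzmann weight."""
    rng = random.Random(seed)
    probs = boltzmann_probabilities(energies)
    return rng.choices(structures, weights=probs, k=n)
```

Counting how often each structure, or each base pair, appears among the samples gives an empirical estimate of its probability in the ensemble. This is the kind of statistical characterization that sampling provides beyond the single minimum-free-energy structure.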
Role of 3′UTRs in the Translation of mRNAs Regulated by Oncogenic eIF4E—A Computational Inference
Eukaryotic cap-dependent mRNA translation is mediated by the initiation factor eIF4E, which binds mRNAs and stimulates efficient translation initiation. eIF4E is often overexpressed in human cancers. To elucidate the molecular signature of eIF4E target mRNAs, we analyzed sequence and structural properties of two independently derived polyribosome recruited mRNA datasets. These datasets originate from studies of mRNAs that are actively being translated in response to cells over-expressing eIF4E or cells with an activated oncogenic AKT: eIF4E signaling pathway, respectively. Comparison of eIF4E target mRNAs to mRNAs insensitive to eIF4E-regulation has revealed surprising features in mRNA secondary structure, length and microRNA-binding properties. Fold-changes (the relative change in recruitment of an mRNA to actively translating polyribosomal complexes in response to eIF4E overexpression or AKT upregulation) are positively correlated with mRNA G+C content and negatively correlated with total and 3′UTR length of the mRNAs. A machine learning approach for predicting the fold change was created. Interesting tendencies of secondary structure stability are found near the start codon and at the beginning of the 3′UTR region. Highly upregulated mRNAs show negative selection (site avoidance) for binding sites of several microRNAs. These results are consistent with the emerging model of regulation of mRNA translation through a dynamic balance between translation initiation at the 5′UTR and microRNA binding at the 3′UTR
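The reported correlations between fold change and sequence features could be computed along the following lines. This is a minimal sketch assuming the Pearson coefficient; the abstract does not state which correlation measure was used, and the function names are hypothetical.

```python
import math

def gc_content(seq):
    """Fraction of G and C nucleotides in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Given a list of mRNA sequences and their measured fold changes, `pearson([gc_content(s) for s in seqs], fold_changes)` would estimate the G+C correlation, and replacing `gc_content` with `len` would estimate the length correlation.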
Multiple Sequence Alignments Enhance Boundary Definition of RNA Structures
Self-contained structured domains of RNA sequences often have distinct molecular functions. Determining the boundaries of structured domains of a non-coding RNA (ncRNA) is needed for many ncRNA gene finder programs that predict RNA secondary structures in aligned genomes because these methods do not necessarily provide precise information about the boundaries or the location of the RNA structure inside the predicted ncRNA. Even without having a structure prediction, it is of interest to search for structured domains, such as for finding common RNA motifs in RNA-protein binding assays. Precise definition of the boundaries is essential for downstream analyses such as RNA structure modelling, e.g., through covariance models, and RNA structure clustering for the search of common motifs. Such efforts have so far been focused on single sequences, so here we present a comparison for boundary definition between single sequence and multiple sequence alignments. We also present a novel approach, named RNAbound, for finding the boundaries based on probabilities of evolutionarily conserved base pairings. We tested the performance of two different methods on a limited number of Rfam families using the annotated structured RNA regions in the human genome and their multiple sequence alignments created from 14 species. The results show that multiple sequence alignments improve the boundary prediction for branched structures compared to single sequences independent of the chosen method. The actual performance of the two methods differs on single hairpin structures and branched structures. For the RNA families with branched structures, including transfer RNA (tRNA) and small nucleolar RNAs (snoRNAs), RNAbound improves the boundary predictions using multiple sequence alignments to median differences of −6 and −11.5 nucleotides (nts) for left and right boundary, respectively (window size of 200 nts).
miRvestigator: web application to identify miRNAs responsible for co-regulated gene expression patterns discovered through transcriptome profiling
Transcriptome profiling studies have produced staggering numbers of gene co-expression signatures for a variety of biological systems. A significant fraction of these signatures will be partially or fully explained by miRNA-mediated targeted transcript degradation. miRvestigator takes as input lists of co-expressed genes from Caenorhabditis elegans, Drosophila melanogaster, Gallus gallus, Homo sapiens, Mus musculus or Rattus norvegicus and identifies the specific miRNAs that are likely to bind to 3′ un-translated region (UTR) sequences to mediate the observed co-regulation. The novelty of our approach is the miRvestigator hidden Markov model (HMM) algorithm, which systematically computes a similarity P-value for each unique miRNA seed sequence from the miRNA database miRBase to an overrepresented sequence motif identified within the 3′-UTR of the query genes. We have made this miRNA discovery tool accessible to the community by integrating our HMM algorithm with a proven algorithm for de novo discovery of miRNA seed sequences and wrapping these algorithms into a user-friendly interface. Additionally, the miRvestigator web server also produces a list of putative miRNA binding sites within 3′-UTRs of the query transcripts to facilitate the design of validation experiments. The miRvestigator is freely available at http://mirvestigator.systemsbiology.net
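The relationship between a miRNA seed and its putative 3′-UTR binding sites can be sketched as follows. This is not the miRvestigator HMM, which scores the similarity between a discovered motif and each seed; it is a simpler exact-match illustration, assuming the common convention that the seed spans miRNA nucleotides 2 to 8 and that the target site is its reverse complement. All function names are hypothetical.

```python
def reverse_complement_rna(seq):
    """Reverse complement of an RNA string (A/C/G/U only)."""
    return seq.translate(str.maketrans("ACGU", "UGCA"))[::-1]

def normalize_rna(seq):
    """Uppercase a sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")

def seed_target_site(mirna, start=2, end=8):
    """Site complementary to the miRNA seed (1-based positions start..end)."""
    seed = normalize_rna(mirna)[start - 1:end]
    return reverse_complement_rna(seed)

def find_seed_sites(mirna, utr):
    """Return 0-based start positions of exact seed matches in a 3' UTR."""
    site = seed_target_site(mirna)
    utr = normalize_rna(utr)
    positions = []
    i = utr.find(site)
    while i != -1:
        positions.append(i)
        i = utr.find(site, i + 1)  # step by one so overlapping sites are kept
    return positions
```

For example, with the let-7a sequence `UGAGGUAGUAGGUUGUAUAGUU`, `seed_target_site` returns `CUACCUC`. Exact seed matching ignores partial complementarity and site context, so it produces candidate sites only.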
Inference of biomolecular interactions from sequence data
This thesis describes our work on the inference of biomolecular interactions from
sequence data. In particular, the first part of the thesis focuses on proteins and
describes computational methods that we have developed for the inference of both
intra- and inter-protein interactions from genomic data. The second part of the thesis
centers around protein-RNA interactions and describes a method for the inference of
binding motifs of RNA-binding proteins from high-throughput sequencing data.
The thesis is organized as follows. In the first part, we start by introducing a
novel mathematical model for the characterization of protein sequences (chapter 1).
We then show how, using genomic data, this model can be successfully applied to
two different problems, namely to the inference of interacting amino acid residues
in the tertiary structure of protein domains (chapter 2) and to the prediction of
protein-protein interactions in large paralogous protein families (chapters 3 and 4).
We conclude the first part by a discussion of potential extensions and generalizations
of the methods presented (chapter 5).
In the second part of this thesis, we first give a general introduction to
RNA-binding proteins (chapter 6). We then describe a novel experimental method
for the genome-wide identification of target RNAs of RNA-binding proteins and
show how this method can be used to infer the binding motifs of RNA-binding
proteins (chapter 7). Finally, we discuss a potential mechanism by which KH
domain-containing RNA-binding proteins could achieve the specificity of
interaction with their target RNAs and conclude the second part of the thesis by
proposing a novel type of motif finding algorithm tailored for the inference of
their recognition elements (chapter 8).