Search CORE

65 research outputs found

Parametrized Stochastic Grammars for RNA Secondary Structure Prediction

Author: Maier Robert S.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2007

We propose a two-level stochastic context-free grammar (SCFG) architecture for parametrized stochastic modeling of a family of RNA sequences, including their secondary structure. A stochastic model of this type can be used for maximum a posteriori estimation of the secondary structure of any new sequence in the family. The proposed SCFG architecture models RNA subsequences comprising paired bases as stochastically weighted Dyck-language words, i.e., as weighted balanced-parenthesis expressions. The length of each run of unpaired bases, forming a loop or a bulge, is taken to have a phase-type distribution: that of the hitting time in a finite-state Markov chain. Without loss of generality, each such Markov chain can be taken to have bounded complexity. The scheme yields an overall family SCFG with a manageable number of parameters. Comment: 5 pages, submitted to the 2007 Information Theory and Applications Workshop (ITA 2007).
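
The phase-type loop-length model can be illustrated with a small discrete-time sketch. Assuming each visit to a transient state of the Markov chain emits one unpaired base, the probability that a loop has length k is alpha T^(k-1) t, where alpha is the initial distribution over transient states, T is the substochastic transition matrix among them, and t = 1 - T·1 holds the per-state absorption (loop-ends) probabilities. The function name and the two-state example chain are hypothetical illustrations, not the paper's parametrization.

```python
from fractions import Fraction


def phase_type_pmf(alpha, T, k_max):
    """Return [P(L = 1), ..., P(L = k_max)] for a discrete phase-type length L.

    alpha: initial probabilities over the n transient states.
    T:     n x n substochastic matrix; T[i][j] is the chance of moving from
           transient state i to transient state j. Assumption of this sketch:
           every visit to a transient state emits one unpaired base.
    The row deficits 1 - sum(T[i]) are the probabilities that the loop ends.
    """
    n = len(alpha)
    exit_probs = [1 - sum(T[i]) for i in range(n)]
    pmf = []
    occupancy = list(alpha)  # distribution over transient states after k - 1 steps
    for _ in range(k_max):
        # P(L = k) = alpha T^(k-1) t
        pmf.append(sum(occupancy[i] * exit_probs[i] for i in range(n)))
        occupancy = [sum(occupancy[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pmf


# Hypothetical two-phase chain: loop length is a sum of two Geometric(1/2) stages.
half = Fraction(1, 2)
alpha = [Fraction(1), Fraction(0)]
T = [[half, half],
     [Fraction(0), half]]
print(phase_type_pmf(alpha, T, 4))
# -> [Fraction(0, 1), Fraction(1, 4), Fraction(1, 4), Fraction(3, 16)]
```

For this two-state chain in series, P(L = k) = (k - 1)/2^k, which the printed values match; exact Fraction arithmetic keeps the check free of rounding error.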

arXiv.org e-Print Archive

CiteSeerX

Crossref

RNA secondary structure prediction from multi-aligned sequences

It is widely accepted that the secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, predicting conserved secondary structures from evolutionarily related sequences is an important task in RNA bioinformatics; such methods are useful not only for further functional analyses of ncRNAs but also for improving the accuracy of secondary structure prediction and for finding novel functional RNAs in genomes. In this review, I focus on common secondary structure prediction from a given alignment of RNA sequences, in which a single secondary structure whose length equals that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem according to the information each tool uses, adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure prediction. Comment: A preprint of an invited review manuscript that will be published in a chapter of the book 'Methods in Molecular Biology'. Note that this version of the manuscript may differ from the published version.
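
As a hedged illustration of the MEG viewpoint, the sketch below implements one standard instance, the gamma-centroid estimator: given base-pairing probabilities p_ij, it returns the pseudoknot-free structure that maximizes the sum of ((gamma + 1) p_ij - 1) over its pairs, using a Nussinov-style dynamic program. It assumes the probabilities have already been computed for the alignment (for example, averaged over the aligned sequences); `gamma_centroid`, its parameters, and the toy probabilities are hypothetical and are not taken from any of the reviewed tools.

```python
def gamma_centroid(n, bpp, gamma=1.0, min_loop=3):
    """Pseudoknot-free structure maximizing sum of ((gamma + 1) * p_ij - 1) over its pairs.

    n:        alignment (or sequence) length.
    bpp:      dict {(i, j): p_ij} with 0 <= i < j < n; assumed precomputed,
              e.g. averaged over the aligned sequences.
    min_loop: minimum number of unpaired positions enclosed by a pair.
    """
    def gain(i, j):
        return (gamma + 1) * bpp.get((i, j), 0.0) - 1

    # best[i][j] = maximal total gain on positions i..j (0 for empty or short spans)
    best = [[0.0] * (n + 1) for _ in range(n + 1)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]                    # position i left unpaired
            for k in range(i + min_loop + 1, j + 1):  # or i pairs with k
                g = gain(i, k)
                if g > 0:
                    score = max(score, g + best[i + 1][k - 1] + best[k + 1][j])
            best[i][j] = score

    # Traceback: recover one optimal set of pairs.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            g = gain(i, k)
            if g > 0 and best[i][j] == g + best[i + 1][k - 1] + best[k + 1][j]:
                pairs.append((i, k))
                stack += [(i + 1, k - 1), (k + 1, j)]
                break
    return sorted(pairs)


bpp = {(0, 9): 0.9, (1, 8): 0.8, (2, 7): 0.3}  # toy probabilities
print(gamma_centroid(10, bpp, gamma=1.0))  # -> [(0, 9), (1, 8)]
print(gamma_centroid(10, bpp, gamma=0.1))  # -> []
```

Only pairs with p_ij > 1/(gamma + 1) can contribute positively, so a larger gamma admits more pairs (favoring sensitivity) and a smaller gamma fewer (favoring specificity); gamma = 1 corresponds to the plain centroid estimator.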

arXiv.org e-Print Archive

CiteSeerX

Crossref

SSE: a nucleotide and amino acid sequence analysis platform

Author: A Cochrane
AR Gruber
B Knudsen
DF Robinson
DG Higgins
E Rivas
F Wright
J Felsenstein
J Parker
MO Salminen
NR Markham
P Simmonds
Peter Simmonds
PM Sharp
RC Edgar
TH Wang
WH Li
Publication venue: BioMed Central
Publication date: 01/01/2012

Abstract. Background: There is an increasing need to develop bioinformatic tools to organise and analyse the rapidly growing amount of nucleotide and amino acid sequence data in organisms ranging from viruses to eukaryotes. Findings: A simple sequence editor (SSE) was developed to create an integrated environment where sequences can be aligned, annotated, classified and directly analysed by a number of built-in bioinformatic programs. SSE incorporates a sequence editor for the creation of sequence alignments, a process assisted by integrated CLUSTAL/MUSCLE alignment programs and automated removal of indels. Sequences and sequence groups can be fully annotated and classified, and the editor gives access to analytical programs that analyse diversity, recombination and RNA secondary structure. Methods for analysing sequence diversity include measures of divergence and evolutionary distances, identity plots to detect regions of nucleotide or amino acid homology, reconstruction of sequence changes, mono-, di- and higher-order nucleotide compositional biases, and codon usage. Association Index calculations, GroupScans, Bootscanning and TreeOrder scans perform phylogenetic analyses that reconcile group membership with tree branching orders and provide powerful methods for examining the segregation of alleles and detecting recombination events. Changes in phylogeny across alignments, and the scoring of branching-order differences between trees using the Robinson-Foulds algorithm, allow effective visualisation of the sites of recombination events. RNA secondary and tertiary structures play important roles in gene expression and RNA virus replication. For the latter, persistence of infection is additionally associated with pervasive RNA secondary structure throughout viral genomic RNA that modulates interactions with innate cell defences.
SSE provides several programs to scan alignments for RNA secondary structure through thermodynamic folding-energy calculations and phylogenetic methods (detection of covariant changes and of structure conservation between divergent sequences). These analyses complement methods based on the detection of sequence constraints, such as suppression of synonymous site variability. For each program, results can be plotted in real time during analysis through an integrated graphics package, providing publication-quality graphs. Results can also be directed to tabulated data files for import into spreadsheet or database programs for further analysis. Conclusions: SSE combines sequence editor functions with analytical tools in a comprehensive and user-friendly package that assists considerably in bioinformatic and evolution research.
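
One of the compositional measures mentioned above, dinucleotide bias, is commonly expressed as an observed/expected ratio rho_XY = f(XY) / (f(X) f(Y)), where f(XY) is the frequency of the dinucleotide among overlapping pairs and f(X), f(Y) are mononucleotide frequencies. The following is a minimal sketch of that calculation for a single sequence; it is not SSE's implementation, and the function name and the handling of alignment gaps (simply removed) are assumptions.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Observed/expected ratio rho_XY = f(XY) / (f(X) * f(Y)) for each dinucleotide.

    Assumptions of this sketch: alignment gaps '-' are removed (which joins the
    bases that flanked a gap) and T is read as U. Ratios near 1 indicate no bias;
    values well below 1 indicate under-representation.
    """
    seq = seq.upper().replace("-", "").replace("T", "U")
    if len(seq) < 2:
        raise ValueError("need at least two nucleotides")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = len(seq), len(seq) - 1
    ratios = {}
    for x in "ACGU":
        for y in "ACGU":
            fx, fy = mono[x] / n_mono, mono[y] / n_mono
            if fx > 0 and fy > 0:  # the ratio is undefined when a base is absent
                ratios[x + y] = (di[x + y] / n_di) / (fx * fy)
    return ratios


r = dinucleotide_odds_ratios("ACGU")
print(round(r["CG"], 3), r["UA"])  # -> 5.333 0.0
```

Short toy sequences like this one give extreme ratios; in practice such ratios are computed over long sequences or sliding windows.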

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Systematic identification of non-coding RNA 2,2,7-trimethylguanosine cap structures in Caenorhabditis elegans

Author: Aftab Muhammad Nauman
Cai Lun
Chen Runsheng
He Housheng
Jia Dong
Li Tiantian
Skogerbø Geir
Publication venue: BioMed Central
Publication date: 01/01/2007

Springer - Publisher Connector

PubMed Central

Watson-Crick pairing, the Heisenberg group and Milnor invariants

Author: Gadgil Siddhartha
Publication date: 18/09/2008

We study the secondary structure of RNA determined by Watson-Crick pairing without pseudoknots using Milnor invariants of links. We focus on the first non-trivial invariant, which we call the Heisenberg invariant. The Heisenberg invariant, which is an integer, can be interpreted in terms of the Heisenberg group as well as in terms of lattice paths. We show that the Heisenberg invariant gives a lower bound on the number of unpaired bases in an RNA secondary structure. We also show that the Heisenberg invariant can predict allosteric structures for RNA. Namely, if the Heisenberg invariant is large, then there are widely separated local maxima (i.e., allosteric structures) for the number of Watson-Crick pairs found. Comment: 18 pages; to appear in Journal of Mathematical Biology.
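
The Heisenberg invariant itself is defined in the paper and is not reproduced here. As a point of comparison, the sketch below computes the quantity it bounds: the fewest unpaired bases achievable by a pseudoknot-free structure using only Watson-Crick pairs (A-U, C-G), obtained from the classic Nussinov-style maximum-pairing dynamic program. The function names are hypothetical, and ignoring G-U wobble pairs and minimum hairpin-loop lengths is an assumption of this sketch.

```python
WC_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def max_wc_pairs(seq):
    """Maximum number of Watson-Crick pairs in a pseudoknot-free structure.

    Nussinov-style dynamic programming over half-open intervals seq[i:j];
    G-U wobble pairs and minimum hairpin-loop lengths are ignored (assumption).
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    best = [[0] * (n + 1) for _ in range(n + 1)]  # best[i][j] for seq[i:j]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            score = best[i + 1][j]  # seq[i] left unpaired
            for k in range(i + 1, j):  # or seq[i] pairs with seq[k]
                if (seq[i], seq[k]) in WC_PAIRS:
                    score = max(score, 1 + best[i + 1][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n]


def min_unpaired(seq):
    """Fewest unpaired bases over all pseudoknot-free Watson-Crick structures."""
    return len(seq) - 2 * max_wc_pairs(seq)


print(max_wc_pairs("GGGAAACCC"), min_unpaired("GGGAAACCC"))  # -> 3 3
```

Each base pair removes two bases from the unpaired count, so the minimum number of unpaired bases is the sequence length minus twice the maximum number of pairs.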

arXiv.org e-Print Archive

Characterising RNA secondary structure space using information entropy

Publication venue: BioMed Central

Springer - Publisher Connector

CONTRAfold: RNA secondary structure prediction without physics-based models

Author: Chuong B. Do
Daniel A. Woods
Serafim Batzoglou
Publication date: 01/01/2006

doi:10.1093/bioinformatics/btl24

CiteSeerX

Improved Measurements of RNA Structure Conservation with Generalized Centroid Estimators

Author: Okada Yohei
Saito Yutaka
Sakakibara Yasubumi
Sato Kengo
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2011

Identification of non-protein-coding RNAs (ncRNAs) in genomes is a crucial task not only for molecular cell biology but also for bioinformatics. Secondary structures of ncRNAs are employed as a key feature of ncRNA analysis, since the biological functions of ncRNAs are closely related to their secondary structures. Although the minimum free energy (MFE) structure of an RNA sequence is regarded as its most stable structure, MFE alone may not be an appropriate measure for identifying ncRNAs, because the free energy is heavily biased by nucleotide composition. Therefore, instead of MFE itself, several alternative measures for identifying ncRNAs have been proposed, such as the structure conservation index (SCI) and the base pair distance (BPD), both of which employ MFE structures. However, these measures are unsuitable for identifying ncRNAs in some cases, including genome-wide searches, where they incur a high false discovery rate. In this study, we propose improved measures based on SCI and BPD that apply generalized centroid estimators to gain robustness against low-quality multiple alignments. Our experiments show that the proposed methods achieve higher accuracy than the original SCI and BPD, not only for human-curated structural alignments but also for low-quality alignments produced by CLUSTAL W. Furthermore, the centroid-based SCI on CLUSTAL W alignments is more accurate than, or comparable with, the original SCI on structural alignments generated with RAF, a high-quality structural aligner that requires about twice the computation time on average. We conclude that our methods are more suitable than the original SCI and BPD for genome-wide alignments, which are of low quality from the point of view of secondary structure.
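
The base pair distance between two structures of equal length is the number of base pairs present in one structure but not the other. A minimal sketch for pseudoknot-free structures in dot-bracket notation follows; the function names are hypothetical, and how the study obtains the structures (MFE or generalized centroid estimates) and aggregates distances across an alignment is not reproduced here.

```python
def base_pairs(dot_bracket):
    """Parse a pseudoknot-free dot-bracket string into a set of (i, j) pairs."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_distance(s1, s2):
    """Number of base pairs present in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(base_pairs(s1) ^ base_pairs(s2))


print(base_pair_distance("((...))", "(.....)"))  # -> 1
print(base_pair_distance("((...))", "..(.).."))  # -> 3
```

The stack-based parser also rejects strings that are not balanced-parenthesis words, so malformed structures fail loudly instead of producing a misleading distance.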

Crossref

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector