Search CORE

A method for aligning RNA secondary structures and its application to RNA motif detection

Author: Hu Jun
Liu Jianghui
Tian Bin
Wang Jason TL
Publication venue: BioMed Central
Publication date: 01/01/2005
Field of study

BACKGROUND: Alignment of RNA secondary structures is important in studying functional RNA motifs. In recent years, much progress has been made in RNA motif finding and structure alignment. However, existing tools either require a large number of prealigned structures or suffer from high time complexities. This makes it difficult for the tools to process RNAs whose prealigned structures are unavailable or process very large RNA structure databases. RESULTS: We present here an efficient tool called RSmatch for aligning RNA secondary structures and for motif detection. Motivated by widely used algorithms for RNA folding, we decompose an RNA secondary structure into a set of atomic structure components that are further organized by a tree model to capture the structural particularities. RSmatch can find the optimal global or local alignment between two RNA secondary structures using two scoring matrices, one for single-stranded regions and the other for double-stranded regions. The time complexity of RSmatch is O(mn) where m is the size of the query structure and n that of the subject structure. When applied to searching a structure database, RSmatch can find similar RNA substructures, and is capable of conducting multiple structure alignment and iterative database search. Therefore it can be used to identify functional RNA motifs. The accuracy of RSmatch is tested by experiments using a number of known RNA structures, including simple stem-loops and complex structures containing junctions. CONCLUSION: With respect to computing efficiency and accuracy, RSmatch compares favorably with other tools for RNA structure alignment and motif detection. This tool shall be useful to researchers interested in comparing RNA structures obtained from wet lab experiments or RNA folding programs, particularly when the size of the structure dataset is large

Can Clustal-style progressive pairwise alignment of multiple sequences be used in RNA secondary structure prediction?

Author: Amelia B Bellamy-Royds
B Masoumi
D Gautheret
D Mathews
D Sankoff
DH Mathews
DH Mathews
G Pavesi
G Storz
IL Hofacker
IL Hofacker
J Felsenstein
J Gorodkin
JD Thompson
JH Havgaard
JJ Cannone
KJ Doshi
M Anwar
M Sprinzl
M Zuker
M Zuker
M Zuker
Marcel Turcotte
P Rice
PP Gardner
PP Gardner
R Gutell
RD Dowell
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

Abstract Background In ribonucleic acid (RNA) molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure, defined by the set of internal basepair interactions, is more consistently conserved than the primary structure, defined by the sequence of nucleotides. Results The research presented here investigates the possibility of applying a progressive, pairwise approach to the alignment of multiple RNA sequences by simultaneously predicting an energy-optimized consensus secondary structure. We take an existing algorithm for finding the secondary structure common to two RNA sequences, Dynalign, and alter it to align profiles of multiple sequences. We then explore the relative successes of different approaches to designing the tree that will guide progressive alignments of sequence profiles to create a multiple alignment and prediction of conserved structure. Conclusion We have found that applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid sequences produces highly reliable predictions of conserved basepairs, and we have shown how these predictions can be used as constraints to improve the results of a single-sequence structure prediction algorithm. However, we have also discovered that the amount of detail included in a consensus structure prediction is highly dependent on the order in which sequences are added to the alignment (the guide tree), and that if a consensus structure does not have sufficient detail, it is less likely to provide useful constraints for the single-sequence method.</p

Accelerated probabilistic inference of RNA structure evolution

Author: Holmes Ian
Publication venue: BioMed Central
Publication date: 01/01/2005
Field of study

BACKGROUND: Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered. RESULTS: We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database. CONCLUSION: A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License

eScholarship - University of California

CentroidFold: a web server for RNA secondary structure prediction

Author: Ding
DING
Do
Dowell
Hofacker
K. Asai
K. Sato
Knudsen
M. Hamada
McCaskill
T. Mituyama
Zuker
Publication venue: Oxford University Press
Publication date
Field of study

The CentroidFold web server (http://www.ncrna.org/centroidfold/) is a web application for RNA secondary structure prediction powered by one of the most accurate prediction engine. The server accepts two kinds of sequence data: a single RNA sequence and a multiple alignment of RNA sequences. It responses with a prediction result shown as a popular base-pair notation and a graph representation. PDF version of the graph representation is also available. For a multiple alignment sequence, the server predicts a common secondary structure. Usage of the server is quite simple. You can paste a single RNA sequence (FASTA or plain sequence text) or a multiple alignment (CLUSTAL-W format) into the textarea then click on the ‘execute CentroidFold’ button. The server quickly responses with a prediction result. The major advantage of this server is that it employs our original CentroidFold software as its prediction engine which scores the best accuracy in our benchmark results. Our web server is freely available with no login requirement

Digital Commons @ New Jersey Institute of Technology (NJIT)

RNA structure analysis : algorithms and applications

Author: Liu Jianghui
Publication venue: Digital Commons @ NJIT
Publication date: 31/08/2005
Field of study

In this doctoral thesis, efficient algorithms for aligning RNA secondary structures and mining unknown RNA motifs are presented. As the major contribution, a structure alignment algorithm, which combines both primary and secondary structure information, can find the optimal alignment between two given structures where one of them could be either a pattern structure of a known motif or a real query structure and the other be a subject structure. Motivated by widely used algorithms for RNA folding, the proposed algorithm decomposes an RNA secondary structure into a set of atomic structural components that can be further organized in a tree model to capture the structural particularities. The novel structure alignment algorithm is implemented using dynamic programming techniques coupled by position-independent scoring matrices. The algorithm can find the optimal global and local alignments between two RNA secondary structures at quadratic time complexity. When applied to searching a structure database, the algorithm can find similar RNA substructures and therefore can be used to identify functional RNA motifs. Extension of the algorithm has also been accomplished to deal with position-dependent scoring matrix in the purpose of aligning multiple structures. All algorithms have been implemented in a package under the name RSmatch and applied to searching mRNA UTR structure database and mining RNA motifs. The experimental results showed high efficiency and effectiveness of the proposed techniques

Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework

Author: A Wilm
B Knudsen
BW Matthews
C Notredame
C Notredame
C Notredame
CB Do
CB Do
D Dalli
DF Feng
DH Mathews
E Torarinsson
GJ Barton
H Kiryu
H Kiryu
H Zhou
Hiroyuki Toh
I Holmes
IL Hofacker
J Heringa
J Pei
J Reeder
JD Thompson
JD Thompson
JH Havgaard
JS McCaskill
JS Papadopoulos
K Katoh
K Katoh
Kazutaka Katoh
M Bauer
M Hochsmann
M Kruspe
M Vingron
MA Larkin
MM Yusupov
O Gotoh
O Gotoh
PP Gardner
PP Gardner
S Griffiths-Jones
S Lindgreen
S Washietl
SB Needleman
SR Eddy
TF Smith
VA Simossis
X Xu
Y Tabei
Y Tabei
YM Hou
Z Yao
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background Structural alignment of RNAs is becoming important, since the discovery of functional non-coding RNAs (ncRNAs). Recent studies, mainly based on various approximations of the Sankoff algorithm, have resulted in considerable improvement in the accuracy of pairwise structural alignment. In contrast, for the cases with more than two sequences, the practical merit of structural alignment remains unclear as compared to traditional sequence-based methods, although the importance of multiple structural alignment is widely recognized. Results We took a different approach from a straightforward extension of the Sankoff algorithm to the multiple alignments from the viewpoints of accuracy and time complexity. As a new option of the MAFFT alignment program, we developed a multiple RNA alignment framework, X-INS-i, which builds a multiple alignment with an iterative method incorporating structural information through two components: (1) pairwise structural alignments by an external pairwise alignment method such as SCARNA or LaRA and (2) a new objective function, Four-way Consistency, derived from the base-pairing probability of every sub-aligned group at every multiple alignment stage. Conclusion The BRAliBASE benchmark showed that X-INS-i outperforms other methods currently available in the sum-of-pairs score (SPS) criterion. As a basis for predicting common secondary structure, the accuracy of the present method is comparable to or rather higher than those of the current leading methods such as RNA Sampler. The X-INS-i framework can be used for building a multiple RNA alignment from any combination of algorithms for pairwise RNA alignment and base-pairing probability. The source code is available at the webpage found in the Availability and requirements section.</p