Tfold: efficient in silico prediction of non-coding RNA secondary structures

Author: Engelen Stéfan
Tahi Fariza
Publication venue: Oxford University Press
Publication date: 01/04/2010

Predicting RNA secondary structures is a very important task, and it remains a challenging problem even though several methods and algorithms have been proposed in the literature. In this article, we propose an algorithm called Tfold for predicting non-coding RNA secondary structures. Tfold takes as input an RNA sequence whose secondary structure is sought and a set of aligned homologous sequences. It combines criteria of stability, conservation and covariation in order to search for stems and pseudoknots (whatever their type). Stems are searched recursively, from the most to the least stable. Tfold uses an algorithm called SSCA to select, from a large set of homologous sequences (taken from a database, for example), the most appropriate ones to use for the prediction. Tfold can take into account one or several stems that the user considers part of the secondary structure. Tfold can return several structures (if requested by the user) when ‘rival’ stems are found. Tfold has a complexity of O(n²), where n is the sequence length. The developed software, which offers several different uses, is available on the web site: http://tfold.ibisc.univ-evry.fr/TFold

HAL Evry

PubMed Central

HAL-CEA
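The abstract says Tfold searches stems from the most to the least stable, without restricting the pseudoknot type. The Python sketch below illustrates only that greedy ordering on a single sequence, under loudly stated assumptions: "stability" is approximated by stem length (number of stacked pairs) instead of an energy model, conservation and covariation across the homologous sequences are ignored, and `find_stems` and `greedy_structure` are hypothetical names, not part of the Tfold software.

```python
# Watson-Crick and G-U wobble pairs
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    return (a, b) in PAIRS


def find_stems(seq: str, min_len: int = 3, min_loop: int = 3):
    """Return maximal stems (i, j, length): seq[i+k] pairs with seq[j-k] for k < length."""
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(n - 1, i, -1):
            if not can_pair(seq[i], seq[j]):
                continue
            # Start only at the outermost pair of a run, so each stem is maximal outward.
            if i > 0 and j < n - 1 and can_pair(seq[i - 1], seq[j + 1]):
                continue
            length = 0
            # Extend inward while a hairpin of at least min_loop unpaired bases remains.
            while (i + length < j - length - min_loop
                   and can_pair(seq[i + length], seq[j - length])):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


def greedy_structure(seq: str, min_len: int = 3):
    """Pick stems from most to least 'stable' (assumption: longest first), skipping
    any stem that reuses a base. Crossing stems are allowed, so pseudoknots can appear."""
    stems = sorted(find_stems(seq, min_len), key=lambda s: s[2], reverse=True)
    used, chosen = set(), []
    for i, j, length in stems:
        bases = {i + k for k in range(length)} | {j - k for k in range(length)}
        if bases & used:
            continue
        used |= bases
        chosen.append((i, j, length))
    return chosen
```

For example, `greedy_structure("GGGGAAAACCCC")` returns `[(0, 11, 4)]`: the four G-C pairs closing a loop of four A's. Because selection only checks for shared bases, not nesting, the same loop would also keep a crossing stem if one existed.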

An Efficient Algorithm for Upper Bound on the Partition Function of Nucleic Acids

Author: Chitsaz Hamidreza
Forouzmand Elmirasadat
Haffari Gholamreza
Publication date: 01/01/2013

It has been shown that the minimum free energy structure for RNAs and RNA-RNA interaction is often incorrect due to inaccuracies in the energy parameters and inherent limitations of the energy model. In contrast, ensemble-based quantities such as melting temperature and equilibrium concentrations can be predicted more reliably. Even structure prediction by sampling from the ensemble and clustering those structures with Sfold [7] has proven more reliable than minimum free energy structure prediction. The main obstacle for ensemble-based approaches is the computational complexity of the partition function and base-pairing probabilities. For instance, the space complexity of the partition function for RNA-RNA interaction is $O(n^4)$ and the time complexity is $O(n^6)$, which are prohibitively large [4,12]. Our goal in this paper is to give a fast algorithm, based on sparse folding, to calculate an upper bound on the partition function. Our work is based on the recent algorithm of Hazan and Jaakkola [10]. The space complexity of our algorithm is the same as that of sparse folding algorithms, and in practice its time complexity is $O(\mathrm{MFE}(n)\,\ell)$ for a single RNA and $O(\mathrm{MFE}(m, n)\,\ell)$ for RNA-RNA interaction, where $\mathrm{MFE}$ is the running time of sparse folding and $\ell \leq n$ ($\ell \leq n + m$ for RNA-RNA interaction) is a sequence-dependent parameter.

arXiv.org e-Print Archive

CiteSeerX
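To make the quantity being bounded concrete, here is a minimal partition-function recursion for a single RNA. It assumes a deliberately simplified energy model: every base pair contributes a fixed `pair_energy` (kcal/mol), there are no stacking or loop terms, and hairpins need at least `min_loop` unpaired bases. It is the standard O(n³) base-pair-counting recursion, not the Turner-model partition function and not the paper's upper-bound algorithm; `partition_function` is a hypothetical name.

```python
import math

# Watson-Crick and G-U wobble pairs
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    return (a, b) in PAIRS


def partition_function(seq: str, pair_energy: float = -1.0,
                       kT: float = 0.6163, min_loop: int = 3) -> float:
    """Z summed over all pseudoknot-free structures of seq (toy energy model).

    kT = 0.6163 kcal/mol corresponds to 37 degrees Celsius.
    """
    n = len(seq)
    w = math.exp(-pair_energy / kT)  # Boltzmann weight of a single base pair
    Z = [[1.0] * n for _ in range(n)]  # Z[i][j] covers seq[i..j]

    def z(i: int, j: int) -> float:
        return 1.0 if j < i else Z[i][j]  # empty interval: only the open chain

    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            total = z(i + 1, j)  # case 1: base i is unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: base i pairs with k
                if can_pair(seq[i], seq[k]):
                    total += w * z(i + 1, k - 1) * z(k + 1, j)
            Z[i][j] = total
    return z(0, n - 1)
```

Setting `pair_energy=0.0` gives every structure weight 1, so `Z` then counts the admissible structures: for `"GAAAC"` the only options are the open chain and the single G-C pair, so the result is 2.0. Even this toy single-strand version costs O(n³) time and O(n²) space, which is why the RNA-RNA interaction case quoted above becomes prohibitive.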

Prediction of secondary structures for large RNA molecules

Author: Mathuriya Amrita
Publication venue: Georgia Institute of Technology
Publication date: 12/01/2009

The prediction of correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. Among the major obstacles is the fact that accurate calculations scale as O(n⁴), so the computational requirements become prohibitive as the length increases. We present a new parallel, multicore and scalable program called GTfold, which is one to two orders of magnitude faster than the de facto standard programs mfold and RNAfold for folding large RNA viral sequences and achieves comparable prediction accuracy. We analyze the algorithm's concurrency and describe the parallelism for a shared-memory environment such as a symmetric multiprocessor or multicore chip. We are seeing a paradigm shift to multicore chips, and parallelism must be explicitly addressed to continue gaining performance with each new generation of systems. We provide a rigorous proof of correctness of an optimized algorithm for internal loop calculations, the internal loop speedup algorithm (ILSA), which reduces the time complexity of internal loop computations from O(n⁴) to O(n³), and we show that exact algorithms such as ILSA can be executed with our method in an affordable amount of time. The proof gives insight into solving these kinds of combinatorial problems. We have documented detailed pseudocode of the algorithm for predicting minimum free energy secondary structures, which provides a basis for implementing future algorithmic improvements and an improved thermodynamic model in GTfold. GTfold is written in C/C++ and is freely available as open source from our website.
M.S. Committee Chair: Bader, David; Committee Co-Chair: Heitsch, Christine; Committee Member: Harvey, Stephen; Committee Member: Vuduc, Richard

Scholarly Materials And Research @ Georgia Tech
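One common way to expose concurrency in folding dynamic programs, assumed here as an illustration of the kind of shared-memory parallelism the abstract describes, is to fill the table one anti-diagonal (interval length) at a time: every cell on an anti-diagonal reads only shorter intervals, so those cells can be computed independently. The sketch uses the simple Nussinov base-pair-maximization recursion as a stand-in for the full minimum free energy recursions (GTfold itself uses a thermodynamic model), runs serially in Python, and `max_pairs` is a hypothetical name.

```python
# Watson-Crick and G-U wobble pairs
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    return (a, b) in PAIRS


def max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov), filled by anti-diagonal."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]  # N[i][j]: best score for seq[i..j]
    # Intervals shorter than min_loop + 2 cannot hold a pair, so they stay 0.
    for length in range(min_loop + 2, n + 1):
        # Each iteration of this loop reads only shorter intervals, so the
        # iterations are mutually independent: this is the parallelizable loop.
        for i in range(n - length + 1):
            j = i + length - 1
            best = N[i + 1][j]  # base i unpaired
            for k in range(i + min_loop + 1, j + 1):  # base i paired with k
                if can_pair(seq[i], seq[k]):
                    right = N[k + 1][j] if k < j else 0
                    best = max(best, 1 + N[i + 1][k - 1] + right)
            N[i][j] = best
    return N[0][n - 1] if n else 0
```

In C/C++ the inner `for i` loop would be the natural target for a parallel-for (for example an OpenMP pragma); in CPython the global interpreter lock means threads would not speed up this pure-Python version, which is why it is left serial.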

On the combinatorics of sparsification

Author: Christian M Reidys
Fenix Wd Huang
Publication date: 01/01/2012

Background: We study the sparsification of dynamic programming folding algorithms of RNA structures. Sparsification applies to the mfe-folding of RNA structures and can lead to a significant reduction of time complexity. Results: We analyze the sparsification of a particular decomposition rule, $\Lambda^*$, that splits an interval for RNA secondary and pseudoknot structures of fixed topological genus. Essential for quantifying the sparsification is the size of its so-called candidate set. We present a combinatorial framework that uses the probabilities of irreducible substructures to obtain the expected size of the set of $\Lambda^*$-candidates. We compute these expectations for arc-based energy models via energy-filtered generating functions (GF) for RNA secondary structures as well as RNA pseudoknot structures. For RNA secondary structures we also consider a simplified loop-energy model. This combinatorial analysis is then compared to the expected number of $\Lambda^*$-candidates obtained from folding mfe-structures. In the case of the mfe-folding of RNA secondary structures with a simplified loop-energy model, our results imply that sparsification reduces the time complexity by a constant factor of 91% (theory) versus 96% (experiment). For the "full" loop-energy model there is a reduction of 98% (experiment). Comment: 27 pages, 12 figures

arXiv.org e-Print Archive

CiteSeerX

Springer - Publisher Connector

Directory of Open Access Journals
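Reading a reduction of $r$ as the sparsified algorithm doing a fraction $1 - r$ of the work of the unsparsified one (an interpretation assumed here; the abstract does not spell it out), the reported percentages correspond to these constant-factor speedups:

```latex
\text{speedup} = \frac{1}{1 - r}, \qquad
r = 0.91 \;\Rightarrow\; \frac{1}{0.09} \approx 11.1, \qquad
r = 0.96 \;\Rightarrow\; \frac{1}{0.04} = 25, \qquad
r = 0.98 \;\Rightarrow\; \frac{1}{0.02} = 50
```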

Prediction and statistics of pseudoknots in RNA structures using exactly clustered stochastic simulations

Author: A. Xayaphoummine
T. Bucher
F. Thalmann
H. Isambert
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 01/01/2003

Ab initio RNA secondary structure predictions have long dismissed helices interior to loops, so-called pseudoknots, despite their structural importance. Here, we report that many pseudoknots can be predicted through long-time-scale RNA folding simulations, which follow the stochastic closing and opening of individual RNA helices. The numerical efficacy of these stochastic simulations relies on an O(n^2) clustering algorithm that computes time averages over a continuously updated set of n reference structures. Applying this exact stochastic clustering approach, we typically obtain a 5- to 100-fold simulation speed-up for RNA sequences up to 400 bases, while the effective acceleration can be as high as 100,000-fold for short multistable molecules (<150 bases). We performed extensive folding statistics on random and natural RNA sequences and found that pseudoknots are unevenly distributed among RNA structures and account for up to 30% of base pairs in G+C-rich RNA sequences (online RNA folding kinetics server including pseudoknots: http://kinefold.u-strasbg.fr/). Comment: 6 pages, 5 figures

arXiv.org e-Print Archive

Crossref

PubMed Central

CERN Document Server
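The simulations described above follow individual helix closing and opening events stochastically and accumulate time averages. A plain kinetic Monte Carlo (Gillespie) loop of that kind can be sketched as below. This is a toy built on assumptions: helices and their rates are supplied by hand rather than derived from free energies, helices interact only through a given conflict set, and the sketch does not implement the paper's exact O(n^2) clustering speed-up. `simulate_helices` is a hypothetical name.

```python
import random


def simulate_helices(helices, conflicts, t_max, seed=0):
    """Gillespie simulation of helix closing/opening (toy model).

    helices:   list of (k_close, k_open) rates, one pair per helix; needs at
               least one helix with positive rates
    conflicts: set of frozenset({a, b}) for helices that cannot coexist
    Returns the time-averaged fraction of time each helix is formed.
    """
    rng = random.Random(seed)
    formed = [False] * len(helices)
    time_formed = [0.0] * len(helices)
    t = 0.0
    while t < t_max:
        # Enumerate the transitions allowed from the current structure.
        moves = []
        for h, (k_close, k_open) in enumerate(helices):
            if formed[h]:
                moves.append((h, k_open))
            elif not any(formed[g] and frozenset((h, g)) in conflicts
                         for g in range(len(helices))):
                moves.append((h, k_close))
        total = sum(rate for _, rate in moves)
        dt = rng.expovariate(total)  # exponential waiting time to the next event
        step = min(dt, t_max - t)
        for h in range(len(helices)):
            if formed[h]:
                time_formed[h] += step  # accumulate the time average
        t += dt
        if t >= t_max:
            break
        # Choose which helix closes or opens, with probability proportional to its rate.
        h, _ = rng.choices(moves, weights=[rate for _, rate in moves])[0]
        formed[h] = not formed[h]
    return [tf / t_max for tf in time_formed]
```

For a single helix with closing rate 1 and opening rate 3, the time-averaged occupancy approaches k_close / (k_close + k_open) = 0.25 as `t_max` grows; two mutually exclusive helices with equal rates each tend toward 1/3.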

A Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots

Author: Pham Ryan
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2008

This work explores a new approach to using a genetic algorithm to predict RNA secondary structures with pseudoknots. Since pseudoknots make up only a small portion of most RNA structures, the majority of structural elements in an optimal pseudoknot-free structure are likely to be part of the true structure. Thus, seeding the genetic algorithm with optimal pseudoknot-free structures is more likely to lead it to the true structure than starting from a randomly generated population. The genetic algorithm uses the known energy models with an additional augmentation to allow complex pseudoknots. The nearest-neighbor energy model is used in conjunction with Turner’s thermodynamic parameters for pseudoknot-free structures, and the H-type pseudoknot energy estimation for simple pseudoknots. Testing with known pseudoknot sequences from PseudoBase shows that the approach outperforms some of the current popular algorithms.

SJSU ScholarWorks
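The seeding idea can be sketched in a few lines of Python. Everything here is a hypothetical toy: candidate stems are given as `(i, j, length)` triples, a structure is a set of stem indices, and the fitness simply counts base pairs (rejecting structures whose stems share a base) as a placeholder for the nearest-neighbor/Turner and H-type pseudoknot energies the abstract mentions. The only point illustrated is that the initial population is built from mutated copies of a pseudoknot-free seed rather than at random; `seeded_ga` and its helpers are not the thesis's code.

```python
import random


def bases(stem):
    i, j, length = stem
    return {i + k for k in range(length)} | {j - k for k in range(length)}


def is_valid(ind, stems):
    """A structure is valid if no base is used by two of its stems."""
    seen = set()
    for s in ind:
        b = bases(stems[s])
        if b & seen:
            return False
        seen |= b
    return True


def fitness(ind, stems):
    # Placeholder score (assumption): total base pairs; invalid structures get -1.
    return sum(stems[s][2] for s in ind) if is_valid(ind, stems) else -1


def mutate(ind, n_stems, rng):
    return ind ^ {rng.randrange(n_stems)}  # toggle one stem in or out


def seeded_ga(stems, seed_ind, pop_size=20, generations=50, rng_seed=0):
    rng = random.Random(rng_seed)
    seed_ind = frozenset(seed_ind)
    # Seeding: the pseudoknot-free structure plus mutated copies of it.
    pop = [seed_ind] + [mutate(seed_ind, len(stems), rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, stems), reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each stem present in either parent is kept with prob. 1/2.
            child = frozenset(s for s in a | b if rng.random() < 0.5)
            if rng.random() < 0.3:
                child = mutate(child, len(stems), rng)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda ind: fitness(ind, stems))
```

With stems `[(0, 30, 4), (10, 40, 4), (45, 60, 3)]` and seed `{0, 2}`, the first two stems cross, so adding stem 1 to the seed yields a pseudoknotted structure; because the best half always survives, the result never scores below the seed.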