Tfold: efficient in silico prediction of non-coding RNA secondary structures
Predicting RNA secondary structures is an important task and remains a challenging problem, even though several methods and algorithms have been proposed in the literature. In this article, we propose an algorithm called Tfold for predicting non-coding RNA secondary structures. Tfold takes as input an RNA sequence whose secondary structure is sought and a set of aligned homologous sequences. It combines criteria of stability, conservation and covariation in order to search for stems and pseudoknots (whatever their type). Stems are searched recursively, from the most to the least stable. Tfold uses an algorithm called SSCA to select, from a large set of homologous sequences (taken from a database, for example), the most appropriate sequences to use for the prediction. Tfold can take into account one or several stems that the user considers to belong to the secondary structure. Tfold can return several structures (if requested by the user) when "rival" stems are found. Tfold has a complexity of O(n^2), where n is the sequence length. The developed software, which offers several different uses, is available on the web site: http://tfold.ibisc.univ-evry.fr/TFold
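The abstract does not say how Tfold measures covariation. As an illustration only, the sketch below uses mutual information between two alignment columns, a common stand-in for covariation scoring; the function name `column_mutual_information` and the toy alignment are assumptions, not part of Tfold.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length aligned sequences. This is a generic
    covariation measure, assumed here for illustration; it is not claimed to be
    the criterion Tfold uses.
    """
    n = len(alignment)
    count_i = Counter(seq[i] for seq in alignment)
    count_j = Counter(seq[j] for seq in alignment)
    count_ij = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 4 change together, columns 1 and 2 never change.
toy_alignment = ["GAAAC", "AAAAU", "CAAAG", "UAAAA"]
covarying = column_mutual_information(toy_alignment, 0, 4)
conserved = column_mutual_information(toy_alignment, 1, 2)
```

Columns that mutate together score high, while fully conserved columns score zero, which is why a method of this kind would combine covariation with a separate conservation criterion.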
An Efficient Algorithm for Upper Bound on the Partition Function of Nucleic Acids
It has been shown that minimum free energy structure for RNAs and RNA-RNA
interaction is often incorrect due to inaccuracies in the energy parameters and
inherent limitations of the energy model. In contrast, ensemble based
quantities such as melting temperature and equilibrium concentrations can be
more reliably predicted. Even structure prediction by sampling from the
ensemble and clustering those structures by Sfold [7] has proven to be more
reliable than minimum free energy structure prediction. The main obstacle for
ensemble based approaches is the computational complexity of the partition
function and base pairing probabilities. For instance, the space complexity of
the partition function for RNA-RNA interaction is O(n^4) and the time
complexity is O(n^6), which are prohibitively large [4,12]. Our goal in this
paper is to give a fast algorithm, based on sparse folding, to calculate an
upper bound on the partition function. Our work is based on the recent
algorithm of Hazan and Jaakkola [10]. The space complexity of our algorithm is
the same as that of sparse folding algorithms, and the time complexity of our
algorithm is for single RNA and for RNA-RNA
interaction in practice, in which is the running time of sparse folding
and () is a sequence dependent parameter
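To make the notion of a partition function concrete, here is a minimal sketch for a single RNA under a toy energy model in which every base pair contributes the same energy. The function name, the parameters `pair_energy`, `kT` and `min_loop`, and the energy model itself are assumptions for illustration; this is the plain cubic-time recursion, not the sparse upper-bound algorithm of the paper.

```python
import math

# Watson-Crick and wobble pairs allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_energy=-1.0, kT=0.6, min_loop=3):
    """Sum of Boltzmann weights over all pseudoknot-free structures of `seq`.

    Assumed toy model: each base pair contributes `pair_energy`, and a hairpin
    must enclose more than `min_loop` unpaired bases.
    """
    n = len(seq)
    weight = math.exp(-pair_energy / kT)
    # z[i][j] is the partition function of the half-open interval seq[i:j].
    z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            total = z[i][j - 1]  # last base, seq[j-1], is unpaired
            for k in range(i, j - 1 - min_loop):
                if (seq[k], seq[j - 1]) in PAIRS:
                    # seq[k] pairs with seq[j-1]; both sides fold independently
                    total += z[i][k] * weight * z[k + 1][j - 1]
            z[i][j] = total
    return z[0][n]


z_hairpin = partition_function("GAAAC", pair_energy=-1.0, kT=1.0)
```

For "GAAAC" only two structures exist (open chain and the single G-C pair), so the result is 1 + e. The triple loop is what makes ensemble quantities costly on long sequences.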
Prediction of secondary structures for large RNA molecules
The prediction of correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. Among the major obstacles is the fact that accurate calculations scale as O(n^4), so the computational requirements become prohibitive as the length increases. We present a new parallel multicore and scalable program called GTfold, which is one to two orders of magnitude faster than the de facto standard programs mfold and RNAfold for folding large RNA viral sequences and achieves comparable accuracy of prediction. We analyze the algorithm's concurrency and describe the parallelism for a shared memory environment such as a symmetric multiprocessor or multicore chip. We are seeing a paradigm shift to multicore chips and parallelism must be explicitly addressed to continue gaining performance with each new generation of systems.
We provide a rigorous proof of correctness of an optimized algorithm for internal loop calculations, called the internal loop speedup algorithm (ILSA), which reduces the time complexity of internal loop computations from O(n^4) to O(n^3), and show that exact algorithms such as ILSA can be executed with our method in an affordable amount of time. The proof gives insight into solving these kinds of combinatorial problems. We have documented detailed pseudocode of the algorithm for predicting minimum free energy secondary structures, which provides a base for implementing future algorithmic improvements and an improved thermodynamic model in GTfold. GTfold is written in C/C++ and freely available as open source from our website. M.S. Committee Chair: Bader, David; Committee Co-Chair: Heitsch, Christine; Committee Member: Harvey, Stephen; Committee Member: Vuduc, Richar
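The shape of a cubic-time folding recursion can be sketched with base-pair maximization (a Nussinov-style dynamic program). This is a deliberately simplified stand-in: it counts pairs instead of evaluating a thermodynamic loop model, and it is not GTfold's algorithm or ILSA. The names `max_base_pairs` and `min_loop` are assumptions.

```python
# Watson-Crick and wobble pairs allowed in this simplified model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq` (O(n^3) time, O(n^2) space).

    A hairpin must enclose more than `min_loop` unpaired bases.
    """
    n = len(seq)
    # best[i][j] is the optimum for the half-open interval seq[i:j].
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            score = best[i][j - 1]  # seq[j-1] left unpaired
            for k in range(i, j - 1 - min_loop):
                if (seq[k], seq[j - 1]) in PAIRS:
                    # pair seq[k] with seq[j-1], solve both sides independently
                    score = max(score, best[i][k] + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n]


stem_pairs = max_base_pairs("GGGAAACCC")
```

Cells on the same anti-diagonal (equal `length`) do not depend on each other, which is the kind of independence a shared-memory parallel implementation can exploit.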
On the combinatorics of sparsification
Background: We study the sparsification of dynamic programming folding
algorithms of RNA structures. Sparsification applies to the mfe-folding of RNA
structures and can lead to a significant reduction of time complexity. Results:
We analyze the sparsification of a particular decomposition rule, Λ*,
that splits an interval for RNA secondary and pseudoknot structures of fixed
topological genus. Essential for quantifying the sparsification is the size of
its so-called candidate set. We present a combinatorial framework which allows
by means of probabilities of irreducible substructures to obtain the expected
size of the set of Λ*-candidates. We compute these expectations for
arc-based energy models via energy-filtered generating functions (GF) for RNA
secondary structures as well as RNA pseudoknot structures. For RNA secondary
structures we also consider a simplified loop-energy model. This combinatorial
analysis is then compared to the expected number of Λ*-candidates
obtained from folding mfe-structures. In case of the mfe-folding of RNA
secondary structures with a simplified loop energy model our results imply that
sparsification provides a reduction of time complexity by a constant factor of
91% (theory) versus a 96% reduction (experiment). For the "full" loop-energy
model there is a reduction of 98% (experiment). Comment: 27 pages, 12 figure
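The candidate-set idea can be illustrated on base-pair maximization, a much simpler setting than the energy models analyzed in the paper. In the sketch below, a split point k is kept as a candidate for the right end j only if closing the pair (k, j-1) is strictly better than every other way of folding that interval; only candidates are tried in the inner loop. The function names and the pair-counting model are assumptions for illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs_dense(seq, min_loop=3):
    """Reference O(n^3) recursion that tries every split point."""
    n = len(seq)
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            score = best[i][j - 1]
            for k in range(i, j - 1 - min_loop):
                if (seq[k], seq[j - 1]) in PAIRS:
                    score = max(score, best[i][k] + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n]


def max_pairs_sparse(seq, min_loop=3):
    """Same optimum, but the inner loop only visits candidate split points.

    Returns (optimum, total number of candidates stored over all right ends).
    """
    n = len(seq)
    best = [[0] * (n + 1) for _ in range(n + 1)]
    total_candidates = 0
    for j in range(1, n + 1):
        candidates = []  # (k, value of closing the pair (k, j-1))
        for i in range(j - 1, -1, -1):
            score = best[i][j - 1]  # seq[j-1] left unpaired
            for k, closed in candidates:  # every stored k is greater than i
                score = max(score, best[i][k] + closed)
            if j - 1 - i > min_loop and (seq[i], seq[j - 1]) in PAIRS:
                closed = 1 + best[i + 1][j - 1]
                if closed > score:  # strictly better than any other decomposition
                    candidates.append((i, closed))
                    score = closed
            best[i][j] = score
        total_candidates += len(candidates)
    return best[0][n], total_candidates


sparse_optimum, n_candidates = max_pairs_sparse("GCGCAAAAGCGCUUUUGCGC")
```

Skipping non-candidates is safe here because a pair that is not strictly optimal for its own interval can always be replaced by an equally good decomposition that the recursion already covers. The size of the candidate lists is the quantity whose expectation the paper studies.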
Prediction and statistics of pseudoknots in RNA structures using exactly clustered stochastic simulations
Ab initio RNA secondary structure predictions have long dismissed helices
interior to loops, so-called pseudoknots, despite their structural importance.
Here, we report that many pseudoknots can be predicted through long-time-scale
RNA folding simulations, which follow the stochastic closing and opening of
individual RNA helices. The numerical efficacy of these stochastic simulations
relies on an O(n^2) clustering algorithm which computes time averages over a
continuously updated set of n reference structures. Applying this exact
stochastic clustering approach, we typically obtain a 5- to 100-fold simulation
speed-up for RNA sequences up to 400 bases, while the effective acceleration
can be as high as 100,000-fold for short multistable molecules (<150 bases). We
performed extensive folding statistics on random and natural RNA sequences, and
found that pseudoknots are unevenly distributed amongst RNA structures and
account for up to 30% of base pairs in G+C rich RNA sequences (Online RNA
folding kinetics server including pseudoknots: http://kinefold.u-strasbg.fr/).
Comment: 6 pages, 5 figure
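In terms of base-pair indices, a structure contains a pseudoknot when two pairs (i, j) and (k, l) cross, that is, i < k < j < l. The sketch below checks a list of pairs for such crossings; the function name and the example pair lists are hypothetical, and the check is unrelated to how the simulations in the paper represent helices.

```python
def crossing_pairs(pairs):
    """Return every pair of base pairs (i, j), (k, l) with i < k < j < l.

    `pairs` is an iterable of index pairs; an empty result means the
    structure is pseudoknot-free (all pairs nested or side by side).
    """
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    crossings = []
    for a in range(len(ordered)):
        i, j = ordered[a]
        for b in range(a + 1, len(ordered)):
            k, l = ordered[b]
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


nested = crossing_pairs([(0, 9), (1, 8)])   # two stacked pairs
h_type = crossing_pairs([(0, 5), (3, 8)])   # two interleaved pairs
```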
A Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots
This work explores a new approach to using a genetic algorithm to predict RNA secondary structures with pseudoknots. Since only a small portion of most RNA structures consists of pseudoknots, the majority of structural elements from an optimal pseudoknot-free structure are likely to be part of the true structure. Thus, seeding the genetic algorithm with optimal pseudoknot-free structures is more likely to lead it to the true structure than starting from a randomly generated population. The genetic algorithm uses the known energy models with an additional augmentation to allow complex pseudoknots. The nearest-neighbor energy model is used in conjunction with Turner's thermodynamic parameters for pseudoknot-free structures, and the H-type pseudoknot energy estimation for simple pseudoknots. Testing with known pseudoknot sequences from PseudoBase shows that it outperforms some of the current popular algorithms.