RNA secondary structure prediction from multi-aligned sequences
It has been well accepted that the RNA secondary structures of most
functional non-coding RNAs (ncRNAs) are closely related to their functions and
are conserved during evolution. Hence, prediction of conserved secondary
structures from evolutionarily related sequences is one important task in RNA
bioinformatics; the methods are useful not only to further functional analyses
of ncRNAs but also to improve the accuracy of secondary structure predictions
and to find novel functional RNAs from the genome. In this review, I focus on
common secondary structure prediction from a given aligned RNA sequence, in
which one secondary structure whose length is equal to that of the input
alignment is predicted. I systematically review and classify existing tools and
algorithms for the problem, by utilizing the information employed in the tools
and by adopting a unified viewpoint based on maximum expected gain (MEG)
estimators. I believe that this classification will allow a deeper
understanding of each tool and provide users with useful information for
selecting tools for common secondary structure predictions.
Comment: A preprint of an invited review manuscript that will be published in
a chapter of the book 'Methods in Molecular Biology'. Note that this version
of the manuscript may differ from the published version.
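As a rough guide to what a maximum expected gain estimator computes, the generic form can be written as below. The symbols are assumptions introduced here for illustration and may differ from the review's notation: A is the input alignment, S(A) is the set of candidate common secondary structures whose length equals that of A, p(θ | A) is a probability distribution over those structures, and G(θ, y) is a gain function that scores a prediction y against a reference structure θ.

```latex
\hat{y}
  \;=\; \operatorname*{arg\,max}_{y \in \mathcal{S}(A)}
        \mathbb{E}_{\theta \mid A}\!\left[ G(\theta, y) \right]
  \;=\; \operatorname*{arg\,max}_{y \in \mathcal{S}(A)}
        \sum_{\theta \in \mathcal{S}(A)} G(\theta, y)\, p(\theta \mid A)
```

Under this reading, tools would differ mainly in which gain function and which probability distribution they plug in.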
McGenus: A Monte Carlo algorithm to predict RNA secondary structures with pseudoknots
We present McGenus, an algorithm to predict RNA secondary structures with
pseudoknots. The method is based on a classification of RNA structures
according to their topological genus. McGenus can treat sequences of up to 1000
bases and performs an advanced stochastic search of their minimum free energy
structure, allowing for nontrivial pseudoknot topologies. Specifically, McGenus
employs a multiple Markov chain scheme for minimizing a general scoring
function which includes not only free energy contributions for pair stacking,
loop penalties, etc. but also a phenomenological penalty for the genus of the
pairing graph. The good performance of the stochastic search strategy was
successfully validated against TT2NE which uses the same free energy
parametrization and performs exhaustive or partially exhaustive structure
search, albeit for much shorter sequences (up to 200 bases). Next, the method
was applied to other RNA sets, including an extensive tmRNA database, yielding
results that are competitive with existing algorithms. Finally, it is shown
that McGenus highlights possible limitations in the free energy scoring
function. The algorithm is available as a web-server at
http://ipht.cea.fr/rna/mcgenus.php.
Comment: 6 pages, 1 figure
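The topological genus that the abstract refers to can be computed directly from a list of base pairs. The following is a minimal sketch, not McGenus's implementation: it assumes a single backbone, takes a hypothetical `pairs` list of (i, j) positions, and uses the standard relation g = (n + 1 - r) / 2, where n is the number of arcs and r is the number of boundary components of the arc diagram with the backbone closed into a circle.

```python
def genus(pairs):
    """Topological genus of a one-backbone arc diagram.

    `pairs` is a list of (i, j) base-pair positions (illustrative input
    format; unpaired positions do not affect the genus and are ignored).
    """
    n = len(pairs)
    if n == 0:
        return 0  # no arcs: a single boundary component, genus 0

    # Relabel the 2n paired positions as 0..2n-1 in backbone order.
    ends = sorted(p for pair in pairs for p in pair)
    if len(set(ends)) != 2 * n:
        raise ValueError("each position may take part in at most one pair")
    rank = {pos: r for r, pos in enumerate(ends)}

    m = 2 * n
    partner = [0] * m
    for i, j in pairs:
        partner[rank[i]] = rank[j]
        partner[rank[j]] = rank[i]

    # Boundary components are the cycles of the permutation
    # x -> (partner[x] + 1) mod m: cross the arc, then step along the backbone.
    seen = [False] * m
    boundaries = 0
    for start in range(m):
        if seen[start]:
            continue
        boundaries += 1
        x = start
        while not seen[x]:
            seen[x] = True
            x = (partner[x] + 1) % m

    return (n + 1 - boundaries) // 2
```

Nested or side-by-side helices give genus 0, while a simple H-type pseudoknot such as `[(0, 2), (1, 3)]` gives genus 1; a scoring function of the kind described above would add a penalty that grows with this number.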
Ab initio RNA folding
RNA molecules are essential cellular machines performing a wide variety of
functions for which a specific three-dimensional structure is required. Over
the last several years, experimental determination of RNA structures through
X-ray crystallography and NMR seems to have reached a plateau in the number of
structures resolved each year, but as more and more RNA sequences are being
discovered, the need for structure prediction tools to complement experimental data
is strong. Theoretical approaches to RNA folding have been developed since the
late nineties when the first algorithms for secondary structure prediction
appeared. Over the last 10 years a number of prediction methods for 3D
structures have been developed, first based on bioinformatics and data-mining,
and more recently based on a coarse-grained physical representation of the
systems. In this review we are going to present the challenges of RNA structure
prediction and the main ideas behind bioinformatic approaches and physics-based
approaches. We will focus on the description of the more recent physics-based
phenomenological models and on how they are built to include the specificity of
the interactions of RNA bases, whose role is critical in folding. Through
examples from different models, we will point out the strengths of
physics-based approaches, which are able not only to predict equilibrium
structures, but also to investigate dynamical and thermodynamical behavior, and
the open challenges to include more key interactions ruling RNA folding.
Comment: 28 pages, 18 figures
On the combinatorics of sparsification
Background: We study the sparsification of dynamic programming folding
algorithms of RNA structures. Sparsification applies to the mfe-folding of RNA
structures and can lead to a significant reduction of time complexity. Results:
We analyze the sparsification of a particular decomposition rule
that splits an interval for RNA secondary and pseudoknot structures of fixed
topological genus. Essential for quantifying the sparsification is the size of
its so-called candidate set. We present a combinatorial framework which allows us,
by means of probabilities of irreducible substructures, to obtain the expected
size of the set of candidates. We compute these expectations for
arc-based energy models via energy-filtered generating functions (GF) for RNA
secondary structures as well as RNA pseudoknot structures. For RNA secondary
structures we also consider a simplified loop-energy model. This combinatorial
analysis is then compared to the expected number of candidates
obtained from folding mfe-structures. In case of the mfe-folding of RNA
secondary structures with a simplified loop energy model our results imply that
sparsification provides a reduction of time complexity by a constant factor of
91% (theory) versus a 96% reduction (experiment). For the "full" loop-energy
model there is a reduction of 98% (experiment).
Comment: 27 pages, 12 figures
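To make the idea of a decomposition rule that splits an interval concrete, here is a minimal sketch of a Nussinov-style dynamic program that maximizes the number of base pairs. It is a deliberately simplified stand-in, not the energy models or the sparsified algorithm analyzed in the paper: the function name, the `min_loop` parameter, and the set of allowed pairs are assumptions made for illustration.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq` (Nussinov-style DP).

    `min_loop` is the minimum number of unpaired bases enclosed by a pair.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0

    # best[i][j] = optimal score for the interval seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j stays unpaired.
            score = best[i][j - 1]
            # Case 2: j pairs with some k, which splits [i, j] into
            # the left part [i, k-1] and the enclosed part [k+1, j-1].
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score

    return best[0][n - 1]
```

The inner loop over the split point `k` is what makes this cubic in the sequence length. Sparsification, as described above, restricts that loop to a smaller candidate set, which is why the expected size of that set determines the achievable speed-up.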
Graphical methods in RNA structure matching
Eukaryotic genomes are pervasively transcribed; almost every base can be found in an RNA transcript. This is a surprising observation since most of the genome does not encode proteins. This RNA must serve an important regulatory function – important because producing non-coding RNA is an energy intensive process, and in the absence of strong selection one would expect it to disappear.
RNA families with common functions have specifically conserved structural motifs, which are directly related to the functional roles of RNA in catalysis and regulation. Because the conserved structures depend on base-pairing, similar RNA structures may have little or no detectable sequence similarity, making the identification of conserved RNAs difficult. This is a particularly serious problem when studying regulatory structures in RNA. In many cases, such as that of cellular internal ribosome entry sites, although we can identify RNAs that have similar regulatory responses, it is difficult to tell whether the RNAs have common structural features using current methods. Available tools for identifying common structures based on RNA sequence suffer from one or more of the following problems: they do not consider pseudoknots, which are important in many catalytic and regulatory structures; they do not consider near-minimum free energy structures, which is important as many RNAs exist as an ensemble of structures of nearly equal energy; they require many examples of known structures in order to train a computational model; they require impractical amounts of computational time, precluding their use on long sequences or at genomic scale; or they use a similarity function that cannot identify RNAs as having similar structure, even when they are from one of the well-characterized known classes. The approach presented here has the potential to address all of these issues, allowing novel RNA structures that are shared between RNAs with little or no sequence similarity to be discovered. This provides a powerful tool to investigate and explain the pervasive transcription observed in eukaryotic genomes.
LaRA 2: parallel and vectorized program for sequence–structure alignment of RNA sequences
Background
The function of non-coding RNA sequences is largely determined by their spatial conformation, namely the secondary structure of the molecule, formed by Watson–Crick interactions between nucleotides. Hence, modern RNA alignment algorithms routinely take structural information into account. In order to discover yet unknown RNA families and infer their possible functions, the structural alignment of RNAs is an essential task. This task demands a lot of computational resources, especially for aligning many long sequences, and it therefore requires efficient algorithms that utilize modern hardware when available. A subset of the secondary structures contains overlapping interactions (called pseudoknots), which add complexity to the problem and are often ignored in available software.
Results
We present the SeqAn-based software LaRA 2 that is significantly faster than comparable software for accurate pairwise and multiple alignments of structured RNA sequences. In contrast to other programs our approach can handle arbitrary pseudoknots. As an improved re-implementation of the LaRA tool for structural alignments, LaRA 2 uses multi-threading and vectorization for parallel execution and a new heuristic for computing a lower boundary of the solution. Our algorithmic improvements yield a program that is up to 130 times faster than the previous version.
Conclusions
With LaRA 2 we provide a tool to analyse large sets of RNA secondary structures in relatively short time, based on structural alignment. The produced alignments can be used to derive structural motifs for the search in genomic databases.
Thermodynamics of RNA structures by Wang–Landau sampling
Motivation: Thermodynamics-based dynamic programming RNA secondary structure algorithms have been of immense importance in molecular biology, where applications range from the detection of novel selenoproteins using expressed sequence tag (EST) data, to the determination of microRNA genes and their targets. Dynamic programming algorithms have been developed to compute the minimum free energy secondary structure and partition function of a given RNA sequence, the minimum free energy and partition function for the hybridization of two RNA molecules, etc. However, the applicability of dynamic programming methods depends on disallowing certain types of interactions (pseudoknots, zig-zags, etc.), as their inclusion renders structure prediction a nondeterministic polynomial time (NP)-complete problem. Nevertheless, such interactions have been observed in X-ray structures
- …