An Efficient Algorithm for Upper Bound on the Partition Function of Nucleic Acids
It has been shown that minimum free energy structure for RNAs and RNA-RNA
interaction is often incorrect due to inaccuracies in the energy parameters and
inherent limitations of the energy model. In contrast, ensemble based
quantities such as melting temperature and equilibrium concentrations can be
more reliably predicted. Even structure prediction by sampling from the
ensemble and clustering those structures by Sfold [7] has proven to be more
reliable than minimum free energy structure prediction. The main obstacle for
ensemble based approaches is the computational complexity of the partition
function and base pairing probabilities. For instance, the space complexity of
the partition function for RNA-RNA interaction is $O(n^4)$ and the time
complexity is $O(n^6)$, which are prohibitively large [4,12]. Our goal in this
paper is to give a fast algorithm, based on sparse folding, to calculate an
upper bound on the partition function. Our work is based on the recent
algorithm of Hazan and Jaakkola [10]. The space complexity of our algorithm is
the same as that of sparse folding algorithms, and the time complexity of our
algorithm is $O(MFE(n)\ell)$ for single RNA and $O(MFE(m, n)\ell)$ for RNA-RNA
interaction in practice, in which $MFE$ is the running time of sparse folding
and $\ell \leq n$ ($\ell \leq n + m$) is a sequence dependent parameter.
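To make concrete the quantity that the abstract above bounds, the following is a minimal Python sketch of an exact partition function computed with a McCaskill-style cubic-time recursion. It is not the upper-bound algorithm of the paper and does not use sparse folding. The energy model (one hypothetical `pair_energy` per base pair, `kT` of 0.6 kcal/mol, a minimum hairpin loop of 3 unpaired bases) and the name `partition_function` are assumptions made purely for illustration.

```python
import math

# Toy pairing rules (assumption): Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_energy=-1.0, kT=0.6, min_loop=3):
    """Sum of Boltzmann weights over all non-crossing structures of seq.

    Each base pair contributes exp(-pair_energy / kT); unpaired bases
    contribute a factor of 1. This is a toy model, not a nearest-neighbor
    energy model.
    """
    n = len(seq)
    if n == 0:
        return 1.0
    w = math.exp(-pair_energy / kT)
    # Z[i][j] holds the partition function of the subsequence seq[i..j].
    Z = [[1.0] * n for _ in range(n)]

    def z(i, j):
        # Empty intervals have exactly one structure (the empty one).
        return Z[i][j] if i <= j else 1.0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            total = z(i, j - 1)  # case 1: position j is unpaired
            # case 2: j pairs with k, leaving more than min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    total += z(i, k - 1) * w * z(k + 1, j - 1)
            Z[i][j] = total
    return Z[0][n - 1]
```

With `pair_energy=0.0` every structure has weight 1, so the function simply counts structures; for example, `partition_function("GAAAC", pair_energy=0.0)` counts the empty structure and the single G-C pair. The triple loop is what makes the exact computation cubic for a single sequence, which is the cost that bounding and sparsification approaches try to avoid.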
Linear-time Algorithms for RNA Folding: Partition Function, Stochastic Sampling and RNA-RNA Interaction
RNAs play important roles in the central dogma of molecular biology and are involved in multiple biological processes such as chromatin modification, transcriptional interference and translation initiation. The functions of RNAs, especially non-coding RNAs, are closely related to their secondary structures; therefore, computational methods for RNA structure prediction are of great interest. In this dissertation, we propose linear-time algorithms for the RNA folding partition function, stochastic sampling and RNA-RNA interaction, which can efficiently and accurately predict and analyze RNA secondary structure. Partition function-based methods compute folding ensembles and estimate structure and base pair probabilities. However, the classical partition function algorithm scales cubically with sequence length and is therefore slow for long sequences. We design a linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base pairing probabilities, which is shown to be orders of magnitude faster than classical folding systems such as Vienna RNAfold and CONTRAfold. More interestingly, the resulting base pairing probabilities are even better correlated with the ground truth structures. On the other hand, the partition function and the estimated base-pairing probabilities provide compact representations of the exponentially large ensemble, but they cannot provide direct and intuitive descriptions, and cannot be directly used for accessibility prediction. Stochastic sampling, which samples secondary structures according to their probabilities in the Boltzmann ensemble, is widely used, e.g., for accessibility prediction. However, current sampling algorithms are unnecessarily complicated, repeatedly perform redundant work, and scale cubically with the sequence length. These issues prevent them from being used for full-length viral genomes such as SARS-CoV-2.
To alleviate these problems, we first propose a hypergraph framework under which the sampling algorithm can be greatly simplified, then present a lazy-saving sampling strategy under this framework that eliminates redundant work. Finally, we propose LinearSampling, the first end-to-end linear-time stochastic sampling algorithm, which can be used to detect potential SARS-CoV-2 regions for diagnostics and treatment. Many RNAs function through RNA-RNA interactions. Two-strand folding, which can directly predict structures with consideration of RNA-RNA interaction, is therefore also highly desirable. Some existing tools, such as RNAhybrid and RNAplex, are not only less informative but also less accurate because they omit the competition between intermolecular and intramolecular base pairs. Another group of tools, such as RNAup, focuses on predicting the binding region rather than the two-strand cofolding structure. Other tools like RNAcofold are too slow due to cubic runtime complexity. To address these issues, we propose LinearCoFold and LinearCoPartition, which are able to predict the two-strand folding structure, partition function and base pairing probabilities in linear runtime and space. Our new cofolding algorithms are orders of magnitude faster than the baseline RNAcofold, and achieve better PPV and sensitivity on the RNA-RNA interaction dataset.
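The stochastic sampling described in this abstract can be illustrated with a small Python sketch that draws one secondary structure from the Boltzmann ensemble by a top-down traceback over a partition function table. This is the classical cubic-time scheme on a toy model, not LinearSampling and not the hypergraph or lazy-saving strategy of the dissertation. The energy model (one hypothetical `pair_energy` per base pair, `kT` of 0.6), the minimum loop length, and the names `inside_table` and `sample_structure` are all assumptions for illustration.

```python
import math
import random

# Toy pairing rules (assumption): Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def inside_table(seq, w, min_loop):
    """Z[i][j]: sum of Boltzmann weights over all structures on seq[i..j]."""
    n = len(seq)
    Z = [[1.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            total = Z[i][j - 1]  # j unpaired
            for k in range(i, j - min_loop):  # j paired with k
                if (seq[k], seq[j]) in PAIRS:
                    left = Z[i][k - 1] if k > i else 1.0
                    inner = Z[k + 1][j - 1] if k + 1 <= j - 1 else 1.0
                    total += left * w * inner
            Z[i][j] = total
    return Z


def sample_structure(seq, rng, pair_energy=-1.0, kT=0.6, min_loop=3):
    """Draw one dot-bracket structure with probability weight / Z."""
    n = len(seq)
    w = math.exp(-pair_energy / kT)
    Z = inside_table(seq, w, min_loop)

    def z(i, j):
        return Z[i][j] if i <= j else 1.0

    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:  # too short to contain any base pair
            continue
        # Pick one term of the recursion for Z[i][j], proportional to its size.
        r = rng.random() * z(i, j) - z(i, j - 1)
        if r < 0:  # chose "j unpaired"
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                r -= z(i, k - 1) * w * z(k + 1, j - 1)
                if r < 0:  # chose "j paired with k"
                    structure[k], structure[j] = "(", ")"
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
        else:
            # Floating-point rounding left r >= 0; fall back to "j unpaired".
            stack.append((i, j - 1))
    return "".join(structure)
```

A call such as `sample_structure("GGGAAAUCCC", random.Random(1))` returns one dot-bracket string; repeated calls with the same `rng` object give independent samples. Note that this sketch rebuilds the table on every call, which is exactly the kind of redundant work the abstract criticizes in existing samplers.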
RNA-RNA interaction prediction based on multiple sequence alignments
Many computerized methods for RNA-RNA interaction structure prediction have
been developed. Recently, $O(N^6)$ time and $O(N^4)$ space dynamic programming
algorithms have become available that compute the partition function of RNA-RNA
interaction complexes. However, few of these methods incorporate knowledge
of related sequences, so relevant evolutionary information is often
neglected in structure determination. Therefore, it is of considerable
practical interest to introduce a method taking into consideration both
thermodynamic stability and sequence covariation. We present the \emph{a
priori} folding algorithm \texttt{ripalign}, whose input consists of two
(given) multiple sequence alignments (MSA). \texttt{ripalign} outputs (1) the
partition function, (2) base-pairing probabilities, (3) hybrid probabilities
and (4) a set of Boltzmann-sampled suboptimal structures consisting of
canonical joint structures that are compatible with the alignments. Compared to
the single sequence-pair folding algorithm \texttt{rip}, \texttt{ripalign}
requires negligible additional memory resources. Furthermore, we incorporate
possible structure constraints as input parameters into our algorithm. The
algorithm described here is implemented in C as part of the \texttt{rip}
package. The supplemental material, source code and input/output files can
freely be downloaded from \url{http://www.combinatorics.cn/cbpc/ripalign.html}.
\section{Contact} Christian Reidys \texttt{[email protected]}
Comment: 8 pages, 9 figures
LinearCoFold and LinearCoPartition: Linear-Time Algorithms for Secondary Structure Prediction of Interacting RNA molecules
Many ncRNAs function through RNA-RNA interactions. Fast and reliable RNA
structure prediction with consideration of RNA-RNA interaction is useful. Some
existing tools are less accurate because they omit the competition between
intermolecular and intramolecular base pairs, or focus more on predicting the
binding region rather than predicting the complete secondary structure of two
interacting strands. Vienna RNAcofold, which reduces the problem into the
classical single sequence folding by concatenating two strands, scales in cubic
time against the combined sequence length, and is slow for long sequences. To
address these issues, we present LinearCoFold, which predicts the complete
minimum free energy structure of two strands in linear runtime, and
LinearCoPartition, which calculates the cofolding partition function and base
pairing probabilities in linear runtime. LinearCoFold and LinearCoPartition
follow the concatenation strategy of RNAcofold, but are orders of magnitude
faster than RNAcofold. For example, on a sequence pair with combined length of
26,190 nt, LinearCoFold is 86.8x faster than RNAcofold MFE mode (0.6 minutes
vs. 52.1 minutes), and LinearCoPartition is 642.3x faster than RNAcofold
partition function mode (1.8 minutes vs. 1156.2 minutes). Unlike local
algorithms, LinearCoFold and LinearCoPartition are global cofolding
algorithms without restriction on base pair length. Surprisingly, the
predictions of LinearCoFold and LinearCoPartition have higher PPV and
sensitivity for intermolecular base pairs. Furthermore, we apply LinearCoFold to predict the
RNA-RNA interaction between SARS-CoV-2 gRNA and human U4 snRNA, which has been
experimentally studied, and observe that LinearCoFold's prediction correlates
better with the wet lab results.
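The PPV and sensitivity figures mentioned in this abstract are, in their standard form, ratios over sets of base pairs. The Python sketch below shows one plausible way to compute them for intermolecular pairs when two strands are concatenated into one dot-bracket string. The exact evaluation protocol of the paper is not given in the abstract, so the definitions used here (PPV = TP / predicted pairs, sensitivity = TP / reference pairs, exact-match pairs only) and the helper names are assumptions.

```python
def parse_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def intermolecular(pairs, first_len):
    """Keep pairs whose ends lie on different strands.

    Assumes the two strands are concatenated, with the first strand
    occupying positions 0 .. first_len - 1.
    """
    return {(i, j) for i, j in pairs if i < first_len <= j}


def ppv_sensitivity(predicted, reference):
    """PPV = TP / |predicted|, sensitivity = TP / |reference|."""
    true_positives = len(predicted & reference)
    ppv = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(reference) if reference else 0.0
    return ppv, sensitivity
```

For example, with a predicted structure `"((..))"` and a reference `"(....)"`, one of the two predicted pairs is correct and the single reference pair is recovered, so `ppv_sensitivity(parse_pairs("((..))"), parse_pairs("(....)"))` gives a PPV of 0.5 and a sensitivity of 1.0. Filtering both sets through `intermolecular` first restricts the comparison to pairs that cross the strand boundary.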
Partition function and base pairing probabilities of RNA heterodimers
Background: RNA has been recognized as a key player in cellular regulation in recent years. In many cases, non-coding RNAs exert their function by binding to other nucleic acids, as in the case of microRNAs and snoRNAs. The specificity of these interactions derives from the stability of inter-molecular base pairing. The accurate computational treatment of RNA-RNA binding therefore lies at the heart of target prediction algorithms.
Methods: The standard dynamic programming algorithms for computing secondary structures of linear single-stranded RNA molecules are extended to the co-folding of two interacting RNAs.
Results: We present a program, RNAcofold, that computes the hybridization energy and base pairing pattern of a pair of interacting RNA molecules. In contrast to earlier approaches, complex internal structures in both RNAs are fully taken into account. RNAcofold supports the calculation of the minimum energy structure and of a complete set of suboptimal structures in an energy band above the ground state. Furthermore, it provides an extension of McCaskill's partition function algorithm to compute base pairing probabilities, realistic interaction energies, and equilibrium concentrations of duplex structures
Target prediction and a statistical sampling algorithm for RNA-RNA interaction
It has been proven that the accessibility of the target sites has a critical
influence on miRNA and siRNA. In this paper, we present a program, rip2.0, which
predicts not only the energetically most favorable target site based on the
hybrid-probability, but also a statistical sampling structure to illustrate the
statistical characterization and representation of the Boltzmann ensemble of
RNA-RNA interaction structures. The outputs are retrieved via backtracing an
improved dynamic programming solution for the partition function based on the
approach of Huang et al. (Bioinformatics). The $O(N^6)$ time and $O(N^4)$ space
algorithm is implemented in C (available from
\url{http://www.combinatorics.cn/cbpc/rip2.html}).
Comment: 7 pages, 10 figures