An Efficient Algorithm for Upper Bound on the Partition Function of Nucleic Acids

Author: Chitsaz Hamidreza
Forouzmand Elmirasadat
Haffari Gholamreza
Publication date: 01/01/2013

It has been shown that minimum free energy structure for RNAs and RNA-RNA interaction is often incorrect due to inaccuracies in the energy parameters and inherent limitations of the energy model. In contrast, ensemble-based quantities such as melting temperature and equilibrium concentrations can be more reliably predicted. Even structure prediction by sampling from the ensemble and clustering those structures by Sfold [7] has proven to be more reliable than minimum free energy structure prediction. The main obstacle for ensemble-based approaches is the computational complexity of the partition function and base pairing probabilities. For instance, the space complexity of the partition function for RNA-RNA interaction is $O(n^4)$ and the time complexity is $O(n^6)$, which are prohibitively large [4,12]. Our goal in this paper is to give a fast algorithm, based on sparse folding, to calculate an upper bound on the partition function. Our work is based on the recent algorithm of Hazan and Jaakkola [10]. The space complexity of our algorithm is the same as that of sparse folding algorithms, and the time complexity of our algorithm is $O(MFE(n)\ell)$ for single RNA and $O(MFE(m, n)\ell)$ for RNA-RNA interaction in practice, in which $MFE$ is the running time of sparse folding and $\ell \leq n$ ($\ell \leq n + m$) is a sequence-dependent parameter.
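To make "prohibitively large" concrete, the following is a rough back-of-the-envelope sketch, not taken from the paper. It assumes a dense table with exactly n^4 entries of 8 bytes each and a constant factor of 1 in the step count; the function name `table_bytes` is purely illustrative.

```python
def table_bytes(n, exponent, bytes_per_entry=8):
    """Bytes for a dense table with n**exponent entries.

    Assumption: constant factor of 1 and 8-byte (double precision) entries.
    """
    return n ** exponent * bytes_per_entry


for n in (50, 100, 200):
    quartic_gib = table_bytes(n, 4) / 2**30    # O(n^4) space, in GiB
    quadratic_kib = table_bytes(n, 2) / 2**10  # O(n^2) space, in KiB
    print(
        f"n={n}: n^4 table = {quartic_gib:.2f} GiB, "
        f"n^2 table = {quadratic_kib:.1f} KiB, n^6 steps = {n**6:.2e}"
    )
```

Doubling the sequence length multiplies the quartic table by 16 and the sextic step count by 64, which is why even moderately long sequence pairs are out of reach for the exact algorithm.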

arXiv.org e-Print Archive

CiteSeerX

Linear-time Algorithms for RNA Folding: Partition Function, Stochastic Sampling and RNA-RNA Interaction

Author: Zhang He
Publication venue: 'Oregon State University'

RNAs play important roles in the central dogma of molecular biology, and are involved in multiple biological processes such as chromatin modification, transcriptional interference and translation initiation. The functions of RNAs, especially non-coding RNAs, are highly related to their secondary structures, therefore computational methods for RNA structure prediction are of great interest. In this dissertation, we propose linear-time algorithms for RNA folding partition function, stochastic sampling and RNA-RNA interaction, which can efficiently and accurately predict and analyze RNA secondary structure. The partition function-based methods are proposed to compute folding ensembles and estimate structure and base pair probabilities. However, the classical partition function algorithm scales cubically with sequence length, and is therefore slow for long sequences. We design a linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base pairing probabilities, which is shown to be orders of magnitude faster than classical folding systems such as Vienna RNAfold and CONTRAfold. More interestingly, the resulting base pairing probabilities are even better correlated with the ground truth structures. On the other hand, the partition function and the estimated base-pairing probabilities provide compact representations of the exponentially large ensemble, but they cannot provide direct and intuitive descriptions, and cannot be directly used for accessibility prediction. The stochastic sampling algorithm, which samples secondary structures according to their probabilities in the Boltzmann ensemble, is widely used, e.g., for accessibility prediction. However, current sampling algorithms are unnecessarily complicated, repeatedly perform redundant work, and scale cubically with the sequence length. These issues prevent them from being used for full-length viral genomes such as SARS-CoV-2.
To alleviate these problems, we first propose a hypergraph framework under which the sampling algorithm can be greatly simplified, then present a lazy-saving sampling strategy under this framework in which redundant work is eliminated. Finally, we propose LinearSampling, the first end-to-end linear-time stochastic sampling algorithm, which can be used to detect potential SARS-CoV-2 regions for diagnostics and treatment. Many RNAs function through RNA-RNA interactions. Two-strand folding, which can directly predict structures with consideration of RNA-RNA interaction, is therefore also highly desirable. Some existing tools, such as RNAhybrid and RNAplex, are not only less informative but also less accurate because they omit the competition between intermolecular and intramolecular base pairs. Another group of tools, such as RNAup, focuses on predicting the binding region rather than the two-strand cofolding structure. Other tools like RNAcofold are too slow due to cubic runtime complexity. To address these issues, we propose LinearCoFold and LinearCoPartition, which are able to predict the two-strand folding structure, partition function and base pairing probabilities in linear runtime and space. Our new cofolding algorithms are orders of magnitude faster than the baseline RNAcofold, and achieve better PPV and sensitivity on the RNA-RNA interaction dataset.

ScholarsArchive@OSU

RNA-RNA interaction prediction based on multiple sequence alignments

Author: Li Andrew X.
Marz Manja
Qin Jing
Reidys Christian M.
Publication date: 14/07/2010

Many computerized methods for RNA-RNA interaction structure prediction have been developed. Recently, $O(N^6)$ time and $O(N^4)$ space dynamic programming algorithms have become available that compute the partition function of RNA-RNA interaction complexes. However, few of these methods incorporate knowledge of related sequences, so relevant evolutionary information is often neglected in structure determination. Therefore, it is of considerable practical interest to introduce a method taking into consideration both thermodynamic stability and sequence covariation. We present the \emph{a priori} folding algorithm \texttt{ripalign}, whose input consists of two (given) multiple sequence alignments (MSA). \texttt{ripalign} outputs (1) the partition function, (2) base-pairing probabilities, (3) hybrid probabilities and (4) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible with the alignments. Compared to the single sequence-pair folding algorithm \texttt{rip}, \texttt{ripalign} requires negligible additional memory. Furthermore, we incorporate possible structure constraints as input parameters into our algorithm. The algorithm described here is implemented in C as part of the \texttt{rip} package. The supplemental material, source code and input/output files can be freely downloaded from \url{http://www.combinatorics.cn/cbpc/ripalign.html}. Contact: Christian Reidys \texttt{[email protected]}. Comment: 8 pages, 9 figures

arXiv.org e-Print Archive

LinearCoFold and LinearCoPartition: Linear-Time Algorithms for Secondary Structure Prediction of Interacting RNA molecules

Author: Huang Liang
Li Sizhen
Mathews David H.
Zhang He
Zhang Liang
Publication date: 26/10/2022

Many ncRNAs function through RNA-RNA interactions. Fast and reliable RNA structure prediction with consideration of RNA-RNA interaction is useful. Some existing tools are less accurate because they omit the competition between intermolecular and intramolecular base pairs, or focus more on predicting the binding region than on predicting the complete secondary structure of two interacting strands. Vienna RNAcofold, which reduces the problem to classical single-sequence folding by concatenating two strands, scales cubically with the combined sequence length, and is slow for long sequences. To address these issues, we present LinearCoFold, which predicts the complete minimum free energy structure of two strands in linear runtime, and LinearCoPartition, which calculates the cofolding partition function and base pairing probabilities in linear runtime. LinearCoFold and LinearCoPartition follow the concatenation strategy of RNAcofold, but are orders of magnitude faster than RNAcofold. For example, on a sequence pair with combined length of 26,190 nt, LinearCoFold is 86.8x faster than RNAcofold MFE mode (0.6 minutes vs. 52.1 minutes), and LinearCoPartition is 642.3x faster than RNAcofold partition function mode (1.8 minutes vs. 1156.2 minutes). Different from the local algorithms, LinearCoFold and LinearCoPartition are global cofolding algorithms without restriction on base pair length. Surprisingly, LinearCoFold and LinearCoPartition's predictions have higher PPV and sensitivity of intermolecular base pairs. Furthermore, we apply LinearCoFold to predict the RNA-RNA interaction between SARS-CoV-2 gRNA and human U4 snRNA, which has been experimentally studied, and observe that LinearCoFold's prediction correlates better with the wet lab results.
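The speedup factors quoted in this abstract can be re-derived from the stated runtimes. The snippet below is only an arithmetic check; the dictionary layout and the helper name `speedup` are illustrative choices.

```python
# Runtimes in minutes as quoted in the abstract: (RNAcofold baseline, linear-time tool)
timings_minutes = {
    "MFE": (52.1, 0.6),                   # RNAcofold MFE mode vs. LinearCoFold
    "partition function": (1156.2, 1.8),  # RNAcofold partition mode vs. LinearCoPartition
}


def speedup(baseline, fast):
    """Ratio of baseline runtime to the faster runtime."""
    return baseline / fast


for mode, (baseline, fast) in timings_minutes.items():
    print(f"{mode}: {speedup(baseline, fast):.1f}x")
```

Both ratios agree with the reported 86.8x and 642.3x to one decimal place.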

arXiv.org e-Print Archive

Partition function and base pairing probabilities of RNA heterodimers

Author: Bernhart Stephan H.
Flamm Christoph
Hofacker Ivo L.
Mückstein Ulrike
Stadler Peter F.
Tafer Hakim
Publication date: 07/11/2018

Background: RNA has been recognized as a key player in cellular regulation in recent years. In many cases, non-coding RNAs exert their function by binding to other nucleic acids, as in the case of microRNAs and snoRNAs. The specificity of these interactions derives from the stability of inter-molecular base pairing. The accurate computational treatment of RNA-RNA binding therefore lies at the heart of target prediction algorithms. Methods: The standard dynamic programming algorithms for computing secondary structures of linear single-stranded RNA molecules are extended to the co-folding of two interacting RNAs. Results: We present a program, RNAcofold, that computes the hybridization energy and base pairing pattern of a pair of interacting RNA molecules. In contrast to earlier approaches, complex internal structures in both RNAs are fully taken into account. RNAcofold supports the calculation of the minimum energy structure and of a complete set of suboptimal structures in an energy band above the ground state. Furthermore, it provides an extension of McCaskill's partition function algorithm to compute base pairing probabilities, realistic interaction energies, and equilibrium concentrations of duplex structures
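As a rough illustration of what a McCaskill-style partition function recursion looks like, here is a minimal single-strand sketch under a toy energy model. It is not RNAcofold's algorithm and does not use nearest-neighbor energy parameters: every canonical base pair is assumed to contribute the same Boltzmann factor `pair_weight` (standing in for exp(-E/RT)), hairpins must enclose at least `min_loop` unpaired bases, and all function names are hypothetical. A brute-force enumeration is included to cross-check the recursion on short sequences.

```python
from itertools import combinations

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(seq, i, j, min_loop=3):
    """True if positions i < j may pair in the toy model."""
    return j - i > min_loop and (seq[i], seq[j]) in CANONICAL


def partition_function(seq, pair_weight=2.0, min_loop=3):
    """Sum of pair_weight**(number of pairs) over all non-crossing structures."""
    n = len(seq)
    # Z[i][j] covers the half-open interval seq[i:j]; empty intervals weigh 1.
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            last = j - 1
            total = Z[i][last]  # case 1: the last base is unpaired
            for k in range(i, last):  # case 2: the last base pairs with k
                if can_pair(seq, k, last, min_loop):
                    total += Z[i][k] * pair_weight * Z[k + 1][last]
            Z[i][j] = total
    return Z[0][n]


def is_valid(pairs):
    """Each base pairs at most once and no two pairs cross."""
    used = set()
    for i, j in pairs:
        if i in used or j in used:
            return False
        used.update((i, j))
    return not any(
        i < k < j < l or k < i < l < j for (i, j), (k, l) in combinations(pairs, 2)
    )


def brute_force(seq, pair_weight=2.0, min_loop=3):
    """Exponential-time reference: enumerate every subset of admissible pairs."""
    n = len(seq)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if can_pair(seq, i, j, min_loop)]
    return sum(
        pair_weight ** r
        for r in range(len(pairs) + 1)
        for subset in combinations(pairs, r)
        if is_valid(subset)
    )


seq = "GGGAAAUCC"
print(partition_function(seq), brute_force(seq))
```

The recursion is cubic in the sequence length, which is the scaling that the cofolding extension inherits when two strands are concatenated.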

Qucosa - Publikationsserver der Universität Leipzig

Target prediction and a statistical sampling algorithm for RNA-RNA interaction

Author: Christian M. Reidys
Fenix W. D. Huang
Jing Qin
Peter F. Stadler
Publication date: 05/08/2009

It has been proven that the accessibility of target sites has a critical influence on miRNA and siRNA. In this paper, we present a program, rip2.0, which computes not only the energetically most favorable target site based on the hybrid probability, but also statistically sampled structures that illustrate the statistical characterization and representation of the Boltzmann ensemble of RNA-RNA interaction structures. The outputs are retrieved via backtracing an improved dynamic programming solution for the partition function based on the approach of Huang et al. (Bioinformatics). The $O(N^6)$ time and $O(N^4)$ space algorithm is implemented in C (available from \url{http://www.combinatorics.cn/cbpc/rip2.html}). Comment: 7 pages, 10 figures
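The idea of sampling structures from a Boltzmann ensemble can be sketched on a tiny explicit ensemble. This is not how rip2.0 works: the abstract says it backtraces a dynamic programming solution, whereas the sketch below simply enumerates a hypothetical three-structure ensemble with made-up free energies. The value RT = 0.616 kcal/mol (approximately 37 °C) and all names are assumptions for illustration.

```python
import math
import random

RT = 0.616  # approximate RT in kcal/mol at 37 degrees C (assumption)

# Hypothetical toy ensemble: dot-bracket structure -> free energy in kcal/mol
toy_energies = {
    "((...))": -2.0,
    "(.....)": -0.5,
    ".......": 0.0,
}


def boltzmann_probabilities(energies, rt=RT):
    """Return (probabilities, partition function) for an explicit ensemble."""
    weights = {s: math.exp(-e / rt) for s, e in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}, z


def sample_structures(energies, k, rt=RT, seed=0):
    """Draw k structures, each with probability exp(-E/RT) / Z."""
    probs, _ = boltzmann_probabilities(energies, rt)
    rng = random.Random(seed)
    structures = list(probs)
    return rng.choices(structures, weights=[probs[s] for s in structures], k=k)


probs, z = boltzmann_probabilities(toy_energies)
samples = sample_structures(toy_energies, k=1000)
for s in toy_energies:
    print(s, round(probs[s], 3), samples.count(s) / len(samples))
```

With enough samples the empirical frequencies approach the Boltzmann probabilities, which is what makes a sampled set a usable statistical representation of the ensemble.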

arXiv.org e-Print Archive
