Partition function and base pairing probabilities of RNA heterodimers
Background: RNA has been recognized as a key player in cellular regulation in recent years. In many cases, non-coding RNAs exert their function by binding to other nucleic acids, as in the case of microRNAs and snoRNAs. The specificity of these interactions derives from the stability of inter-molecular base pairing. The accurate computational treatment of RNA-RNA binding therefore lies at the heart of target prediction algorithms.
Methods: The standard dynamic programming algorithms for computing secondary structures of linear single-stranded RNA molecules are extended to the co-folding of two interacting RNAs.
Results: We present a program, RNAcofold, that computes the hybridization energy and base pairing pattern of a pair of interacting RNA molecules. In contrast to earlier approaches, complex internal structures in both RNAs are fully taken into account. RNAcofold supports the calculation of the minimum energy structure and of a complete set of suboptimal structures in an energy band above the ground state. Furthermore, it provides an extension of McCaskill's partition function algorithm to compute base pairing probabilities, realistic interaction energies, and equilibrium concentrations of duplex structures.
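The co-folding idea can be illustrated with a toy partition function. The sketch below is not RNAcofold's implementation. It assumes a hypothetical energy model in which every canonical base pair contributes -1 kcal/mol (instead of the Turner nearest-neighbor parameters). It concatenates the two strands and relaxes the minimum hairpin length for loops that contain the strand break, which is one way the single-strand recurrence can be extended to two strands. Dimer initiation penalties, homodimer symmetry corrections, base pairing probabilities, and equilibrium concentrations are omitted, and the names `cofold_partition_function`, `can_pair`, and `PAIR_ENERGY` are illustrative only.

```python
import math

# Hypothetical toy parameters, NOT the Turner energies used by RNAcofold:
# every canonical base pair contributes PAIR_ENERGY kcal/mol.
PAIR_ENERGY = -1.0
RT = 0.0019872 * 310.15  # gas constant (kcal/(mol*K)) times 37 °C in kelvin
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # minimum number of unpaired bases in a hairpin loop


def can_pair(seq: str, i: int, j: int, cut: int) -> bool:
    """Return True if positions i < j of the concatenated sequence may pair.

    Positions < cut belong to strand A, positions >= cut to strand B.
    The minimum hairpin length only applies when both bases lie on the same
    strand; a loop that contains the cut point is treated as exterior.
    """
    if (seq[i], seq[j]) not in CANONICAL:
        return False
    same_strand = (i < cut) == (j < cut)
    return (not same_strand) or (j - i - 1 >= MIN_HAIRPIN)


def cofold_partition_function(a: str, b: str) -> float:
    """Partition function of the dimer a&b under the toy pair-counting model."""
    seq = a + b
    cut, n = len(a), len(seq)
    w = math.exp(-PAIR_ENERGY / RT)  # Boltzmann weight of a single pair
    # Q[i][j] is the partition function of the half-open segment seq[i:j];
    # empty segments (j == i) have Q = 1.
    Q = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Q[i][j - 1]  # base j-1 unpaired
            for k in range(i, j - 1):  # base j-1 paired with base k
                if can_pair(seq, k, j - 1, cut):
                    total += Q[i][k] * w * Q[k + 1][j - 1]
            Q[i][j] = total
    return Q[0][n]
```

For example, `cofold_partition_function("GGGAAA", "UUUCCC")` sums the weights of all pair-counting structures of the dimer. This Nussinov-style recurrence runs in cubic time in the combined length, as concatenation-based co-folding does.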
Target prediction and a statistical sampling algorithm for RNA-RNA interaction
It has been shown that the accessibility of target sites has a critical
influence on miRNAs and siRNAs. In this paper, we present a program, rip2.0,
that computes not only the energetically most favorable target site based on
the hybrid probability, but also statistically sampled structures that
characterize and represent the Boltzmann ensemble of RNA-RNA interaction
structures. The outputs are retrieved by backtracing an improved dynamic
programming solution for the partition function, based on the approach of
Huang et al. (Bioinformatics). The time and space
algorithm is implemented in C (available from
http://www.combinatorics.cn/cbpc/rip2.html).
Comment: 7 pages, 10 figures
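rip2.0 obtains its outputs by backtracing a partition function solution. The generic technique, stochastic backtracking, is sketched below on a single strand under a hypothetical toy model (each canonical pair worth -1 kcal/mol). It does not implement rip2.0's interaction grammar or the Huang et al. recurrences, and `partition_table` and `sample_structure` are illustrative names.

```python
import math
import random

# Hypothetical toy model (single strand, NOT rip2.0's interaction grammar):
# every canonical pair contributes -1 kcal/mol at 37 °C.
RT = 0.0019872 * 310.15
PAIR_WEIGHT = math.exp(1.0 / RT)
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3


def partition_table(seq):
    """Q[i][j] = partition function of seq[i:j] (half-open, Q[i][i] = 1)."""
    n = len(seq)
    Q = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Q[i][j - 1]  # base j-1 unpaired
            for k in range(i, j - 1 - MIN_HAIRPIN):  # base j-1 paired with k
                if (seq[k], seq[j - 1]) in CANONICAL:
                    total += Q[i][k] * PAIR_WEIGHT * Q[k + 1][j - 1]
            Q[i][j] = total
    return Q


def sample_structure(seq, Q, rng):
    """Draw one structure with probability proportional to its Boltzmann weight."""
    pairs = []
    stack = [(0, len(seq))]
    while stack:
        i, j = stack.pop()
        while j > i:
            r = rng.random() * Q[i][j]
            acc = Q[i][j - 1]
            if r < acc:  # base j-1 is unpaired
                j -= 1
                continue
            chosen = None
            for k in range(i, j - 1 - MIN_HAIRPIN):
                if (seq[k], seq[j - 1]) in CANONICAL:
                    chosen = k  # last valid k doubles as a round-off fallback
                    acc += Q[i][k] * PAIR_WEIGHT * Q[k + 1][j - 1]
                    if r < acc:
                        break
            pairs.append((chosen, j - 1))
            stack.append((chosen + 1, j - 1))  # sample the enclosed segment later
            j = chosen  # continue with the segment left of the pair
    return sorted(pairs)


def to_dot_bracket(n, pairs):
    """Render a list of (i, j) pairs as a dot-bracket string of length n."""
    chars = ["."] * n
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)
```

Seeding `random.Random` makes runs reproducible. Drawing many samples and counting how often a given pair occurs yields a Monte Carlo estimate of its pairing probability.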
LinearCoFold and LinearCoPartition: Linear-Time Algorithms for Secondary Structure Prediction of Interacting RNA molecules
Many ncRNAs function through RNA-RNA interactions. Fast and reliable RNA
structure prediction that accounts for RNA-RNA interactions is therefore
useful. Some existing tools are less accurate because they ignore the
competition between intermolecular and intramolecular base pairs, or they focus
on predicting the binding region rather than the complete secondary structure
of the two interacting strands. Vienna RNAcofold, which reduces the problem to
classical single-sequence folding by concatenating the two strands, scales
cubically with the combined sequence length and is slow for long sequences. To
address these issues, we present LinearCoFold, which predicts the complete
minimum free energy structure of two strands in linear runtime, and
LinearCoPartition, which calculates the cofolding partition function and base
pairing probabilities in linear runtime. LinearCoFold and LinearCoPartition
follow the concatenation strategy of RNAcofold but are orders of magnitude
faster. For example, on a sequence pair with a combined length of 26,190 nt,
LinearCoFold is 86.8x faster than RNAcofold in MFE mode (0.6 minutes vs. 52.1
minutes), and LinearCoPartition is 642.3x faster than RNAcofold in partition
function mode (1.8 minutes vs. 1156.2 minutes). Unlike local algorithms,
LinearCoFold and LinearCoPartition are global cofolding algorithms with no
restriction on base pair length. Surprisingly, LinearCoFold and
LinearCoPartition's predictions have higher PPV and sensitivity for
intermolecular base pairs. Furthermore, we apply LinearCoFold to predict the
RNA-RNA interaction between SARS-CoV-2 gRNA and human U4 snRNA, which has been
studied experimentally, and observe that LinearCoFold's prediction correlates
better with the wet-lab results.
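PPV and sensitivity are the usual ratios TP/(TP+FP) and TP/(TP+FN). The sketch below scores only intermolecular pairs, i.e. pairs whose two ends lie on different sides of the concatenation cut point. Exact position matching is an assumption: the abstract does not state the evaluation protocol (for example, whether slipped pairs are accepted), and the helper names are hypothetical.

```python
def intermolecular_pairs(pairs, cut):
    """Keep only pairs that connect strand A (< cut) to strand B (>= cut)."""
    return {(min(i, j), max(i, j)) for i, j in pairs if (i < cut) != (j < cut)}


def ppv_and_sensitivity(predicted, reference, cut):
    """Exact-match PPV = TP/(TP+FP) and sensitivity = TP/(TP+FN)."""
    pred = intermolecular_pairs(predicted, cut)
    ref = intermolecular_pairs(reference, cut)
    tp = len(pred & ref)
    ppv = tp / len(pred) if pred else 0.0
    sensitivity = tp / len(ref) if ref else 0.0
    return ppv, sensitivity
```

In a call such as `ppv_and_sensitivity([(0, 7), (1, 6), (5, 9)], [(0, 7), (1, 6), (2, 5)], cut=4)`, the intramolecular pair (5, 9) is ignored, so every scored prediction is correct while one reference pair is missed.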
Topology of RNA-RNA interaction structures
The topological filtration of interacting RNA complexes is studied, and the
role of certain diagrams called irreducible shadows, which form suitable
building blocks for more general structures, is analyzed. We prove that for two
interacting RNAs, called interaction structures, there exist only finitely many
irreducible shadows for fixed genus. This implies that for fixed genus there
are only finitely many classes of interaction structures. In particular, the
simplest case, genus zero, already provides the formalism for certain types of
structures that occur in nature and are not covered by other filtrations. This
case of genus zero interaction structures is of practical interest; it is
studied here in detail and found to be expressed by a multiple context-free
grammar extending the usual one for RNA secondary structures. We show that in
time and space complexity, this grammar for genus zero
interaction structures provides not only minimum free energy solutions but also
the complete partition function and base pairing probabilities.
Comment: 40 pages, 15 figures
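Genus here is the topological genus of a diagram viewed as a fatgraph. As an illustration, the sketch below computes it for a diagram on a single backbone via the standard permutation construction. It does not model the two-backbone interaction structures of the paper, whose embedding convention the abstract does not state, and the function name `genus` is illustrative. Under this construction, non-crossing (secondary) structures have genus 0, while an H-type pseudoknot and a kissing hairpin both have genus 1.

```python
def genus(arcs):
    """Genus of a diagram on one backbone, given as arcs (i, j) with distinct endpoints.

    Collapse the backbone to a single vertex; the arcs are the e edges of a
    fatgraph whose r boundary components are the cycles of gamma = sigma o alpha,
    where alpha swaps the two endpoints of an arc and sigma moves to the next
    arc endpoint along the backbone. Euler's formula 2 - 2g = 1 - e + r
    then gives g = (1 + e - r) / 2.
    """
    endpoints = sorted(p for arc in arcs for p in arc)
    if len(set(endpoints)) != len(endpoints):
        raise ValueError("each position may be the endpoint of at most one arc")
    m = len(endpoints)
    if m == 0:
        return 0
    index = {p: h for h, p in enumerate(endpoints)}  # half-edge ids in backbone order
    partner = [0] * m
    for i, j in arcs:
        partner[index[i]], partner[index[j]] = index[j], index[i]
    seen = [False] * m
    boundaries = 0
    for start in range(m):
        if seen[start]:
            continue
        boundaries += 1  # a new cycle of gamma = a new boundary component
        h = start
        while not seen[h]:
            seen[h] = True
            h = (partner[h] + 1) % m  # apply alpha, then sigma
    return (1 + len(arcs) - boundaries) // 2
```

Unpaired positions never affect the result, because only arc endpoints become half-edges.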
Exact Learning of RNA Energy Parameters From Structure
We consider the problem of exact learning of parameters of a linear RNA
energy model from secondary structure data. A necessary and sufficient
condition for learnability of parameters is derived, which is based on
computing the convex hull of the union of translated Newton polytopes of input
sequences. The set of learned energy parameters is characterized as the convex
cone generated by the normal vectors to those facets of the resulting polytope
that are incident to the origin. In practice, the sufficient condition may not
be satisfied by the entire training data set; hence, computing a maximal subset
of training data for which the sufficient condition is satisfied is often
desired. We show that this problem is NP-hard in general for a feature space of
arbitrary dimension. Using a randomized greedy algorithm, we select a subset of
the RNA STRAND v2.0 database that satisfies the sufficient condition for a
model that counts A-U, C-G, and G-U base pairs separately. The set of learned
energy parameters includes experimentally measured energies of A-U, C-G, and
G-U pairs; hence, our parameter set is in agreement with the Turner parameters.
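In a linear energy model the energy of a structure s is E(s) = w · c(s), where c(s) is a feature-count vector and w holds the parameters. The brute-force sketch below assumes the three-feature A-U/C-G/G-U counting model described above and checks, for one short sequence, whether a given w makes the native structure the unique minimum free energy structure. For a fixed sequence, the set of w passing this test is an open convex cone (an intersection of open half-spaces through the origin). The sketch does not compute Newton polytopes and enumerates all structures exhaustively, so it only scales to toy sequences. All names are hypothetical.

```python
from functools import lru_cache

# Hypothetical feature map for a base-pair counting model:
# feature 0 counts A-U pairs, feature 1 C-G pairs, feature 2 G-U pairs.
PAIR_TYPES = {("A", "U"): 0, ("U", "A"): 0, ("C", "G"): 1,
              ("G", "C"): 1, ("G", "U"): 2, ("U", "G"): 2}
MIN_HAIRPIN = 3


def enumerate_structures(seq):
    """All secondary structures of seq as sorted tuples of (i, j) pairs (exponential!)."""

    @lru_cache(maxsize=None)
    def structs(i, j):  # structures on the half-open segment seq[i:j]
        if j - i <= 0:
            return ((),)
        out = list(structs(i, j - 1))  # base j-1 unpaired
        for k in range(i, j - 1 - MIN_HAIRPIN):  # base j-1 paired with k
            if (seq[k], seq[j - 1]) in PAIR_TYPES:
                for left in structs(i, k):
                    for inside in structs(k + 1, j - 1):
                        out.append(left + ((k, j - 1),) + inside)
        return tuple(out)

    return structs(0, len(seq))


def feature_vector(seq, pairs):
    """Counts of (A-U, C-G, G-U) pairs in a structure."""
    counts = [0, 0, 0]
    for i, j in pairs:
        counts[PAIR_TYPES[(seq[i], seq[j])]] += 1
    return tuple(counts)


def energy(w, counts):
    """Linear energy model: E(s) = w . c(s)."""
    return sum(wi * ci for wi, ci in zip(w, counts))


def native_is_unique_mfe(seq, native, w):
    """True if `native` has strictly lower energy than every other structure."""
    target = tuple(sorted(native))
    e_native = energy(w, feature_vector(seq, target))
    return all(energy(w, feature_vector(seq, s)) > e_native
               for s in enumerate_structures(seq) if s != target)
```

Scaling any passing w by a positive factor still passes, which is the cone property noted above.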
RNA-RNA interaction prediction based on multiple sequence alignments
Many computerized methods for RNA-RNA interaction structure prediction have
been developed. Recently, time and space dynamic programming
algorithms have become available that compute the partition function of RNA-RNA
interaction complexes. However, few of these methods incorporate knowledge of
related sequences, so relevant evolutionary information is often neglected in
structure determination. It is therefore of considerable practical interest to
introduce a method that takes into account both thermodynamic stability and
sequence covariation. We present the a priori folding algorithm ripalign, whose
input consists of two (given) multiple sequence alignments (MSAs). ripalign
outputs (1) the partition function, (2) base-pairing probabilities, (3) hybrid
probabilities, and (4) a set of Boltzmann-sampled suboptimal structures
consisting of canonical joint structures that are compatible with the
alignments. Compared to the single sequence-pair folding algorithm rip,
ripalign requires negligible additional memory. Furthermore, we incorporate
possible structure constraints as input parameters into our algorithm. The
algorithm described here is implemented in C as part of the rip package. The
supplemental material, source code, and input/output files can be freely
downloaded from http://www.combinatorics.cn/cbpc/ripalign.html.
Contact: Christian Reidys
Comment: 8 pages, 9 figures
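The abstract does not define when a joint structure is "compatible" with the alignments. One plausible reading, borrowed from alignment folding in general and labeled here as an assumption, is sketched below. Rows of the two MSAs are taken to correspond one-to-one (for example, sequences from the same organism). A column pair may form a base pair if (almost) every row forms a canonical pair there, and the number of distinct canonical pair types observed serves as a crude covariation signal. None of these names come from ripalign.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def column(alignment, i):
    """Column i of an alignment given as a list of equal-length strings."""
    return [row[i].upper() for row in alignment]


def compatible(col_x, col_y, max_noncanonical=0):
    """Hypothetical rule: two columns may pair if at most `max_noncanonical`
    rows fail to form a canonical pair there (gaps count as failures)."""
    if len(col_x) != len(col_y):
        raise ValueError("rows of the two alignments must correspond one-to-one")
    bad = sum((x, y) not in CANONICAL for x, y in zip(col_x, col_y))
    return bad <= max_noncanonical


def covariation_support(col_x, col_y):
    """Number of distinct canonical pair types observed across the rows."""
    return len({(x, y) for x, y in zip(col_x, col_y)} & CANONICAL)
```

For an intermolecular pair, `col_x` comes from the first alignment and `col_y` from the second; an intramolecular pair would take both columns from the same alignment.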