981 research outputs found
If the Current Clique Algorithms are Optimal, so is Valiant's Parser
The CFG recognition problem is: given a context-free grammar
and a string of length , decide if can be obtained from
. This is the most basic parsing question and is a core computer
science problem. Valiant's parser from 1975 solves the problem in
time, where is the matrix multiplication
exponent. Dozens of parsing algorithms have been proposed over the years, yet
Valiant's upper bound remains unbeaten. The best combinatorial algorithms have
mildly subcubic complexity.
Lee (JACM'01) provided evidence that fast matrix multiplication is needed for
CFG parsing, and that very efficient and practical algorithms might be hard or
even impossible to obtain. Lee showed that any algorithm for a more general
parsing problem with running time can
be converted into a surprising subcubic algorithm for Boolean Matrix
Multiplication. Unfortunately, Lee's hardness result required that the grammar
size be . Nothing was known for the more relevant
case of constant size grammars.
In this work, we prove that any improvement on Valiant's algorithm, even for
constant size grammars, either in terms of runtime or by avoiding the
inefficiencies of fast matrix multiplication, would imply a breakthrough
algorithm for the -Clique problem: given a graph on nodes, decide if
there are that form a clique.
Besides classifying the complexity of a fundamental problem, our reduction
has led us to similar lower bounds for more modern and well-studied cubic time
problems for which faster algorithms are highly desirable in practice: RNA
Folding, a central problem in computational biology, and Dyck Language Edit
Distance, answering an open question of Saha (FOCS'14)
An Efficient Algorithm for Upper Bound on the Partition Function of Nucleic Acids
It has been shown that minimum free energy structure for RNAs and RNA-RNA
interaction is often incorrect due to inaccuracies in the energy parameters and
inherent limitations of the energy model. In contrast, ensemble based
quantities such as melting temperature and equilibrium concentrations can be
more reliably predicted. Even structure prediction by sampling from the
ensemble and clustering those structures by Sfold [7] has proven to be more
reliable than minimum free energy structure prediction. The main obstacle for
ensemble based approaches is the computational complexity of the partition
function and base pairing probabilities. For instance, the space complexity of
the partition function for RNA-RNA interaction is and the time
complexity is which are prohibitively large [4,12]. Our goal in this
paper is to give a fast algorithm, based on sparse folding, to calculate an
upper bound on the partition function. Our work is based on the recent
algorithm of Hazan and Jaakkola [10]. The space complexity of our algorithm is
the same as that of sparse folding algorithms, and the time complexity of our
algorithm is for single RNA and for RNA-RNA
interaction in practice, in which is the running time of sparse folding
and () is a sequence dependent parameter
Landscape statistics of the low autocorrelated binary string problem
The statistical properties of the energy landscape of the low autocorrelated
binary string problem (LABSP) are studied numerically and compared with those
of several classic disordered models. Using two global measures of landscape
structure which have been introduced in the Simulated Annealing literature,
namely, depth and difficulty, we find that the landscape of LABSP, except
perhaps for a very large degeneracy of the local minima energies, is
qualitatively similar to some well-known landscapes such as that of the
mean-field 2-spin glass model. Furthermore, we consider a mean-field
approximation to the pure model proposed by Bouchaud and Mezard (1994, J.
Physique I France 4 1109) and show both analytically and numerically that it
describes extremely well the statistical properties of LABSP
Impact Of The Energy Model On The Complexity Of RNA Folding With Pseudoknots
International audiencePredicting the folding of an RNA sequence, while allowing general pseudoknots (PK), consists in finding a minimal free-energy matching of its positions. Assuming independently contributing base-pairs, the problem can be solved in -time using a variant of the maximal weighted matching. By contrast, the problem was previously proven NP-Hard in the more realistic nearest-neighbor energy model. In this work, we consider an intermediate model, called the stacking-pairs energy model. We extend a result by Lyngs\o, showing that RNA folding with PK is NP-Hard within a large class of parametrization for the model. We also show the approximability of the problem, by giving a practical algorithm that achieves at least a -approximation for any parametrization of the stacking model. This contrasts nicely with the nearest-neighbor version of the problem, which we prove cannot be approximated within any positive ratio, unless .La prédiction du repliement, avec pseudonoeuds généraux, d'une séquence d'ARN de taille est équivalent à la recherche d'un couplage d'énergie libre minimale. Dans un modèle d'énergie simple, où chaque paire de base contribue indépendamment à l'énergie, ce problème peut être résolu en temps grâce à une variante d'un algorithme de couplage pondéré maximal. Cependant, le même problème a été démontré NP-difficile dans le modèle d'énergie dit des plus proches voisins. Dans ce travail, nous étudions les propriétés du problème sous un modèle d'empilements, constituant un modèle intermédiaire entre ceux d'appariement et des plus proches voisins. Nous démontrons tout d'abord que le repliement avec pseudo-noeuds de l'ARN reste NP-difficile dans de nombreuses valuations du modèle d'énergie. . Par ailleurs, nous montrons que ce problème est approximable, en proposant un algorithme polynomial garantissant une -approximation. Ce résultat illustre une différence essentielle entre ce modèle et celui des plus proches voisins, pour lequel nous montrons qu'il ne peut être approché à aucun ratio positif par un algorithme en temps polynomial sauf si
Combinatorial RNA Design: Designability and Structure-Approximating Algorithm
In this work, we consider the Combinatorial RNA Design problem, a minimal
instance of the RNA design problem which aims at finding a sequence that admits
a given target as its unique base pair maximizing structure. We provide
complete characterizations for the structures that can be designed using
restricted alphabets. Under a classic four-letter alphabet, we provide a
complete characterization of designable structures without unpaired bases. When
unpaired bases are allowed, we provide partial characterizations for classes of
designable/undesignable structures, and show that the class of designable
structures is closed under the stutter operation. Membership of a given
structure to any of the classes can be tested in linear time and, for positive
instances, a solution can be found in linear time. Finally, we consider a
structure-approximating version of the problem that allows to extend bands
(helices) and, assuming that the input structure avoids two motifs, we provide
a linear-time algorithm that produces a designable structure with at most twice
more base pairs than the input structure.Comment: CPM - 26th Annual Symposium on Combinatorial Pattern Matching, Jun
2015, Ischia Island, Italy. LNCS, 201
Exact Learning of RNA Energy Parameters From Structure
We consider the problem of exact learning of parameters of a linear RNA
energy model from secondary structure data. A necessary and sufficient
condition for learnability of parameters is derived, which is based on
computing the convex hull of union of translated Newton polytopes of input
sequences. The set of learned energy parameters is characterized as the convex
cone generated by the normal vectors to those facets of the resulting polytope
that are incident to the origin. In practice, the sufficient condition may not
be satisfied by the entire training data set; hence, computing a maximal subset
of training data for which the sufficient condition is satisfied is often
desired. We show that problem is NP-hard in general for an arbitrary
dimensional feature space. Using a randomized greedy algorithm, we select a
subset of RNA STRAND v2.0 database that satisfies the sufficient condition for
separate A-U, C-G, G-U base pair counting model. The set of learned energy
parameters includes experimentally measured energies of A-U, C-G, and G-U
pairs; hence, our parameter set is in agreement with the Turner parameters
- …