981 research outputs found

    If the Current Clique Algorithms are Optimal, so is Valiant's Parser

    Full text link
    The CFG recognition problem is: given a context-free grammar G\mathcal{G} and a string ww of length nn, decide if ww can be obtained from G\mathcal{G}. This is the most basic parsing question and is a core computer science problem. Valiant's parser from 1975 solves the problem in O(nω)O(n^{\omega}) time, where ω<2.373\omega<2.373 is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic O(n3/log3n)O(n^3/\log^3{n}) complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time O(Gn3ε)O(|\mathcal{G}|\cdot n^{3-\varepsilon}) can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be G=Ω(n6)|\mathcal{G}|=\Omega(n^6). Nothing was known for the more relevant case of constant size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the kk-Clique problem: given a graph on nn nodes, decide if there are kk that form a clique. Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14)

    An Efficient Algorithm for Upper Bound on the Partition Function of Nucleic Acids

    Full text link
    It has been shown that minimum free energy structure for RNAs and RNA-RNA interaction is often incorrect due to inaccuracies in the energy parameters and inherent limitations of the energy model. In contrast, ensemble based quantities such as melting temperature and equilibrium concentrations can be more reliably predicted. Even structure prediction by sampling from the ensemble and clustering those structures by Sfold [7] has proven to be more reliable than minimum free energy structure prediction. The main obstacle for ensemble based approaches is the computational complexity of the partition function and base pairing probabilities. For instance, the space complexity of the partition function for RNA-RNA interaction is O(n4)O(n^4) and the time complexity is O(n6)O(n^6) which are prohibitively large [4,12]. Our goal in this paper is to give a fast algorithm, based on sparse folding, to calculate an upper bound on the partition function. Our work is based on the recent algorithm of Hazan and Jaakkola [10]. The space complexity of our algorithm is the same as that of sparse folding algorithms, and the time complexity of our algorithm is O(MFE(n))O(MFE(n)\ell) for single RNA and O(MFE(m,n))O(MFE(m, n)\ell) for RNA-RNA interaction in practice, in which MFEMFE is the running time of sparse folding and n\ell \leq n (n+m\ell \leq n + m) is a sequence dependent parameter

    Landscape statistics of the low autocorrelated binary string problem

    Full text link
    The statistical properties of the energy landscape of the low autocorrelated binary string problem (LABSP) are studied numerically and compared with those of several classic disordered models. Using two global measures of landscape structure which have been introduced in the Simulated Annealing literature, namely, depth and difficulty, we find that the landscape of LABSP, except perhaps for a very large degeneracy of the local minima energies, is qualitatively similar to some well-known landscapes such as that of the mean-field 2-spin glass model. Furthermore, we consider a mean-field approximation to the pure model proposed by Bouchaud and Mezard (1994, J. Physique I France 4 1109) and show both analytically and numerically that it describes extremely well the statistical properties of LABSP

    Impact Of The Energy Model On The Complexity Of RNA Folding With Pseudoknots

    Get PDF
    International audiencePredicting the folding of an RNA sequence, while allowing general pseudoknots (PK), consists in finding a minimal free-energy matching of its nn positions. Assuming independently contributing base-pairs, the problem can be solved in Θ(n3)\Theta(n^3)-time using a variant of the maximal weighted matching. By contrast, the problem was previously proven NP-Hard in the more realistic nearest-neighbor energy model. In this work, we consider an intermediate model, called the stacking-pairs energy model. We extend a result by Lyngs\o, showing that RNA folding with PK is NP-Hard within a large class of parametrization for the model. We also show the approximability of the problem, by giving a practical Θ(n3)\Theta(n^3) algorithm that achieves at least a 55-approximation for any parametrization of the stacking model. This contrasts nicely with the nearest-neighbor version of the problem, which we prove cannot be approximated within any positive ratio, unless P=NPP=NP.La prédiction du repliement, avec pseudonoeuds généraux, d'une séquence d'ARN de taille nn est équivalent à la recherche d'un couplage d'énergie libre minimale. Dans un modèle d'énergie simple, où chaque paire de base contribue indépendamment à l'énergie, ce problème peut être résolu en temps Θ(n3)\Theta(n^3) grâce à une variante d'un algorithme de couplage pondéré maximal. Cependant, le même problème a été démontré NP-difficile dans le modèle d'énergie dit des plus proches voisins. Dans ce travail, nous étudions les propriétés du problème sous un modèle d'empilements, constituant un modèle intermédiaire entre ceux d'appariement et des plus proches voisins. Nous démontrons tout d'abord que le repliement avec pseudo-noeuds de l'ARN reste NP-difficile dans de nombreuses valuations du modèle d'énergie. . Par ailleurs, nous montrons que ce problème est approximable, en proposant un algorithme polynomial garantissant une 1/51/5-approximation. Ce résultat illustre une différence essentielle entre ce modèle et celui des plus proches voisins, pour lequel nous montrons qu'il ne peut être approché à aucun ratio positif par un algorithme en temps polynomial sauf si N=NPN=NP

    Combinatorial RNA Design: Designability and Structure-Approximating Algorithm

    Get PDF
    In this work, we consider the Combinatorial RNA Design problem, a minimal instance of the RNA design problem which aims at finding a sequence that admits a given target as its unique base pair maximizing structure. We provide complete characterizations for the structures that can be designed using restricted alphabets. Under a classic four-letter alphabet, we provide a complete characterization of designable structures without unpaired bases. When unpaired bases are allowed, we provide partial characterizations for classes of designable/undesignable structures, and show that the class of designable structures is closed under the stutter operation. Membership of a given structure to any of the classes can be tested in linear time and, for positive instances, a solution can be found in linear time. Finally, we consider a structure-approximating version of the problem that allows to extend bands (helices) and, assuming that the input structure avoids two motifs, we provide a linear-time algorithm that produces a designable structure with at most twice more base pairs than the input structure.Comment: CPM - 26th Annual Symposium on Combinatorial Pattern Matching, Jun 2015, Ischia Island, Italy. LNCS, 201

    Exact Learning of RNA Energy Parameters From Structure

    Full text link
    We consider the problem of exact learning of parameters of a linear RNA energy model from secondary structure data. A necessary and sufficient condition for learnability of parameters is derived, which is based on computing the convex hull of union of translated Newton polytopes of input sequences. The set of learned energy parameters is characterized as the convex cone generated by the normal vectors to those facets of the resulting polytope that are incident to the origin. In practice, the sufficient condition may not be satisfied by the entire training data set; hence, computing a maximal subset of training data for which the sufficient condition is satisfied is often desired. We show that problem is NP-hard in general for an arbitrary dimensional feature space. Using a randomized greedy algorithm, we select a subset of RNA STRAND v2.0 database that satisfies the sufficient condition for separate A-U, C-G, G-U base pair counting model. The set of learned energy parameters includes experimentally measured energies of A-U, C-G, and G-U pairs; hence, our parameter set is in agreement with the Turner parameters
    corecore