If the Current Clique Algorithms are Optimal, so is Valiant's Parser

Author: Abboud Amir
Backurs Arturs
Williams Virginia Vassilevska
Publication date: 05/11/2015

The CFG recognition problem is: given a context-free grammar $\mathcal{G}$ and a string $w$ of length $n$, decide if $w$ can be obtained from $\mathcal{G}$. This is the most basic parsing question and is a core computer science problem. Valiant's parser from 1975 solves the problem in $O(n^{\omega})$ time, where $\omega<2.373$ is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic $O(n^3/\log^3{n})$ complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time $O(|\mathcal{G}|\cdot n^{3-\varepsilon})$ can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be $|\mathcal{G}|=\Omega(n^6)$. Nothing was known for the more relevant case of constant size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if there are $k$ that form a clique. Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14).
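To make the recognition problem concrete, here is a minimal sketch of the textbook cubic-time CYK recognizer. It is not Valiant's parser and not a method from the paper; it is only the simple combinatorial baseline that the subcubic bounds above are measured against. The function name `cyk_recognize`, the dictionary encoding of the grammar, and the example Dyck grammar are assumptions made for illustration, and the grammar is assumed to be in Chomsky normal form.

```python
from itertools import product


def cyk_recognize(word, terminal_rules, binary_rules, start="S"):
    """Decide whether `word` is derivable from `start` in a CNF grammar.

    terminal_rules: maps a character c to the set of nonterminals A with A -> c.
    binary_rules:   maps a pair (B, C) to the set of nonterminals A with A -> B C.
    Both encodings are hypothetical conventions chosen for this sketch.
    """
    n = len(word)
    if n == 0:
        return False  # assumption: this sketch does not handle the empty string

    # table[i][j] holds every nonterminal that derives word[i..j] (inclusive).
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][i] = set(terminal_rules.get(ch, ()))

    # Fill in spans of increasing length; three nested loops give O(n^3) splits.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary_rules.get((left, right), set())

    return start in table[0][n - 1]


# Example grammar (assumed): non-empty balanced parentheses in CNF.
#   S -> S S | L R | L T,   T -> S R,   L -> '(',   R -> ')'
DYCK_TERMINALS = {"(": {"L"}, ")": {"R"}}
DYCK_BINARY = {
    ("S", "S"): {"S"},
    ("L", "R"): {"S"},
    ("L", "T"): {"S"},
    ("S", "R"): {"T"},
}
```

For example, `cyk_recognize("(())()", DYCK_TERMINALS, DYCK_BINARY)` accepts, while the unbalanced string `"(()"` is rejected. For a constant-size grammar the running time is dominated by the cubic number of (span, split point) combinations.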

arXiv.org e-Print Archive

Crossref

An Efficient Algorithm for Upper Bound on the Partition Function of Nucleic Acids

Author: Chitsaz Hamidreza
Forouzmand Elmirasadat
Haffari Gholamreza
Publication date: 01/01/2013

It has been shown that minimum free energy structure for RNAs and RNA-RNA interaction is often incorrect due to inaccuracies in the energy parameters and inherent limitations of the energy model. In contrast, ensemble based quantities such as melting temperature and equilibrium concentrations can be more reliably predicted. Even structure prediction by sampling from the ensemble and clustering those structures by Sfold [7] has proven to be more reliable than minimum free energy structure prediction. The main obstacle for ensemble based approaches is the computational complexity of the partition function and base pairing probabilities. For instance, the space complexity of the partition function for RNA-RNA interaction is $O(n^4)$ and the time complexity is $O(n^6)$, which are prohibitively large [4,12]. Our goal in this paper is to give a fast algorithm, based on sparse folding, to calculate an upper bound on the partition function. Our work is based on the recent algorithm of Hazan and Jaakkola [10]. The space complexity of our algorithm is the same as that of sparse folding algorithms, and the time complexity of our algorithm is $O(MFE(n)\ell)$ for single RNA and $O(MFE(m, n)\ell)$ for RNA-RNA interaction in practice, in which $MFE$ is the running time of sparse folding and $\ell \leq n$ ($\ell \leq n + m$) is a sequence dependent parameter.

arXiv.org e-Print Archive

CiteSeerX

Landscape statistics of the low autocorrelated binary string problem

Author: Azencott R
Bernasconi J
Bouchaud J P
Bray A J
Catoni O
Catoni O
Cieplak M
de Oliveira V M
de Oliveira V M
Ewens W J
Fernando F Ferreira
Golay M J E
Hajek B
José F Fontanari
Klotz T
Klotz T
Marinari E
Marinari E
Mertens S
Migliorini G
Mézard M
Peter F Stadler
Reidys C M
Reidys C M
Stadler P F
Tanaka F
Vertechi A M
Publication venue: 'IOP Publishing'
Publication date: 01/01/2000

The statistical properties of the energy landscape of the low autocorrelated binary string problem (LABSP) are studied numerically and compared with those of several classic disordered models. Using two global measures of landscape structure which have been introduced in the Simulated Annealing literature, namely, depth and difficulty, we find that the landscape of LABSP, except perhaps for a very large degeneracy of the local minima energies, is qualitatively similar to some well-known landscapes such as that of the mean-field 2-spin glass model. Furthermore, we consider a mean-field approximation to the pure model proposed by Bouchaud and Mezard (1994, J. Physique I France 4 1109) and show both analytically and numerically that it describes extremely well the statistical properties of LABSP
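The abstract does not spell out the energy function whose landscape is studied. As a minimal sketch, assuming the standard definition of the problem (a string of $\pm 1$ spins $s_1,\dots,s_N$ with aperiodic autocorrelations $C_k=\sum_{i=1}^{N-k}s_i s_{i+k}$ and energy $E=\sum_{k=1}^{N-1}C_k^2$, possibly up to a normalization used in the paper), the energy and a single-spin-flip local-minimum test could look as follows. The function names are hypothetical, and the choice of single-spin flips as the neighborhood is an assumption.

```python
def autocorrelations(spins):
    """Aperiodic autocorrelations C_k for k = 1 .. N-1 of a +/-1 sequence."""
    n = len(spins)
    return [
        sum(spins[i] * spins[i + k] for i in range(n - k))
        for k in range(1, n)
    ]


def labs_energy(spins):
    """Sum of squared off-peak autocorrelations (assumed, unnormalized form)."""
    return sum(c * c for c in autocorrelations(spins))


def is_local_minimum(spins):
    """True if no single spin flip strictly lowers the energy.

    Assumption: the landscape neighborhood is defined by single-spin flips.
    """
    base = labs_energy(spins)
    for i in range(len(spins)):
        flipped = list(spins)
        flipped[i] = -flipped[i]
        if labs_energy(flipped) < base:
            return False
    return True


# The length-13 Barker sequence, a classic low-autocorrelation string.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
```

For the Barker sequence every off-peak autocorrelation has magnitude at most 1, so its energy is small compared with, say, the all-ones string of the same length. Each energy evaluation here costs $O(N^2)$; that is adequate for illustration but not for large-scale landscape sampling.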

arXiv.org e-Print Archive

CiteSeerX

Crossref

Impact Of The Energy Model On The Complexity Of RNA Folding With Pseudoknots

Author: C. Alkan
C. Liu
C. Theis
C.M. Reidys
E. Bindewald
E. Rivas
H. Yang
J. Zhao
J.E. Tabaska
M. Jiang
M.R. Garey
M.V. Ashley
R. Nussinov
R.B. Lyngsø
R.B. Lyngsø
S. Griffiths-Jones
S. Ieong
T. Akutsu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Predicting the folding of an RNA sequence, while allowing general pseudoknots (PK), consists in finding a minimal free-energy matching of its $n$ positions. Assuming independently contributing base-pairs, the problem can be solved in $\Theta(n^3)$ time using a variant of the maximal weighted matching. By contrast, the problem was previously proven NP-Hard in the more realistic nearest-neighbor energy model. In this work, we consider an intermediate model, called the stacking-pairs energy model. We extend a result by Lyngsø, showing that RNA folding with PK is NP-Hard within a large class of parametrizations for the model. We also show the approximability of the problem, by giving a practical $\Theta(n^3)$ algorithm that achieves at least a $5$-approximation for any parametrization of the stacking model. This contrasts nicely with the nearest-neighbor version of the problem, which we prove cannot be approximated within any positive ratio, unless $P=NP$.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Polytechnique

Combinatorial RNA Design: Designability and Structure-Approximating Algorithm

Author: A Avihoo
A Busch
A Esmaili-Taheri
A Levin
A Taneda
C Höner Zu Siederdissen
DH Mathews
DH Turner
G Rodrigo
IL Hofacker
JA Garcia-Martin
JN Zadeh
M Zuker
MK Takahashi
R Aguirre-Hernández
R Nussinov
RB Lyngsø
S Griffiths-Jones
SY Wu
V Reinharz
Y Frid
Publication date: 19/06/2015

In this work, we consider the Combinatorial RNA Design problem, a minimal instance of the RNA design problem which aims at finding a sequence that admits a given target as its unique base pair maximizing structure. We provide complete characterizations for the structures that can be designed using restricted alphabets. Under a classic four-letter alphabet, we provide a complete characterization of designable structures without unpaired bases. When unpaired bases are allowed, we provide partial characterizations for classes of designable/undesignable structures, and show that the class of designable structures is closed under the stutter operation. Membership of a given structure to any of the classes can be tested in linear time and, for positive instances, a solution can be found in linear time. Finally, we consider a structure-approximating version of the problem that allows extending bands (helices) and, assuming that the input structure avoids two motifs, we provide a linear-time algorithm that produces a designable structure with at most twice as many base pairs as the input structure. Comment: CPM - 26th Annual Symposium on Combinatorial Pattern Matching, Jun 2015, Ischia Island, Italy. LNCS, 201
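The notion of a sequence admitting a target as its "unique base pair maximizing structure" can be checked directly on small inputs. The following is a minimal sketch, not the paper's linear-time method: a cubic-time Nussinov-style dynamic program that returns both the maximum number of base pairs and the number of non-crossing structures attaining it. The pairing rule (Watson-Crick pairs only), the absence of a minimum hairpin-loop length, and all function names are assumptions made for illustration.

```python
from functools import lru_cache

# Assumption: only Watson-Crick pairs are allowed (no G-U wobble pairs).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_pairs_and_count(seq, min_loop=0):
    """Return (maximum number of pairs, number of optimal structures).

    Structures are non-crossing; `min_loop` is the minimum number of
    unpaired bases enclosed by a hairpin (assumed 0 in this sketch).
    """

    @lru_cache(maxsize=None)
    def solve(i, j):
        if j <= i:  # empty interval or a single base: one empty structure
            return (0, 1)
        # Case 1: position j is unpaired.
        best, count = solve(i, j - 1)
        # Case 2: position j pairs with some k; the cases are disjoint,
        # so optimal structures are counted exactly once.
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left_best, left_count = solve(i, k - 1)
                in_best, in_count = solve(k + 1, j - 1)
                score = left_best + 1 + in_best
                ways = left_count * in_count
                if score > best:
                    best, count = score, ways
                elif score == best:
                    count += ways
        return (best, count)

    return solve(0, len(seq) - 1)


def is_design(seq, target_pairs):
    """Check that `target_pairs` is the unique pair-maximizing structure.

    `target_pairs` is a set of (i, j) index pairs with i < j, assumed to be
    a valid non-crossing structure.
    """
    compatible = all((seq[i], seq[j]) in PAIRS for i, j in target_pairs)
    best, count = max_pairs_and_count(seq)
    return compatible and len(target_pairs) == best and count == 1
```

For instance, `GGCC` has a single optimal structure with two nested pairs, so `is_design("GGCC", {(0, 3), (1, 2)})` holds, whereas `AUA` has two distinct one-pair optima and is therefore not a design for either of them.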

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL-Polytechnique

Exact Learning of RNA Energy Parameters From Structure

Author: Aminisharifabad Mohammad
Chitsaz Hamidreza
Publication date: 15/10/2013

We consider the problem of exact learning of parameters of a linear RNA energy model from secondary structure data. A necessary and sufficient condition for learnability of parameters is derived, which is based on computing the convex hull of the union of translated Newton polytopes of input sequences. The set of learned energy parameters is characterized as the convex cone generated by the normal vectors to those facets of the resulting polytope that are incident to the origin. In practice, the sufficient condition may not be satisfied by the entire training data set; hence, computing a maximal subset of training data for which the sufficient condition is satisfied is often desired. We show that this problem is NP-hard in general for an arbitrary dimensional feature space. Using a randomized greedy algorithm, we select a subset of the RNA STRAND v2.0 database that satisfies the sufficient condition for the separate A-U, C-G, G-U base pair counting model. The set of learned energy parameters includes experimentally measured energies of A-U, C-G, and G-U pairs; hence, our parameter set is in agreement with the Turner parameters.

arXiv.org e-Print Archive

CiteSeerX