1,311 research outputs found

If the Current Clique Algorithms are Optimal, so is Valiant's Parser

Author: Abboud Amir
Backurs Arturs
Williams Virginia Vassilevska
Publication date: 05/11/2015

The CFG recognition problem is: given a context-free grammar $\mathcal{G}$ and a string $w$ of length $n$, decide if $w$ can be obtained from $\mathcal{G}$. This is the most basic parsing question and is a core computer science problem. Valiant's parser from 1975 solves the problem in $O(n^{\omega})$ time, where $\omega<2.373$ is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic $O(n^3/\log^3{n})$ complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time $O(|\mathcal{G}|\cdot n^{3-\varepsilon})$ can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be $|\mathcal{G}|=\Omega(n^6)$. Nothing was known for the more relevant case of constant size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if there are $k$ that form a clique. Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14).
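To make the recognition problem in the abstract above concrete, the following is a minimal sketch of the textbook cubic-time CYK recognizer. It is not Valiant's parser and not the paper's reduction. It assumes the grammar is already in Chomsky normal form; the function name `cyk_recognize`, the dictionary encoding of the rules, and the small balanced-parentheses grammar are hypothetical choices made for illustration.

```python
from itertools import product


def cyk_recognize(word, terminal_rules, binary_rules, start="S"):
    """Return True if `word` is derivable from `start` in a CNF grammar.

    terminal_rules: dict mapping a terminal character to the set of
        nonterminals A that have a rule A -> terminal.
    binary_rules: dict mapping a pair (B, C) to the set of nonterminals A
        that have a rule A -> B C.
    """
    n = len(word)
    if n == 0:
        return False  # assumption: this sketch does not handle the empty string
    # table[i][j] holds the nonterminals that derive word[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][i] = set(terminal_rules.get(ch, ()))
    # Three nested loops over (length, start, split point) give O(n^3) time
    # for a grammar of constant size.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary_rules.get((b, c), set())
    return start in table[0][n - 1]


# Hypothetical CNF grammar for nonempty balanced parentheses:
#   S -> S S | L R | L T,   T -> S R,   L -> '(',   R -> ')'
TERMINALS = {"(": {"L"}, ")": {"R"}}
BINARIES = {
    ("S", "S"): {"S"},
    ("L", "R"): {"S"},
    ("L", "T"): {"S"},
    ("S", "R"): {"T"},
}

if __name__ == "__main__":
    for w in ["()", "(())", "()()", "(()", ")("]:
        print(repr(w), cyk_recognize(w, TERMINALS, BINARIES))
```

The triple loop over substring length, start position, and split point is where the cubic dependence on $n$ comes from; the abstract's claim concerns how far below this bound any parser can go.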

arXiv.org e-Print Archive

Crossref

Compiling a domain specific language for dynamic programming

Author: Steffen Peter
Publication venue: Bielefeld University
Publication date: 01/01/2006

Steffen P. Compiling a domain specific language for dynamic programming. Bielefeld (Germany): Bielefeld University; 2006

Publications at Bielefeld University

Accelerated probabilistic inference of RNA structure evolution

Author: Holmes Ian
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered. RESULTS: We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database. CONCLUSION: A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction

Author: Dowell Robin D
Eddy Sean R
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: RNA secondary structure prediction methods based on probabilistic modeling can be developed using stochastic context-free grammars (SCFGs). Such methods can readily combine different sources of information that can be expressed probabilistically, such as an evolutionary model of comparative RNA sequence analysis and a biophysical model of structure plausibility. However, the number of free parameters in an integrated model for consensus RNA structure prediction can become untenable if the underlying SCFG design is too complex. Thus a key question is, what small, simple SCFG designs perform best for RNA secondary structure prediction? RESULTS: Nine different small SCFGs were implemented to explore the tradeoffs between model complexity and prediction accuracy. Each model was tested for single sequence structure prediction accuracy on a benchmark set of RNA secondary structures. CONCLUSIONS: Four SCFG designs had prediction accuracies near the performance of current energy minimization programs. One of these designs, introduced by Knudsen and Hein in their PFOLD algorithm, has only 21 free parameters and is significantly simpler than the others

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

Characterising RNA secondary structure space using information entropy

Publication venue: BioMed Central

Springer - Publisher Connector

Modeling and predicting all-α transmembrane proteins including helix–helix pairing

Author: Steyaert Jean-Marc
Waldispühl Jérôme
Publication venue: Elsevier B.V.

Modeling and predicting the structure of proteins is one of the most important challenges of computational biology. Exact physical models are too complex to provide feasible prediction tools and other ab initio methods only use local and probabilistic information to fold a given sequence. We show in this paper that all-α transmembrane protein secondary and super-secondary structures can be modeled with a multi-tape S-attributed grammar. An efficient structure prediction algorithm using both local and global constraints is designed and evaluated. Comparison with existing methods shows that the prediction rates as well as the definition level are sensibly increased. Furthermore, this approach can be generalized to more complex proteins

Elsevier - Publisher Connector

Algebraic Dynamic Programming over general data structures

Author: B Voß
C Höner zu Siederdissen
C McBride
Christian Höner zu Siederdissen
CM Reidys
FWD Huang
G Sauthoff
J Garcia-Fernàndez
JK Baker
JS McCaskill
LR Rabiner
M Held
M Riechert
O Elemento
O Gotoh
P Billie
Peter F Stadler
R Bellman
R Durbin
R Giegerich
R Lorenz
RA Cameron
RD Dowell
S Janssen
S Wuchty
SJ Prohaska
Sonja J Prohaska
WS Robinson
Publication venue: Springer Science and Business Media LLC

Crossref

CONTRAfold: RNA secondary structure prediction without physics-based models

Author: Chuong B. Do
Daniel A. Woods
Serafim Batzoglou
Publication date: 01/01/2006

doi:10.1093/bioinformatics/btl24

CiteSeerX

A Combinatorial Framework for Designing (Pseudoknotted) RNA Algorithms

We extend a hypergraph representation, introduced by Finkelstein and Roytberg, to unify dynamic programming algorithms in the context of RNA folding with pseudoknots. Classic applications of RNA dynamic programming (energy minimization, partition function, base-pair probabilities...) are reformulated within this framework, giving rise to very simple algorithms. This reformulation allows one to conceptually detach the conformation space/energy model -- captured by the hypergraph model -- from the specific application, assuming unambiguity of the decomposition. To ensure the latter property, we propose a new combinatorial methodology based on generating functions. We extend the set of generic applications by proposing an exact algorithm for extracting generalized moments in weighted distribution, generalizing a prior contribution by Miklos et al. Finally, we illustrate our full-fledged programme on three exemplary conformation spaces (secondary structures, Akutsu's simple type pseudoknots and kissing hairpins). This readily gives sets of algorithms that are either novel or have complexity comparable to classic implementations for minimization and Boltzmann ensemble applications of dynamic programming
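The separation between conformation space and application described in the abstract above can be illustrated with a small sketch. It is not the paper's hypergraph framework and does not cover pseudoknots: it evaluates one unambiguous decomposition of pseudoknot-free secondary structures under two interchangeable algebras, one counting structures and one maximizing base pairs. The names `fold` and `can_pair`, the pairing rules (Watson-Crick plus G-U wobble), and the absence of a minimum hairpin-loop length are assumptions made for brevity.

```python
import operator

# Assumed pairing rules: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def can_pair(a, b):
    return a + b in PAIRS


def fold(seq, one, plus, times, pair_score):
    """Evaluate all secondary structures of `seq` in a given algebra.

    Decomposition of the half-open interval [i, j): position i is either
    unpaired, or paired with some k, which splits the interval into an
    inner part [i+1, k) and a remaining part [k+1, j). Every structure
    arises in exactly one way, so the decomposition is unambiguous.
    """
    n = len(seq)
    # val[i][j] is the algebra value of seq[i:j]; empty intervals get `one`
    val = [[one] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = val[i + 1][j]  # case: position i is unpaired
            for k in range(i + 1, j):
                if can_pair(seq[i], seq[k]):  # case: i pairs with k
                    inner = times(val[i + 1][k], val[k + 1][j])
                    total = plus(total, times(pair_score, inner))
            val[i][j] = total
    return val[0][n]


def count_structures(seq):
    # Counting algebra: (+, *) with unit 1; each pair contributes a factor 1.
    return fold(seq, 1, operator.add, operator.mul, 1)


def max_pairs(seq):
    # Optimization algebra: (max, +) with unit 0; each pair contributes 1.
    return fold(seq, 0, max, operator.add, 1)


if __name__ == "__main__":
    print(count_structures("GGCC"), max_pairs("GGCC"))
```

Only the four algebra arguments change between the two applications; the recursion itself is shared. Counting is only correct because the decomposition is unambiguous, which is the property the abstract singles out.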

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Polytechnique

HAL-Rennes 1