Search CORE

1,117 research outputs found

If the Current Clique Algorithms are Optimal, so is Valiant's Parser

Author: Abboud Amir
Backurs Arturs
Williams Virginia Vassilevska
Publication venue
Publication date: 05/11/2015
Field of study

The CFG recognition problem is: given a context-free grammar

\mathcal{G}

and a string

w

of length

n

, decide if

w

can be obtained from

\mathcal{G}

. This is the most basic parsing question and is a core computer science problem. Valiant's parser from 1975 solves the problem in

O(n^{\omega})

time, where

\omega<2.373

is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic

O(n^3/\log^3{n})

complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time

O(|\mathcal{G}|\cdot n^{3-\varepsilon})

can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be

|\mathcal{G}|=\Omega(n^6)

. Nothing was known for the more relevant case of constant size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the

k

-Clique problem: given a graph on

n

nodes, decide if there are

k

that form a clique. Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14)

arXiv.org e-Print Archive

Crossref

Certified Context-Free Parsing: A formalisation of Valiant's Algorithm in Agda

Author: Bernardy Jean-Philippe
Jansson Patrik
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 01/01/2016
Field of study

Valiant (1975) has developed an algorithm for recognition of context free languages. As of today, it remains the algorithm with the best asymptotic complexity for this purpose. In this paper, we present an algebraic specification, implementation, and proof of correctness of a generalisation of Valiant's algorithm. The generalisation can be used for recognition, parsing or generic calculation of the transitive closure of upper triangular matrices. The proof is certified by the Agda proof assistant. The certification is representative of state-of-the-art methods for specification and proofs in proof assistants based on type-theory. As such, this paper can be read as a tutorial for the Agda system

arXiv.org e-Print Archive

Crossref

Episciences.org

Directory of Open Access Journals

Chalmers Research

Chalmers Publication Library

Clique-Based Lower Bounds for Parsing Tree-Adjoining Grammars

Author: Bringmann Karl
Wellnitz Philip
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Publication date: 01/01/2017
Field of study

up to lower order factors

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

MPG.PuRe

Tabular Parsing

Author: Nederhof Mark-Jan
Satta Giorgio
Publication venue
Publication date: 01/01/2004
Field of study

This is a tutorial on tabular parsing, on the basis of tabulation of nondeterministic push-down automata. Discussed are Earley's algorithm, the Cocke-Kasami-Younger algorithm, tabular LR parsing, the construction of parse trees, and further issues.Comment: 21 pages, 14 figure

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Parallel Enumeration of Parse Trees

Author: Mikhelson Margarita
Okhotin Alexander
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 48th International Symposium on Mathematical Foundations of Computer Science (MFCS 2023)
Publication date: 01/01/2023
Field of study

Dagstuhl Research Online Publication Server

Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!

Author: Jayaram Rajesh
Saha Barna
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)
Publication date: 01/01/2017
Field of study

In 1975, a breakthrough result of L. Valiant showed that parsing context free grammars can be reduced to Boolean matrix multiplication, resulting in a running time of O(n^omega) for parsing where omega <= 2.373 is the exponent of fast matrix multiplication, and n is the string length. Recently, Abboud, Backurs and V. Williams (FOCS 2015) demonstrated that this is likely optimal; moreover, a combinatorial o(n^3) algorithm is unlikely to exist for the general parsing problem. The language edit distance problem is a significant generalization of the parsing problem, which computes the minimum edit distance of a given string (using insertions, deletions, and substitutions) to any valid string in the language, and has received significant attention both in theory and practice since the seminal work of Aho and Peterson in 1972. Clearly, the lower bound for parsing rules out any algorithm running in o(n^omega) time that can return a nontrivial multiplicative approximation of the language edit distance problem. Furthermore, combinatorial algorithms with cubic running time or algorithms that use fast matrix multiplication are often not desirable in practice. To break this n^omega hardness barrier, in this paper we study additive approximation algorithms for language edit distance. We provide two explicit combinatorial algorithms to obtain a string with minimum edit distance with performance dependencies on either the number of non-linear productions, k^*, or the number of nested non-linear production, k, used in the optimal derivation. Explicitly, we give an additive O(k^*gamma) approximation in time O(|G|(n^2 + (n/gamma)^3)) and an additive O(k gamma) approximation in time O(|G|(n^2 + (n^3/gamma^2))), where |G| is the grammar size and n is the string length. In particular, we obtain tight approximations for an important subclass of context free grammars known as ultralinear grammars, for which k and k^* are naturally bounded. Interestingly, we show that the same conditional lower bound for parsing context free grammars holds for the class of ultralinear grammars as well, clearly marking the boundary where parsing becomes hard

Dagstuhl Research Online Publication Server