
3 research outputs found

Streaming algorithms for recognizing nearly well-parenthesized expressions

Authors: Krebs Andreas, Limaye Nutan, Srinivasan Srikanth
Publication venue
Publication date: 01/06/2012
Field of study

We study the streaming complexity of the membership problem of 1-turn-Dyck2 and Dyck2 when there are a few errors in the input string.

1-turn-Dyck2 with errors: We prove that there exists a randomized one-pass algorithm that, given x, checks whether there exists a string x' in 1-turn-Dyck2 such that x is obtained by flipping at most k locations of x' using:
- O(k log n) space, O(k log n) randomness, and poly(k log n) time per item, with error at most 1/poly(n).
- O(k^{1+epsilon} + log n) space for every 0 <= epsilon <= 1, O(log n) randomness, and O(polylog(n) + poly(k)) time per item, with error at most 1/8.
Here, we also prove that any randomized one-pass algorithm that makes error at most k/n requires at least Omega(k log(n/k)) space to accept strings which are exactly k-away from strings in 1-turn-Dyck2 and to reject strings which are exactly (k+2)-away from strings in 1-turn-Dyck2. Since 1-turn-Dyck2 and the Hamming Distance problem are closely related, we also obtain new upper and lower bounds for this problem.

Dyck2 with errors: We prove that there exists a randomized one-pass algorithm that, given x, checks whether there exists a string x' in Dyck2 such that x is obtained from x' by changing (in some restricted manner) at most k positions using:
- O(k log n + sqrt(n log n)) space, O(k log n) randomness, and poly(k log n) time per element, with error at most 1/poly(n).
- O(k^{1+epsilon} + sqrt(n log n)) space for every 0 <= epsilon <= 1, O(log n) randomness, and O(polylog(n) + poly(k)) time per element, with error at most 1/8
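To make the link between 1-turn-Dyck2 and Hamming distance concrete, here is a minimal offline sketch in Python. It is not the paper's streaming algorithm: it stores the whole input and uses O(n) space. It assumes, as an interpretation not spelled out in the abstract, that a "flip" changes the type of a parenthesis (round versus square) but not its direction, and that the input already consists of opening parentheses followed by the same number of closing ones. The function names are hypothetical.

```python
# Assumed alphabet for Dyck2: two bracket types.
OPEN_TO_CLOSE = {"(": ")", "[": "]"}
CLOSERS = set(OPEN_TO_CLOSE.values())


def one_turn_flip_distance(x):
    """Minimum number of type flips that bring x into 1-turn-Dyck2.

    Returns None when x is not of the assumed shape: some opening
    brackets followed by exactly as many closing brackets.
    """
    n = len(x)
    if n % 2 != 0:
        return None
    half = n // 2
    first, second = x[:half], x[half:]
    if any(c not in OPEN_TO_CLOSE for c in first):
        return None
    if any(c not in CLOSERS for c in second):
        return None
    # Position i in the first half must be closed by position n-1-i.
    # Each mismatched pair is repaired by exactly one flip, so the answer
    # is a Hamming distance between the first half and the mirrored
    # second half.
    return sum(
        1
        for i in range(half)
        if OPEN_TO_CLOSE[first[i]] != second[half - 1 - i]
    )


def within_k_flips(x, k):
    """True if x is at most k flips away from 1-turn-Dyck2."""
    d = one_turn_flip_distance(x)
    return d is not None and d <= k
```

For example, `one_turn_flip_distance("([))")` counts one mismatched pair, so `within_k_flips("([))", 1)` holds while `within_k_flips("([))", 0)` does not.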

arXiv.org e-Print Archive

Efficiently Computing Edit Distance to Dyck Language

Author: Saha Barna
Publication venue
Publication date: 12/11/2013
Field of study

Given a string $\sigma$ over alphabet $\Sigma$ and a grammar $G$ defined over the same alphabet, what is the minimum number of repairs (insertions, deletions and substitutions) required to map $\sigma$ into a valid member of $G$? We investigate this basic question in this paper for $Dyck(s)$. $Dyck(s)$ is a fundamental context-free grammar representing the language of well-balanced parentheses with $s$ different types of parentheses and has played a pivotal role in the development of the theory of context-free languages. Computing edit distance to $Dyck(s)$ significantly generalizes the string edit distance problem and has numerous applications, ranging from repairing semi-structured documents such as XML to memory checking, automated compiler optimization, natural language processing, etc. In this paper we give the first near-linear time algorithm for edit distance computation to $Dyck(s)$ that achieves a nontrivial approximation factor of $O(\frac{1}{\epsilon}\log{OPT}(\log{n})^{\frac{1}{\epsilon}})$ in $O(n^{1+\epsilon}\log{n})$ time. In fact, given an algorithm for computing string edit distance on input of size $n$ in $\alpha(n)$ time with a $\beta(n)$-approximation factor, we can devise an algorithm for the edit distance problem to $Dyck(s)$ running in $\tilde{O}(n^{1+\epsilon}+\alpha(n))$ time and achieving an approximation factor of $O(\frac{1}{\epsilon}\beta(n)\log{OPT})$. We show that the framework for efficiently approximating edit distance to $Dyck(s)$ can be applied to many other languages. We illustrate this by considering various memory checking languages, which comprise valid transcripts of stacks, queues, priority queues, double-ended queues, etc. Therefore, any language that can be recognized by these data structures can also be repaired efficiently by our algorithm.

Comment: 29 pages
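For orientation, the quantity being approximated can be computed exactly on small inputs with a textbook cubic interval dynamic program. The Python sketch below is that exact baseline, not the near-linear approximation algorithm the abstract describes. It assumes the input contains only bracket characters from a fixed set of three types; the names `PAIRS`, `pair_cost` and `dyck_edit_distance` are illustrative.

```python
# Assumed bracket alphabet (s = 3 types); extend PAIRS for more types.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def pair_cost(a, b):
    """Substitutions needed so that a opens and b closes the same type."""
    if a in PAIRS and b in CLOSERS:
        return 0 if PAIRS[a] == b else 1
    if a in PAIRS or b in CLOSERS:
        return 1  # keep the correctly oriented symbol, rewrite the other
    return 2      # both symbols must be rewritten


def dyck_edit_distance(s):
    """Exact edit distance from s to the Dyck language, in O(n^3) time.

    d[i][j] is the minimum number of insertions, deletions and
    substitutions that turn s[i:j] into a balanced string.
    """
    n = len(s)
    d = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                # A lone symbol is deleted (or given an inserted partner).
                d[i][j] = 1
                continue
            # Option 1: match s[i] with s[j-1] around a balanced interior.
            best = pair_cost(s[i], s[j - 1]) + d[i + 1][j - 1]
            # Option 2: split into two independently repaired parts.
            for k in range(i + 1, j):
                best = min(best, d[i][k] + d[k][j])
            d[i][j] = best
    return d[0][n]
```

Inserting a partner for an unmatched symbol costs the same as deleting that symbol, which is why the recurrence needs no separate insertion case.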

arXiv.org e-Print Archive

Language Edit Distance & Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms & Connection to Fundamental Graph Problems

Author: Saha Barna
Publication venue
Publication date: 19/10/2015
Field of study

Given a context-free language $L(G)$ over alphabet $\Sigma$ and a string $s \in \Sigma^*$, the language edit distance (Lan-ED) problem seeks the minimum number of edits (insertions, deletions and substitutions) required to convert $s$ into a valid member of $L(G)$. The well-known dynamic programming algorithm solves this problem in $O(n^3)$ time (ignoring grammar size), where $n$ is the string length [Aho, Peterson 1972, Myers 1985]. Despite its vast number of applications, no algorithm known to date computes or approximates Lan-ED in truly sub-cubic time. In this paper we give the first such algorithm that computes Lan-ED almost optimally. For any $\epsilon > 0$, our algorithm runs in $\tilde{O}(\frac{n^{\omega}}{poly(\epsilon)})$ time and returns an estimate within a multiplicative approximation factor of $(1+\epsilon)$, where $\omega$ is the exponent of ordinary matrix multiplication of $n$-dimensional square matrices. It also computes the edit script. Further, for all substrings of $s$, we can estimate their Lan-ED within a $(1\pm \epsilon)$ factor in $\tilde{O}(\frac{n^{\omega}}{poly(\epsilon)})$ time with high probability. We also design the very first sub-cubic ($\tilde{O}(n^\omega)$) algorithm to handle arbitrary stochastic context-free grammar (SCFG) parsing. SCFGs lie at the foundation of statistical natural language processing; they generalize hidden Markov models and have found widespread applications. To complement our upper bound result, we show that exact computation of SCFG parsing, or of Lan-ED with insertion as the only edit operation, in truly sub-cubic time would imply a truly sub-cubic algorithm for all-pairs shortest paths, and hence for a large range of problems in graphs and matrices. Known lower bound results on parsing imply that no improvement over our time bound of $O(n^\omega)$ is possible for any nontrivial multiplicative approximation.

Comment: 36 pages. This is an updated version of the previous submission "Faster Language Edit Distance, Connection to All-pairs Shortest Paths and Related Problems". The introduction is rewritten, an error in a previous lower bound proof is corrected, and the Sidon sequence construction is elaborated.
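As a point of reference for the sub-cubic SCFG result, the standard cubic approach to maximum-likelihood parsing is a Viterbi-style CYK table. The Python sketch below illustrates that classical baseline only, not the paper's matrix-multiplication-based algorithm. It assumes the grammar is given in Chomsky normal form; the function name and the rule encodings (`lexical`, `binary`) are hypothetical choices made for this example.

```python
from collections import defaultdict


def viterbi_cyk(words, lexical, binary, start="S"):
    """Probability of the most likely parse of words, or 0.0 if none.

    Assumed grammar encoding (Chomsky normal form):
      lexical: dict mapping a terminal to a list of (nonterminal, prob)
      binary:  list of (parent, left, right, prob) rules
    Runs in O(n^3 * |binary|) time.
    """
    n = len(words)
    if n == 0:
        return 0.0
    # best[i][j][A] = highest probability of deriving words[i:j] from A
    best = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for nt, p in lexical.get(w, []):
            best[i][i + 1][nt] = max(best[i][i + 1][nt], p)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = best[i][j]
            for k in range(i + 1, j):
                left, right = best[i][k], best[k][j]
                for parent, b, c, p in binary:
                    # Membership tests do not create default entries.
                    if b in left and c in right:
                        cand = p * left[b] * right[c]
                        if cand > cell[parent]:
                            cell[parent] = cand
    return best[0][n].get(start, 0.0)
```

With the toy grammar `lexical = {"a": [("A", 1.0)], "b": [("B", 0.5)]}` and `binary = [("S", "A", "B", 1.0), ("B", "A", "B", 0.5)]`, the call `viterbi_cyk(["a", "b"], lexical, binary)` scores the single parse S -> A B.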

arXiv.org e-Print Archive