Streaming algorithms for recognizing nearly well-parenthesized expressions
We study the streaming complexity of the membership problem of 1-turn-Dyck2
and Dyck2 when there are a few errors in the input string.
1-turn-Dyck2 with errors: We prove that there exists a randomized one-pass
algorithm that, given x, checks whether there exists a string x' in 1-turn-Dyck2
such that x is obtained by flipping at most k locations of x', using:
- O(k log n) space, O(k log n) randomness, and poly(k log n) time per item,
with error at most 1/poly(n).
- O(k^{1+epsilon} + log n) space for every 0 <= epsilon <= 1, O(log n)
randomness, and O(polylog(n) + poly(k)) time per item, with error at most 1/8.
Here, we also prove that any randomized one-pass algorithm that makes error
at most k/n requires at least Omega(k log(n/k)) space to accept strings which
are exactly k-away from strings in 1-turn-Dyck2 and to reject strings which are
exactly (k+2)-away from strings in 1-turn-Dyck2. Since 1-turn-Dyck2 and the
Hamming Distance problem are closely related, we also obtain new upper and lower
bounds for the Hamming Distance problem.
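The relation to Hamming distance can be illustrated offline. The sketch below is not the streaming algorithm described above: it reads the whole string at once, assumes a 1-turn string consists of a block of opening brackets followed by an equally long block of closing brackets, and assumes that a "flip" changes only the type of a bracket. The function name `flips_to_one_turn_dyck2` and the bracket alphabet are hypothetical. Under these assumptions, the number of flips needed equals the Hamming distance between the opening half and the mirrored closing half.

```python
from typing import Optional

# Two bracket types, as in Dyck2 (assumed alphabet for this sketch).
OPEN_TYPE = {"(": 0, "[": 1}
CLOSE_TYPE = {")": 0, "]": 1}


def flips_to_one_turn_dyck2(x: str) -> Optional[int]:
    """Return the number of type flips needed to make x a 1-turn string,
    or None if x does not have the shape opens-then-closes with equal halves."""
    n = len(x)
    if n % 2 != 0:
        return None
    half = n // 2
    opens, closes = x[:half], x[half:]
    if any(c not in OPEN_TYPE for c in opens):
        return None
    if any(c not in CLOSE_TYPE for c in closes):
        return None
    # The i-th opening bracket must match the i-th closing bracket from the end.
    a = [OPEN_TYPE[c] for c in opens]
    b = [CLOSE_TYPE[c] for c in reversed(closes)]
    # Each mismatched pair is repaired by exactly one flip, so the answer
    # is the Hamming distance between the two type sequences.
    return sum(1 for u, v in zip(a, b) if u != v)
```

For example, `flips_to_one_turn_dyck2("([))")` compares the types of `([` against the mirrored types of `))` and finds one mismatch.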
Dyck2 with errors: We prove that there exists a randomized one-pass algorithm
that given x checks whether there exists a string x' in Dyck2 such that x is
obtained from x' by changing (in some restricted manner) at most k positions
using:
- O(k log n + sqrt(n log n)) space, O(k log n) randomness, and poly(k log n)
time per element, with error at most 1/poly(n).
- O(k^{1+epsilon} + sqrt(n log n)) space for every 0 <= epsilon <= 1, O(log n)
randomness, and O(polylog(n) + poly(k)) time per element, with error at most 1/8.
Efficiently Computing Edit Distance to Dyck Language
Given a string over an alphabet and a grammar defined over the same alphabet,
what is the minimum number of repairs (insertions, deletions and
substitutions) required to map the string into a valid member of the grammar's
language? We investigate this basic question in this paper for Dyck(s). Dyck(s)
is a fundamental context free grammar representing the language of well-balanced
parentheses with s different types of parentheses and has played a pivotal role
in the development of the theory of context free languages. Computing edit
distance to Dyck(s) significantly generalizes the string edit distance problem
and has numerous applications ranging from repairing semi-structured documents
such as XML to memory checking, automated compiler optimization, natural
language processing etc.
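To make the problem statement concrete, the following is a minimal sketch of an exact, cubic-time interval dynamic program for edit distance to a Dyck language with three bracket types. It is a textbook-style baseline written for illustration, not the near-linear time approximation algorithm of the paper. The names `PAIRS`, `pair_cost` and `dyck_edit_distance` are hypothetical, and the input is assumed to contain only bracket characters.

```python
from functools import lru_cache

# Assumed alphabet: three bracket types (s = 3).
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def pair_cost(a: str, b: str) -> int:
    """Substitutions needed so that a (left) and b (right) form a matched pair."""
    if a in PAIRS:
        # Keep the opener; b is either already its partner or gets substituted.
        return 0 if PAIRS[a] == b else 1
    # a is a closer: turn it into the opener matching b if b is a closer,
    # otherwise both symbols must be substituted.
    return 1 if b in CLOSERS else 2


def dyck_edit_distance(x: str) -> int:
    """Minimum number of insertions, deletions and substitutions to balance x."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # Cost of repairing the substring x[i:j].
        if i >= j:
            return 0
        # Option 1: delete x[i] (inserting a partner for it costs the same).
        best = 1 + d(i + 1, j)
        # Option 2: pair x[i] with some later symbol x[k].
        for k in range(i + 1, j):
            best = min(best, pair_cost(x[i], x[k]) + d(i + 1, k) + d(k + 1, j))
        return best

    return d(0, len(x))
```

The recursion is only meant for short strings; each of the O(n^2) substrings tries O(n) split points, which gives the cubic running time.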
In this paper we give the first near-linear time algorithm for edit distance
computation to Dyck(s) that achieves a nontrivial approximation factor. In
fact, given an algorithm that computes string edit distance on inputs of size n
within some running time and approximation factor, we can devise an algorithm
for the edit distance problem to Dyck(s) whose running time and approximation
factor are bounded in terms of those of the string algorithm.
We show that the framework for efficiently approximating edit distance to
Dyck(s) can be applied to many other languages. We illustrate this by
considering various memory checking languages, which comprise valid
transcripts of stacks, queues, priority queues, double-ended queues etc.
Therefore, any language that can be recognized by these data structures can
also be repaired efficiently by our algorithm.
Comment: 29 pages
Language Edit Distance & Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms & Connection to Fundamental Graph Problems
Given a context free language over an alphabet and a string over the same
alphabet, the language edit distance (Lan-ED) problem seeks the minimum
number of edits (insertions, deletions and substitutions) required to convert
the string into a valid member of the language. The well-known dynamic
programming algorithm solves this problem in O(n^3) time (ignoring grammar
size), where n is the string length [Aho, Peterson 1972, Myers 1985]. Despite
its vast number of applications, no algorithm known to date computes or
approximates Lan-ED in truly sub-cubic time.
In this paper we give the first such algorithm that computes Lan-ED almost
optimally. For an arbitrary accuracy parameter, our algorithm runs in time
sub-cubic in the string length n and returns an estimate within a corresponding
multiplicative approximation factor; the running time is stated in terms of
omega, the exponent of ordinary matrix multiplication of n-dimensional square
matrices. It also computes the edit script. Further, for all substrings of the
input string, we can estimate their Lan-ED approximately, with high
probability. We also design the very first sub-cubic algorithm to handle
arbitrary stochastic context free grammar (SCFG) parsing. SCFGs lie at the
foundation of statistical natural language processing, generalize hidden
Markov models, and have found widespread applications.
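For orientation, SCFG parsing asks for the most likely derivation of a string under a grammar with rule probabilities. The sketch below shows the classical cubic-time CYK-style (Viterbi) dynamic program for a grammar in Chomsky normal form. It is the kind of baseline that a sub-cubic algorithm improves on, not the algorithm of the paper. The function name `best_parse_probability` and the dictionary encoding of rules are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def best_parse_probability(
    words: List[str],
    start: str,
    unary: Dict[Tuple[str, str], float],        # (A, word) -> P(A -> word)
    binary: Dict[Tuple[str, str, str], float],  # (A, B, C) -> P(A -> B C)
) -> float:
    """Probability of the most likely parse of words from start (0.0 if none)."""
    n = len(words)
    if n == 0:
        return 0.0
    # best[i][j][A] = max probability that A derives words[i..j] (inclusive).
    best = [[{} for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        for (a, term), p in unary.items():
            if term == w and p > best[i][i].get(a, 0.0):
                best[i][i][a] = p
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cell = best[i][j]
            # Try every split point and every binary rule A -> B C.
            for k in range(i, j):
                left, right = best[i][k], best[k + 1][j]
                for (a, b, c), p in binary.items():
                    if b in left and c in right:
                        cand = p * left[b] * right[c]
                        if cand > cell.get(a, 0.0):
                            cell[a] = cand
    return best[0][n - 1].get(start, 0.0)
```

With the toy grammar S -> S S (probability 0.5) and S -> a (probability 0.5), the call `best_parse_probability(["a", "a"], "S", {("S", "a"): 0.5}, {("S", "S", "S"): 0.5})` multiplies one binary rule and two unary rules.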
To complement our upper bound result, we show that exact computation of SCFG
parsing, or of Lan-ED with insertion as the only edit operation, in truly
sub-cubic time would imply a truly sub-cubic algorithm for all-pairs shortest
paths, and hence for a large range of problems in graphs and matrices. Known
lower bound results on parsing imply that no improvement over our time bound is
possible for any nontrivial multiplicative approximation.
Comment: 36 pages. This is an updated version of the previous submission
"Faster Language Edit Distance, Connection to All-pairs Shortest Paths and
Related Problems". The introduction is rewritten, an error in a previous lower
bound proof is corrected, and the Sidon sequence construction is elaborated.