3 research outputs found

    Streaming algorithms for recognizing nearly well-parenthesized expressions

    We study the streaming complexity of the membership problem for 1-turn-Dyck2 and Dyck2 when the input string contains a few errors.

    1-turn-Dyck2 with errors: We prove that there exists a randomized one-pass algorithm that, given x, checks whether there exists a string x' in 1-turn-Dyck2 such that x is obtained by flipping at most k locations of x', using:
    - O(k log n) space, O(k log n) randomness, and poly(k log n) time per item, with error at most 1/poly(n);
    - O(k^{1+epsilon} + log n) space for every 0 <= epsilon <= 1, O(log n) randomness, and O(polylog(n) + poly(k)) time per item, with error at most 1/8.
    We also prove that any randomized one-pass algorithm that makes error at most k/n requires at least Omega(k log(n/k)) space to accept strings that are exactly k-away from strings in 1-turn-Dyck2 and to reject strings that are exactly (k+2)-away from strings in 1-turn-Dyck2. Since 1-turn-Dyck2 and the Hamming Distance problem are closely related, we also obtain new upper and lower bounds for this problem.

    Dyck2 with errors: We prove that there exists a randomized one-pass algorithm that, given x, checks whether there exists a string x' in Dyck2 such that x is obtained from x' by changing (in some restricted manner) at most k positions, using:
    - O(k log n + sqrt(n log n)) space, O(k log n) randomness, and poly(k log n) time per element, with error at most 1/poly(n);
    - O(k^{1+epsilon} + sqrt(n log n)) space for every 0 <= epsilon <= 1, O(log n) randomness, and O(polylog(n) + poly(k)) time per element, with error at most 1/8.
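To make the two languages in this abstract concrete, here is a minimal offline sketch in Python. It is not the streaming algorithm from the paper: it stores the whole input and uses linear space. It assumes the alphabet `()[]`, that 1-turn-Dyck2 consists of a block of opening brackets followed by the matching closing brackets in reverse order, and that a "flip" changes the type of a bracket but not its direction. The function names are illustrative only.

```python
from typing import Optional

# Assumed alphabet for Dyck2: two bracket types.
PAIRS = {"(": ")", "[": "]"}
CLOSERS = set(PAIRS.values())


def is_dyck2(x: str) -> bool:
    """Exact membership test for Dyck2 using an explicit stack (linear space)."""
    stack = []
    for c in x:
        if c in PAIRS:
            stack.append(c)
        elif c in CLOSERS:
            # A closer must match the most recent unmatched opener.
            if not stack or PAIRS[stack.pop()] != c:
                return False
        else:
            raise ValueError(f"unexpected symbol: {c!r}")
    return not stack


def one_turn_mismatches(x: str) -> Optional[int]:
    """Count mismatched (opener, closer) pairs in a one-turn string.

    Returns None when x is not m openers followed by m closers.
    Under the assumption above, the count is the number of type flips
    needed to bring x into 1-turn-Dyck2.
    """
    n = len(x)
    if n % 2:
        return None
    half = n // 2
    if any(c not in PAIRS for c in x[:half]):
        return None
    if any(c not in CLOSERS for c in x[half:]):
        return None
    # Position i is paired with position n-1-i, so this is a Hamming
    # distance between the first half and the mirrored second half.
    return sum(PAIRS[x[i]] != x[n - 1 - i] for i in range(half))
```

The mirrored comparison in `one_turn_mismatches` is one way to see why the abstract ties 1-turn-Dyck2 to the Hamming Distance problem.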

    Efficiently Computing Edit Distance to Dyck Language

    Given a string $\sigma$ over alphabet $\Sigma$ and a grammar $G$ defined over the same alphabet, what is the minimum number of repairs (insertions, deletions and substitutions) required to map $\sigma$ into a valid member of $G$? We investigate this basic question in this paper for $Dyck(s)$. $Dyck(s)$ is a fundamental context free grammar representing the language of well-balanced parentheses with $s$ different types of parentheses, and it has played a pivotal role in the development of the theory of context free languages. Computing edit distance to $Dyck(s)$ significantly generalizes the string edit distance problem and has numerous applications, ranging from repairing semi-structured documents such as XML to memory checking, automated compiler optimization, natural language processing etc. In this paper we give the first near-linear time algorithm for edit distance computation to $Dyck(s)$ that achieves a nontrivial approximation factor of $O(\frac{1}{\epsilon}\log{OPT}(\log{n})^{\frac{1}{\epsilon}})$ in $O(n^{1+\epsilon}\log{n})$ time. In fact, given an algorithm for computing string edit distance on input of size $n$ in $\alpha(n)$ time with $\beta(n)$-approximation factor, we can devise an algorithm for the edit distance problem to $Dyck(s)$ running in $\tilde{O}(n^{1+\epsilon}+\alpha(n))$ time and achieving an approximation factor of $O(\frac{1}{\epsilon}\beta(n)\log{OPT})$. We show that the framework for efficiently approximating edit distance to $Dyck(s)$ can be applied to many other languages. We illustrate this by considering various memory checking languages, which comprise valid transcripts of stacks, queues, priority queues, double-ended queues etc. Therefore, any language that can be recognized by these data structures can also be repaired efficiently by our algorithm.
    Comment: 29 pages
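For reference, the quantity being approximated can be computed exactly on short inputs with a simple cubic-time interval dynamic program. The sketch below is a naive baseline, not the near-linear algorithm described in the abstract. It assumes three bracket types and relies on the observation that insertions can be dropped without changing the optimum, since inserting a partner for an unmatched symbol costs the same as deleting that symbol. The names `pair_cost` and `dyck_edit_distance` are illustrative.

```python
# Assumed alphabet: three bracket types (s = 3).
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPEN_TO_CLOSE.values())


def pair_cost(a: str, b: str) -> int:
    """Minimum substitutions needed so that a...b becomes a matched pair."""
    a_open = a in OPEN_TO_CLOSE
    b_close = b in CLOSERS
    if a_open and b_close:
        return 0 if OPEN_TO_CLOSE[a] == b else 1
    if a_open or b_close:
        return 1  # rewrite the one symbol that faces the wrong way
    return 2      # closer ... opener: both symbols must change


def dyck_edit_distance(s: str) -> int:
    """Exact edit distance from s to the Dyck language, in O(n^3) time."""
    n = len(s)
    # dp[i][j] = cost of repairing the substring s[i:j]; empty costs 0.
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                dp[i][j] = 1  # a lone symbol has to be deleted
                continue
            # Option 1: pair the first and last symbols of the substring.
            best = dp[i + 1][j - 1] + pair_cost(s[i], s[j - 1])
            # Option 2: split into two independently repaired parts.
            for k in range(i + 1, j):
                best = min(best, dp[i][k] + dp[k][j])
            dp[i][j] = best
    return dp[0][n]
```

For example, `dyck_edit_distance("(]")` evaluates to 1 (one substitution), while `dyck_edit_distance(")(")` evaluates to 2.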

    Language Edit Distance & Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms & Connection to Fundamental Graph Problems

    Given a context free language $L(G)$ over alphabet $\Sigma$ and a string $s \in \Sigma^*$, the language edit distance (Lan-ED) problem seeks the minimum number of edits (insertions, deletions and substitutions) required to convert $s$ into a valid member of $L(G)$. The well-known dynamic programming algorithm solves this problem in $O(n^3)$ time (ignoring grammar size), where $n$ is the string length [Aho, Peterson 1972, Myers 1985]. Despite its vast number of applications, no algorithm is known to date that computes or approximates Lan-ED in truly sub-cubic time. In this paper we give the first such algorithm that computes Lan-ED almost optimally. For any arbitrary $\epsilon > 0$, our algorithm runs in $\tilde{O}(\frac{n^{\omega}}{poly(\epsilon)})$ time and returns an estimate within a multiplicative approximation factor of $(1+\epsilon)$, where $\omega$ is the exponent of ordinary matrix multiplication of $n$-dimensional square matrices. It also computes the edit script. Further, for all substrings of $s$, we can estimate their Lan-ED within a $(1\pm \epsilon)$ factor in $\tilde{O}(\frac{n^{\omega}}{poly(\epsilon)})$ time with high probability. We also design the very first sub-cubic ($\tilde{O}(n^\omega)$) algorithm to handle arbitrary stochastic context free grammar (SCFG) parsing. SCFGs lie at the foundation of statistical natural language processing; they generalize hidden Markov models and have found widespread applications. To complement our upper bound result, we show that exact computation of SCFG parsing, or of Lan-ED with insertion as the only edit operation, in truly sub-cubic time would imply a truly sub-cubic algorithm for all-pairs shortest paths, and hence for a large range of problems in graphs and matrices. Known lower bound results on parsing imply that no improvement over our time bound of $O(n^\omega)$ is possible for any nontrivial multiplicative approximation.
    Comment: 36 pages. This is an updated version of the previous submission "Faster Language Edit Distance, Connection to All-pairs Shortest Paths and Related Problems". Introduction is rewritten, an error in a previous lower bound proof corrected, and the Sidon sequence construction is elaborated.
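The all-pairs shortest paths problem mentioned in this abstract can be computed with a logarithmic number of (min,+) matrix products, each of which takes cubic time when done naively. The Python sketch below shows only that standard connection; it is not the reduction from the paper. It assumes a dense weight matrix with `math.inf` for missing edges and no negative cycles, and the function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def min_plus_product(a: Matrix, b: Matrix) -> Matrix:
    """Naive (min,+) product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [
        [min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def apsp_by_squaring(w: Matrix) -> Matrix:
    """All-pairs shortest paths via repeated (min,+) squaring.

    Assumes no negative cycles. After t squarings the matrix holds the
    cheapest paths that use at most 2^t edges.
    """
    n = len(w)
    d = [row[:] for row in w]
    for i in range(n):
        d[i][i] = min(d[i][i], 0)  # staying in place costs nothing
    hops = 1
    while hops < n - 1:
        d = min_plus_product(d, d)
        hops *= 2
    return d
```

With the naive product this costs O(n^3 log n) overall, so any speedup has to come from a faster (min,+) product, which is the kind of barrier the abstract refers to.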