661 research outputs found

    Approximating solution structure of the Weighted Sentence Alignment problem

    Full text link
    We study the complexity of approximating solution structure of the bijective weighted sentence alignment problem of DeNero and Klein (2008). In particular, we consider the complexity of finding an alignment that has a significant overlap with an optimal alignment. We discuss ways of representing the solution for the general weighted sentence alignment as well as phrases-to-words alignment problem, and show that computing a string which agrees with the optimal sentence partition on more than half (plus an arbitrarily small polynomial fraction) positions for the phrases-to-words alignment is NP-hard. For the general weighted sentence alignment we obtain such bound from the agreement on a little over 2/3 of the bits. Additionally, we generalize the Hamming distance approximation of a solution structure to approximating it with respect to the edit distance metric, obtaining similar lower bounds

    Edit Distance: Sketching, Streaming and Document Exchange

    Full text link
    We show that in the document exchange problem, where Alice holds x∈{0,1}nx \in \{0,1\}^n and Bob holds y∈{0,1}ny \in \{0,1\}^n, Alice can send Bob a message of size O(K(log⁑2K+log⁑n))O(K(\log^2 K+\log n)) bits such that Bob can recover xx using the message and his input yy if the edit distance between xx and yy is no more than KK, and output "error" otherwise. Both the encoding and decoding can be done in time O~(n+poly(K))\tilde{O}(n+\mathsf{poly}(K)). This result significantly improves the previous communication bounds under polynomial encoding/decoding time. We also show that in the referee model, where Alice and Bob hold xx and yy respectively, they can compute sketches of xx and yy of sizes poly(Klog⁑n)\mathsf{poly}(K \log n) bits (the encoding), and send to the referee, who can then compute the edit distance between xx and yy together with all the edit operations if the edit distance is no more than KK, and output "error" otherwise (the decoding). To the best of our knowledge, this is the first result for sketching edit distance using poly(Klog⁑n)\mathsf{poly}(K \log n) bits. Moreover, the encoding phase of our sketching algorithm can be performed by scanning the input string in one pass. Thus our sketching algorithm also implies the first streaming algorithm for computing edit distance and all the edits exactly using poly(Klog⁑n)\mathsf{poly}(K \log n) bits of space.Comment: Full version of an article to be presented at the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016
    • …
    corecore