3 research outputs found

    Edit Distance in Near-Linear Time: it's a Constant Factor Away!

    We present an algorithm for approximating the edit distance between two strings of length $n$ in time $n^{1+\epsilon}$, for any $\epsilon > 0$, up to a constant factor. Our result completes the research direction set forth in the recent breakthrough paper [Chakraborty-Das-Goldenberg-Koucky-Saks, FOCS'18], which showed the first constant-factor approximation algorithm with a (strongly) sub-quadratic running time. Several recent results have shown near-linear complexity under different restrictions on the inputs (e.g., when the edit distance is close to maximal, or when one of the inputs is pseudo-random). In contrast, our algorithm obtains a constant-factor approximation in near-linear running time for any input strings.
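
    For context, here is a minimal sketch of the classical quadratic dynamic program that the above result (approximately) improves upon. This is the textbook Wagner-Fischer recurrence, not the paper's near-linear algorithm:

```python
# Classical Wagner-Fischer dynamic program for exact edit distance.
# This is the O(n^2) exact baseline; the paper's contribution is a
# constant-factor *approximation* in near-linear time.
def edit_distance(a: str, b: str) -> int:
    # prev[j] holds the distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

assert edit_distance("kitten", "sitting") == 3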

    Two Party Distribution Testing: Communication and Security

    We study the problem of discrete distribution testing in the two-party setting. For example, in the standard closeness testing problem, Alice and Bob each have $t$ samples from, respectively, distributions $a$ and $b$ over $[n]$, and they need to test whether $a = b$ or $a, b$ are $\epsilon$-far (in the $\ell_1$ distance). This is in contrast to the well-studied one-party case, where the tester has unrestricted access to samples of both distributions. Despite being a natural constraint in applications, the two-party setting has previously evaded attention. We address two fundamental aspects of the two-party setting: 1) what is the communication complexity, and 2) can it be accomplished securely, without Alice and Bob learning extra information about each other's input. Besides closeness testing, we also study the independence testing problem, where Alice and Bob have $t$ samples from distributions $a$ and $b$ respectively, which may be correlated; the question is whether $a, b$ are independent or $\epsilon$-far from being independent. Our contribution is three-fold: 1) We show how to gain communication efficiency given more samples, beyond the information-theoretic bound on $t$. The gain is polynomially better than what one would obtain by adapting one-party algorithms. 2) We prove tightness of our trade-off for closeness testing, and show that independence testing requires a tight $\Omega(\sqrt{m})$ communication for an unbounded number of samples. These lower bounds are of independent interest as, to the best of our knowledge, they are the first two-party communication lower bounds for testing problems where the inputs are sets of i.i.d. samples. 3) We define the concept of secure distribution testing, and provide secure versions of the above protocols with an overhead that is only polynomial in the security parameter.
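
    To make the testing task itself concrete, here is a naive one-party plug-in tester that compares empirical distributions in $\ell_1$. It is purely illustrative and ignores the paper's actual concern, communication between the two parties; the function name, the acceptance threshold, and the sample-size caveat are all illustrative choices, not the paper's protocol:

```python
from collections import Counter

# Naive plug-in closeness tester: estimate the l_1 distance between the
# two empirical distributions and accept if it is small. Accurate only
# when t is large relative to the domain size n; the paper instead asks
# how few *bits* Alice and Bob must exchange, and how to do so securely.
def close_in_l1(samples_a, samples_b, epsilon):
    t = len(samples_a)  # assume both parties hold t samples each
    emp_a, emp_b = Counter(samples_a), Counter(samples_b)
    support = set(emp_a) | set(emp_b)
    l1 = sum(abs(emp_a[x] - emp_b[x]) for x in support) / t
    return l1 <= epsilon / 2  # illustrative threshold
```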

    Estimating the Longest Increasing Subsequence in Nearly Optimal Time

    Longest Increasing Subsequence (LIS) is a fundamental statistic of a sequence, and has been studied for decades. While the LIS of a sequence of length $n$ can be computed exactly in time $O(n \log n)$, the complexity of estimating the (length of the) LIS in sublinear time, especially when LIS $\ll n$, is still open. We show that for any integer $n$ and any $\lambda = o(1)$, there exists a (randomized) non-adaptive algorithm that, given a sequence of length $n$ with LIS $\ge \lambda n$, approximates the LIS up to a factor of $1/\lambda^{o(1)}$ in $n^{o(1)} / \lambda$ time. Our algorithm improves upon prior work substantially in terms of both approximation and run-time: (i) we provide the first sub-polynomial approximation for LIS in sub-linear time; and (ii) our run-time complexity essentially matches the trivial sample-complexity lower bound of $\Omega(1/\lambda)$, which is required to obtain any non-trivial approximation of the LIS. As part of our solution, we develop two novel ideas which may be of independent interest. First, we define a new Genuine-LIS problem, where each sequence element may either be genuine or corrupted. In this model, the user receives unrestricted access to the actual sequence, but does not know a priori which elements are genuine. The goal is to estimate the LIS using genuine elements only, with the minimal number of "genuineness tests". The second idea, Precision Forest, enables accurate estimates for compositions of general functions from "coarse" (sub-)estimates. Precision Forest essentially generalizes classical precision sampling, which works only for summations. As a central tool, the Precision Forest is initially pre-processed on a set of samples, which is thereafter repeatedly reused by multiple sub-parts of the algorithm, improving their amortized complexity.
    Comment: Full version of FOCS 2022 paper.
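
    The exact $O(n \log n)$ computation mentioned above is the classical patience-sorting method; a minimal sketch follows (this is the exact baseline, not the paper's sublinear estimator):

```python
import bisect

# Exact O(n log n) LIS length via patience sorting.
def lis_length(seq):
    # tails[k] = smallest possible tail of a strictly increasing
    # subsequence of length k + 1 seen so far
    tails = []
    for x in seq:
        k = bisect.bisect_left(tails, x)  # bisect_left => strict increase
        if k == len(tails):
            tails.append(x)   # x extends the longest subsequence
        else:
            tails[k] = x      # x improves (lowers) an existing tail
    return len(tails)

assert lis_length([3, 1, 4, 1, 5, 9, 2, 6]) == 4  # e.g. 1, 4, 5, 9
```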