    A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance

    Computing the Hamming distance between a given pattern of length m and each location in a text of length n, both over a general alphabet Sigma, is one of the most fundamental tasks in string algorithms. The fastest known runtime for exact computation is O~(n sqrt(m)). We recently introduced a complicated randomized algorithm for obtaining a (1 +/- eps) approximation for each location in the text in O((n/eps) log(1/eps) log n log m log |Sigma|) total time, breaking a barrier that stood for 22 years. In this paper, we introduce an elementary and simple randomized algorithm that takes O((n/eps) log n log m) time.
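
    To make the problem statement concrete, the quantity being approximated can be sketched with a naive O(nm) baseline. This is not the algorithm from the paper, only a reference definition of the exact output; the function names `hamming_distances` and `is_good_approximation` are illustrative and do not appear in the source.

```python
def hamming_distances(text, pattern):
    """Exact Hamming distance between the pattern and every aligned
    window of the text. Naive O(n*m) baseline, for reference only."""
    n, m = len(text), len(pattern)
    return [
        sum(1 for j in range(m) if text[i + j] != pattern[j])
        for i in range(n - m + 1)
    ]


def is_good_approximation(exact, estimate, eps):
    """Check that every estimate lies within a (1 +/- eps) factor
    of the corresponding exact distance."""
    return all(
        (1 - eps) * d <= e <= (1 + eps) * d
        for d, e in zip(exact, estimate)
    )
```

    For instance, `hamming_distances("abcabd", "abd")` compares the pattern against the four windows "abc", "bca", "cab", and "abd". An approximation algorithm would be expected to return values that pass `is_good_approximation` against this baseline with high probability.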

    Online Linear Extractors for Independent Sources

    In this work, we characterize online linear extractors. In other words, given a matrix $A \in \mathbb{F}_2^{n \times n}$, we study the convergence of the iterated process $\mathbf{S} \leftarrow A\mathbf{S} \oplus \mathbf{X}$, where $\mathbf{X} \sim D$ is repeatedly sampled independently from some fixed (but unknown) distribution $D$ with (min)-entropy at least $k$. Here, we think of $\mathbf{S} \in \{0,1\}^n$ as the state of an online extractor, and $\mathbf{X} \in \{0,1\}^n$ as its input. As our main result, we show that the state $\mathbf{S}$ converges to the uniform distribution for all input distributions $D$ with entropy $k > 0$ if and only if the matrix $A$ has no non-trivial invariant subspace (i.e., a non-zero subspace $V \subsetneq \mathbb{F}_2^n$ such that $AV \subseteq V$). In other words, a matrix $A$ yields an online linear extractor if and only if $A$ has no non-trivial invariant subspace. For example, the linear transformation corresponding to multiplication by a generator of the field $\mathbb{F}_{2^n}$ yields a good online linear extractor. Furthermore, for any such matrix, convergence takes at most $\widetilde{O}(n^2(k+1)/k^2)$ steps. We also study the more general notion of condensing: that is, we ask when this process converges to a distribution with entropy at least $\ell$, when the input distribution has entropy greater than $k$. (Extractors correspond to the special case $\ell = n$.) We show that a matrix gives a good condenser if there are relatively few vectors $\mathbf{w} \in \mathbb{F}_2^n$ such that $\mathbf{w}, A^T\mathbf{w}, \ldots, (A^T)^{n-k-1}\mathbf{w}$ are linearly dependent. As an application, we show that the very simple cyclic rotation transformation $A(x_1,\ldots, x_n) = (x_n,x_1,\ldots, x_{n-1})$ condenses to $\ell = n-1$ bits for any $k > 1$ if $n$ is a prime satisfying a certain simple number-theoretic condition.
Our proofs are Fourier-analytic and rely on a novel lemma, which gives a tight bound on the product of certain Fourier coefficients of any entropic distribution.
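
    The iterated process above can be sketched in a few lines for the cyclic rotation example. This is only an illustration of the state update $\mathbf{S} \leftarrow A\mathbf{S} \oplus \mathbf{X}$; the helper names (`rotate`, `step`, `run`) and the toy input source `weak_source` are assumptions made for the example and do not come from the paper.

```python
import random


def rotate(state):
    """Cyclic rotation A(x_1, ..., x_n) = (x_n, x_1, ..., x_{n-1})."""
    return state[-1:] + state[:-1]


def step(state, x):
    """One update S <- A S xor X over F_2, with A the cyclic rotation."""
    return [s ^ b for s, b in zip(rotate(state), x)]


def weak_source(rng, n):
    """Hypothetical low-entropy input: the zero vector or the first
    unit vector, each with probability 1/2 (one bit of min-entropy)."""
    x = [0] * n
    x[0] = rng.randrange(2)
    return x


def run(n, sample_input, steps, seed=0):
    """Iterate the process from the all-zero state for `steps` rounds."""
    rng = random.Random(seed)
    state = [0] * n
    for _ in range(steps):
        state = step(state, sample_input(rng, n))
    return state
```

    Calling `run(7, weak_source, 200)` returns the state after 200 updates. Estimating how close the state distribution is to uniform would require repeating this over many seeds; the sketch makes no claim about the convergence rate.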

    Reliable Hubs for Partially-Dynamic All-Pairs Shortest Paths in Directed Graphs

    We give new partially-dynamic algorithms for the all-pairs shortest paths problem in weighted directed graphs. Most importantly, we give a new deterministic incremental algorithm for the problem that handles updates in O~(mn^(4/3) log(W)/epsilon) total time (where the edge weights are from [1,W]) and explicitly maintains a (1+epsilon)-approximate distance matrix. For a fixed epsilon>0, this is the first deterministic partially dynamic algorithm for all-pairs shortest paths in directed graphs whose update time is o(n^2) regardless of the number of edges. Furthermore, we also show how to improve the state-of-the-art partially dynamic randomized algorithms for all-pairs shortest paths [Baswana et al. STOC'02, Bernstein STOC'13] from Monte Carlo randomized to Las Vegas randomized without increasing the running time bounds (with respect to the O~(*) notation). Our results are obtained by giving new algorithms for the problem of dynamically maintaining hubs, that is, a set of O~(n/d) vertices which hit a shortest path between each pair of vertices, provided it has hop-length Omega(d). We give new subquadratic deterministic and Las Vegas algorithms for maintenance of hubs under either edge insertions or deletions.

    Vertex Sparsification for Edge Connectivity in Polynomial Time

    Improved Approximation for Longest Common Subsequence over Small Alphabets

    This paper investigates the approximability of the Longest Common Subsequence (LCS) problem. The fastest algorithm for solving the LCS problem exactly runs in essentially quadratic time in the length of the input, and it is known that under the Strong Exponential Time Hypothesis the quadratic running time cannot be beaten. There are no such limitations for the approximate computation of the LCS, however, except in some limited scenarios. There is also a scarcity of approximation algorithms. When the two given strings are over an alphabet of size k, returning the subsequence formed by the most frequent symbol occurring in both strings achieves a 1/k approximation for the LCS. It is an open problem whether a better than 1/k approximation can be achieved in truly subquadratic time (O(n^{2-δ}) time for constant δ > 0). A recent result [Rubinstein and Song SODA'2020] showed that a 1/2+ε approximation for the LCS over a binary alphabet is possible in truly subquadratic time, provided the input strings have the same length. In this paper we show that if a 1/2+ε approximation (for ε > 0) is achievable for binary LCS in truly subquadratic time when the input strings can be unequal, then for every constant k, there is a truly subquadratic time algorithm that achieves a 1/k+ε approximation for k-ary alphabet LCS for some ε > 0. Thus the binary case is the hardest. We also show that for every constant k, if one is given two strings of equal length over a k-ary alphabet, one can obtain a 1/k+ε approximation for some constant ε > 0 in truly subquadratic time, thus extending the Rubinstein and Song result to all alphabets of constant size.
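
    The 1/k baseline mentioned above can be sketched as follows. It is the simple most-frequent-symbol heuristic, not the improved algorithm of the paper; the function name `frequent_symbol_lcs` is illustrative. The guarantee follows because some symbol accounts for at least a 1/k fraction of any LCS, and that symbol occurs at least that many times in both strings.

```python
from collections import Counter


def frequent_symbol_lcs(a, b):
    """Return the longest common subsequence that uses a single symbol.

    For each symbol, the number of copies available in both strings is
    the minimum of its two counts; the best symbol gives a 1/k
    approximation of the LCS over an alphabet of size k. Runs in
    linear time for a constant-size alphabet.
    """
    count_a, count_b = Counter(a), Counter(b)
    best_symbol, best_length = "", 0
    for symbol in count_a:
        common = min(count_a[symbol], count_b[symbol])
        if common > best_length:
            best_symbol, best_length = symbol, common
    return best_symbol * best_length
```

    For the binary strings "0110" and "1011", the symbol "1" occurs at least twice in both, so the sketch returns a common subsequence of length 2.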