
    Approximate Hamming distance in a stream

    We consider the problem of computing a $(1+\epsilon)$-approximation of the Hamming distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an $O(\epsilon^{-4} \log^2 n)$ bit randomised one-way communication protocol. (2) If only Alice has the pattern then there is an $O(\epsilon^{-2}\sqrt{n}\log n)$ bit randomised one-way communication protocol. We then go on to develop small space streaming algorithms for $(1+\epsilon)$-approximate Hamming distance which give worst case running time guarantees per arriving symbol. (1) For binary input alphabets there is an $O(\epsilon^{-3} \sqrt{n} \log^{2} n)$ space and $O(\epsilon^{-2} \log n)$ time streaming $(1+\epsilon)$-approximate Hamming distance algorithm. (2) For general input alphabets there is an $O(\epsilon^{-5} \sqrt{n} \log^{4} n)$ space and $O(\epsilon^{-4} \log^3 n)$ time streaming $(1+\epsilon)$-approximate Hamming distance algorithm. Comment: Submitted to ICALP' 201
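    For orientation, the quantity being approximated can be computed exactly by the naive baseline sketched below. This is not the paper's algorithm: it stores the whole window ($O(n)$ space) and spends $O(n)$ time per arriving symbol, which is the cost the small-space streaming results avoid. The function name `sliding_hamming` is a hypothetical choice for this illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_hamming(pattern: str, stream: Iterable[str]) -> Iterator[int]:
    """Yield the exact Hamming distance between `pattern` and each
    length-n window of `stream`, once n symbols have arrived.

    Naive baseline (hypothetical helper, not from the paper):
    O(n) space and O(n) time per arriving symbol.
    """
    n = len(pattern)
    window = deque(maxlen=n)  # retains only the most recent n symbols
    for symbol in stream:
        window.append(symbol)
        if len(window) == n:
            # Count positions where the pattern and the window disagree.
            yield sum(1 for p, w in zip(pattern, window) if p != w)


if __name__ == "__main__":
    for distance in sliding_hamming("aba", "abbaba"):
        print(distance)
```

    A $(1+\epsilon)$-approximate algorithm would be allowed to report, for each window, any value within a $(1+\epsilon)$ factor of the number this baseline yields.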

    Massively Parallel Algorithms for Distance Approximation and Spanners

    Over the past decade, there has been increasing interest in distributed/parallel algorithms for processing large-scale graphs. By now, we have quite fast algorithms -- usually sublogarithmic-time and often $poly(\log\log n)$-time, or even faster -- for a number of fundamental graph problems in the massively parallel computation (MPC) model. This model is a widely-adopted theoretical abstraction of MapReduce style settings, where a number of machines communicate in an all-to-all manner to process large-scale data. Contributing to this line of work on MPC graph algorithms, we present $poly(\log k) \in poly(\log\log n)$ round MPC algorithms for computing $O(k^{1+o(1)})$-spanners in the strongly sublinear regime of local memory. To the best of our knowledge, these are the first sublogarithmic-time MPC algorithms for spanner construction. As primary applications of our spanners, we get two important implications, as follows: - For the MPC setting, we get an $O(\log^2\log n)$-round algorithm for $O(\log^{1+o(1)} n)$ approximation of all pairs shortest paths (APSP) in the near-linear regime of local memory. To the best of our knowledge, this is the first sublogarithmic-time MPC algorithm for distance approximations. - Our result above also extends to the Congested Clique model of distributed computing, with the same round complexity and approximation guarantee. This gives the first sub-logarithmic algorithm for approximating APSP in weighted graphs in the Congested Clique model.

    Approximate Sparse Recovery: Optimizing Time and Measurements

    An approximate sparse recovery system consists of parameters $k,N$, an $m$-by-$N$ measurement matrix, $\Phi$, and a decoding algorithm, $\mathcal{D}$. Given a vector, $x$, the system approximates $x$ by $\widehat x = \mathcal{D}(\Phi x)$, which must satisfy $\| \widehat x - x\|_2 \le C \|x - x_k\|_2$, where $x_k$ denotes the optimal $k$-term approximation to $x$. For each vector $x$, the system must succeed with probability at least 3/4. Among the goals in designing such systems are minimizing the number $m$ of measurements and the runtime of the decoding algorithm, $\mathcal{D}$. In this paper, we give a system with $m=O(k \log(N/k))$ measurements--matching a lower bound, up to a constant factor--and decoding time $O(k\log^c N)$, matching a lower bound up to $\log(N)$ factors. We also consider the encode time (i.e., the time to multiply $\Phi$ by $x$), the time to update measurements (i.e., the time to multiply $\Phi$ by a 1-sparse $x$), and the robustness and stability of the algorithm (adding noise before and after the measurements). Our encode and update times are optimal up to $\log(N)$ factors.
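    The recovery guarantee in this abstract can be made concrete with a small checker. The sketch below only evaluates the stated error bound for a given candidate output; it does not implement the paper's measurement matrix or decoder. The helper names `best_k_term`, `l2_distance` and `satisfies_guarantee` are hypothetical and chosen for this illustration.

```python
import math
from typing import List, Sequence


def best_k_term(x: Sequence[float], k: int) -> List[float]:
    """Return x_k: x with all but its k largest-magnitude entries set to zero."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance ||a - b||_2."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def satisfies_guarantee(x_hat: Sequence[float], x: Sequence[float],
                        k: int, C: float) -> bool:
    """Check the bound ||x_hat - x||_2 <= C * ||x - x_k||_2."""
    return l2_distance(x_hat, x) <= C * l2_distance(x, best_k_term(x, k))


if __name__ == "__main__":
    x = [5.0, -0.1, 3.0, 0.2]
    print(best_k_term(x, 2))
    print(satisfies_guarantee(best_k_term(x, 2), x, k=2, C=2.0))
```

    Note that the bound is relative to the tail $\|x - x_k\|_2$: if $x$ is exactly $k$-sparse the right-hand side is zero, so only exact recovery passes the check.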

    Non-Local Probes Do Not Help with Graph Problems

    This work bridges the gap between distributed and centralised models of computing in the context of sublinear-time graph algorithms. A priori, typical centralised models of computing (e.g., parallel decision trees or centralised local algorithms) seem to be much more powerful than distributed message-passing algorithms: centralised algorithms can directly probe any part of the input, while in distributed algorithms nodes can only communicate with their immediate neighbours. We show that for a large class of graph problems, this extra freedom does not help centralised algorithms at all: for example, efficient stateless deterministic centralised local algorithms can be simulated with efficient distributed message-passing algorithms. In particular, this enables us to transfer existing lower bound results from distributed algorithms to centralised local algorithms.

    Distributed PCP Theorems for Hardness of Approximation in P

    We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows $x_{n/2+1},\dots,x_n$, and both parties know $\varphi$. The goal is to have Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of $x$. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over {0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only $(1+o(1))$-factor lower bounds (under SETH) were known before.
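    To make the first of these problems concrete, the sketch below computes Bichromatic Maximum Inner Product over {0,1}-vectors exactly by checking every pair, one vector from each set. This is the straightforward quadratic-time baseline, not anything from the paper; the hardness result says that, under SETH, even approximating this value cannot be done in truly-subquadratic time. The function name and input representation (lists of 0/1 integers) are assumptions made for this illustration.

```python
from typing import Sequence


def bichromatic_max_inner_product(A: Sequence[Sequence[int]],
                                  B: Sequence[Sequence[int]]) -> int:
    """Return max over a in A, b in B of the inner product <a, b>.

    Brute-force baseline (hypothetical helper): examines all |A| * |B|
    pairs, so it takes quadratic time when both sets have n vectors.
    Assumes A and B are non-empty and all vectors are 0/1 lists of
    equal length.
    """
    return max(
        sum(a_i & b_i for a_i, b_i in zip(a, b))  # <a, b> for 0/1 vectors
        for a in A
        for b in B
    )


if __name__ == "__main__":
    A = [[1, 0, 1], [0, 1, 1]]
    B = [[1, 1, 0], [1, 0, 1]]
    print(bichromatic_max_inner_product(A, B))
```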

    When Hashing Met Matching: Efficient Spatio-Temporal Search for Ridesharing

    Carpooling, or sharing a ride with other passengers, holds immense potential for urban transportation. Ridesharing platforms enable such sharing of rides using real-time data. Finding ride matches in real-time at urban scale is a difficult combinatorial optimization task and mostly heuristic approaches are applied. In this work, we mathematically model the problem as that of finding near-neighbors and devise a novel efficient spatio-temporal search algorithm based on the theory of locality sensitive hashing for Maximum Inner Product Search (MIPS). The proposed algorithm can find $k$ near-optimal potential matches for every ride from a pool of $n$ rides in time $O(n^{1 + \rho} (k + \log n) \log k)$ and space $O(n^{1 + \rho} \log k)$ for a small $\rho < 1$. Our algorithm can be extended in several useful and interesting ways, increasing its practical appeal. Experiments with large NY yellow taxi trip datasets show that our algorithm consistently outperforms state-of-the-art heuristic methods, thereby proving its practical applicability.