1,506 research outputs found
Approximate Hamming distance in a stream
We consider the problem of computing a -approximation of the
Hamming distance between a pattern of length and successive substrings of a
stream. We first look at the one-way randomised communication complexity of
this problem, giving Alice the first half of the stream and Bob the second
half. We show the following: (1) If Alice and Bob both share the pattern then
there is an bit randomised one-way communication
protocol. (2) If only Alice has the pattern then there is an
bit randomised one-way communication protocol.
We then go on to develop small space streaming algorithms for
-approximate Hamming distance which give worst case running time
guarantees per arriving symbol. (1) For binary input alphabets there is an
space and
time streaming -approximate Hamming distance algorithm. (2) For
general input alphabets there is an
space and time streaming
-approximate Hamming distance algorithm.Comment: Submitted to ICALP' 201
Massively Parallel Algorithms for Distance Approximation and Spanners
Over the past decade, there has been increasing interest in
distributed/parallel algorithms for processing large-scale graphs. By now, we
have quite fast algorithms -- usually sublogarithmic-time and often
-time, or even faster -- for a number of fundamental graph
problems in the massively parallel computation (MPC) model. This model is a
widely-adopted theoretical abstraction of MapReduce style settings, where a
number of machines communicate in an all-to-all manner to process large-scale
data. Contributing to this line of work on MPC graph algorithms, we present
round MPC algorithms for computing
-spanners in the strongly sublinear regime of local memory. To
the best of our knowledge, these are the first sublogarithmic-time MPC
algorithms for spanner construction. As primary applications of our spanners,
we get two important implications, as follows:
-For the MPC setting, we get an -round algorithm for
approximation of all pairs shortest paths (APSP) in the
near-linear regime of local memory. To the best of our knowledge, this is the
first sublogarithmic-time MPC algorithm for distance approximations.
-Our result above also extends to the Congested Clique model of distributed
computing, with the same round complexity and approximation guarantee. This
gives the first sub-logarithmic algorithm for approximating APSP in weighted
graphs in the Congested Clique model
Approximate Sparse Recovery: Optimizing Time and Measurements
An approximate sparse recovery system consists of parameters , an
-by- measurement matrix, , and a decoding algorithm, .
Given a vector, , the system approximates by , which must satisfy , where denotes the optimal -term approximation to . For
each vector , the system must succeed with probability at least 3/4. Among
the goals in designing such systems are minimizing the number of
measurements and the runtime of the decoding algorithm, .
In this paper, we give a system with
measurements--matching a lower bound, up to a constant factor--and decoding
time , matching a lower bound up to factors.
We also consider the encode time (i.e., the time to multiply by ),
the time to update measurements (i.e., the time to multiply by a
1-sparse ), and the robustness and stability of the algorithm (adding noise
before and after the measurements). Our encode and update times are optimal up
to factors
Non-Local Probes Do Not Help with Graph Problems
This work bridges the gap between distributed and centralised models of
computing in the context of sublinear-time graph algorithms. A priori, typical
centralised models of computing (e.g., parallel decision trees or centralised
local algorithms) seem to be much more powerful than distributed
message-passing algorithms: centralised algorithms can directly probe any part
of the input, while in distributed algorithms nodes can only communicate with
their immediate neighbours. We show that for a large class of graph problems,
this extra freedom does not help centralised algorithms at all: for example,
efficient stateless deterministic centralised local algorithms can be simulated
with efficient distributed message-passing algorithms. In particular, this
enables us to transfer existing lower bound results from distributed algorithms
to centralised local algorithms
Distributed PCP Theorems for Hardness of Approximation in P
We present a new distributed model of probabilistically checkable proofs
(PCP). A satisfying assignment to a CNF formula is
shared between two parties, where Alice knows , Bob knows
, and both parties know . The goal is to have
Alice and Bob jointly write a PCP that satisfies , while
exchanging little or no information. Unfortunately, this model as-is does not
allow for nontrivial query complexity. Instead, we focus on a non-deterministic
variant, where the players are helped by Merlin, a third party who knows all of
.
Using our framework, we obtain, for the first time, PCP-like reductions from
the Strong Exponential Time Hypothesis (SETH) to approximation problems in P.
In particular, under SETH we show that there are no truly-subquadratic
approximation algorithms for Bichromatic Maximum Inner Product over
{0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate
Regular Expression Matching, and Diameter in Product Metric. All our
inapproximability factors are nearly-tight. In particular, for the first two
problems we obtain nearly-polynomial factors of ; only
-factor lower bounds (under SETH) were known before
When Hashing Met Matching: Efficient Spatio-Temporal Search for Ridesharing
Carpooling, or sharing a ride with other passengers, holds immense potential
for urban transportation. Ridesharing platforms enable such sharing of rides
using real-time data. Finding ride matches in real-time at urban scale is a
difficult combinatorial optimization task and mostly heuristic approaches are
applied. In this work, we mathematically model the problem as that of finding
near-neighbors and devise a novel efficient spatio-temporal search algorithm
based on the theory of locality sensitive hashing for Maximum Inner Product
Search (MIPS). The proposed algorithm can find near-optimal potential
matches for every ride from a pool of rides in time and space for a small . Our
algorithm can be extended in several useful and interesting ways increasing its
practical appeal. Experiments with large NY yellow taxi trip datasets show that
our algorithm consistently outperforms state-of-the-art heuristic methods
thereby proving its practical applicability
- …