Hardness of Approximate Nearest Neighbor Search
We prove conditional near-quadratic running time lower bounds for approximate
Bichromatic Closest Pair with Euclidean, Manhattan, Hamming, or edit distance.
Specifically, unless the Strong Exponential Time Hypothesis (SETH) is false,
for every delta > 0 there exists a constant eps > 0 such that computing a
(1+eps)-approximation to the Bichromatic Closest Pair requires
n^{2-delta} time. In particular, this implies a near-linear query time for
Approximate Nearest Neighbor search with polynomial preprocessing time.
Our reduction uses the Distributed PCP framework of [ARW'17], but obtains
improved efficiency using Algebraic Geometry (AG) codes. Efficient PCPs from AG
codes have been constructed in other settings before [BKKMS'16, BCGRS'17], but
our construction is the first to yield new hardness results.
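For orientation, the quadratic baseline that the lower bound is measured against can be sketched as follows. This is only an illustrative brute-force search over all red-blue pairs under Hamming distance, not the reduction from the paper; the function names and the toy point sets are assumptions made for this example.

```python
from itertools import product

def hamming(u, v):
    """Number of coordinates in which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def bichromatic_closest_pair(reds, blues, dist=hamming):
    """Exact closest red-blue pair by checking all |reds| * |blues| pairs.

    Returns (distance, red_point, blue_point). The quadratic number of
    distance evaluations is the baseline the abstract refers to.
    """
    best = None
    for r, b in product(reds, blues):
        d = dist(r, b)
        if best is None or d < best[0]:
            best = (d, r, b)
    return best

# Hypothetical toy instance.
reds = [(0, 0, 1, 1), (1, 1, 1, 1)]
blues = [(0, 0, 0, 0), (1, 1, 1, 0)]
result = bichromatic_closest_pair(reds, blues)
print(result)
```

Passing a different `dist` callable (for example a Manhattan distance) changes the metric without changing the quadratic pair enumeration.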
Distributed PCP Theorems for Hardness of Approximation in P
We present a new distributed model of probabilistically checkable proofs
(PCP). A satisfying assignment x in {0,1}^n to a CNF formula phi is
shared between two parties, where Alice knows x_1, ..., x_{n/2}, Bob knows
x_{n/2+1}, ..., x_n, and both parties know phi. The goal is to have
Alice and Bob jointly write a PCP that x satisfies phi, while
exchanging little or no information. Unfortunately, this model as-is does not
allow for nontrivial query complexity. Instead, we focus on a non-deterministic
variant, where the players are helped by Merlin, a third party who knows all of
x.
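The input split in this model can be illustrated with a small sketch. It shows only how an assignment is divided between the two parties and that a clause is satisfied exactly when at least one party's half satisfies one of its literals; it does not construct any PCP. The DIMACS-style clause encoding, the helper names, and the toy formula are assumptions made for this example.

```python
def split_assignment(x):
    """Split a list of booleans into Alice's half and Bob's half.

    Variables are numbered from 1; Alice gets the first half of the
    variables and Bob gets the rest.
    """
    n = len(x)
    alice = {i: x[i - 1] for i in range(1, n // 2 + 1)}
    bob = {i: x[i - 1] for i in range(n // 2 + 1, n + 1)}
    return alice, bob

def satisfied_by_half(clause, partial):
    """True if some literal of `clause` is satisfied using only `partial`.

    clause: nonzero ints, where v means variable v and -v its negation.
    partial: dict mapping the variable indices a party knows to booleans.
    """
    return any(
        abs(lit) in partial and partial[abs(lit)] == (lit > 0)
        for lit in clause
    )

# Hypothetical toy formula: (x1 or not x3) and (not x2 or x4) and (x2 or x3).
phi = [[1, -3], [-2, 4], [2, 3]]
x = [True, False, True, False]

alice, bob = split_assignment(x)
alice_bits = [satisfied_by_half(c, alice) for c in phi]
bob_bits = [satisfied_by_half(c, bob) for c in phi]

# Each clause must be covered by at least one of the two parties.
phi_satisfied = all(a or b for a, b in zip(alice_bits, bob_bits))
print(alice_bits, bob_bits, phi_satisfied)
```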
Using our framework, we obtain, for the first time, PCP-like reductions from
the Strong Exponential Time Hypothesis (SETH) to approximation problems in P.
In particular, under SETH we show that there are no truly-subquadratic
approximation algorithms for Bichromatic Maximum Inner Product over
{0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate
Regular Expression Matching, and Diameter in Product Metric. All our
inapproximability factors are nearly-tight. In particular, for the first two
problems we obtain nearly-polynomial factors of 2^{(log n)^{1-o(1)}}; only
(1+o(1))-factor lower bounds (under SETH) were known before.
On Deterministic Sketching and Streaming for Sparse Recovery and Norm Estimation
We study classic streaming and sparse recovery problems using deterministic
linear sketches, including l1/l1 and linf/l1 sparse recovery problems (the
latter also being known as l1-heavy hitters), norm estimation, and approximate
inner product. We focus on devising a fixed matrix A in R^{m x n} and a
deterministic recovery/estimation procedure which work for all possible input
vectors simultaneously. Our results improve upon existing work, the following
being our main contributions:
* A proof that linf/l1 sparse recovery and inner product estimation are
equivalent, and that incoherent matrices can be used to solve both problems.
Our upper bound for the number of measurements is m=O(eps^{-2}*min{log n, (log
n / log(1/eps))^2}). We can also obtain fast sketching and recovery algorithms
by making use of the Fast Johnson-Lindenstrauss transform. Both our running
times and number of measurements improve upon previous work. We can also obtain
better error guarantees than previous work in terms of a smaller tail of the
input vector.
* A new lower bound for the number of linear measurements required to solve
l1/l1 sparse recovery. We show Omega(k/eps^2 + k log(n/k)/eps) measurements are
required to recover an x' with |x - x'|_1 <= (1+eps)|x_{tail(k)}|_1, where
x_{tail(k)} is x projected onto all but its largest k coordinates in magnitude.
* A tight bound of m = Theta(eps^{-2}log(eps^2 n)) on the number of
measurements required to solve deterministic norm estimation, i.e., to recover
|x|_2 +/- eps|x|_1.
For all the problems we study, tight bounds are already known for the
randomized complexity from previous work, except in the case of l1/l1 sparse
recovery, where a nearly tight bound is known. Our work thus aims to study the
deterministic complexities of these problems.
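The l1/l1 guarantee stated in the second contribution can be made concrete with a short check. The sketch below only evaluates the error condition |x - x'|_1 <= (1+eps)|x_{tail(k)}|_1 for a given candidate x'; it does not implement any sketching matrix or recovery procedure from the paper. The function names and the example vectors are assumptions made for illustration.

```python
def tail_l1(x, k):
    """l1 norm of x after zeroing its k largest-magnitude coordinates."""
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(magnitudes[k:])

def meets_l1_l1_guarantee(x, x_hat, k, eps):
    """Check |x - x_hat|_1 <= (1 + eps) * |x_tail(k)|_1."""
    error = sum(abs(a - b) for a, b in zip(x, x_hat))
    return error <= (1 + eps) * tail_l1(x, k)

# Hypothetical input: two heavy coordinates plus a small tail.
x = [10.0, -7.0, 0.5, 0.25, 0.25]
good = [10.0, -7.0, 0.0, 0.0, 0.0]   # keeps both heavy coordinates
bad = [10.0, 0.0, 0.0, 0.0, 0.0]     # misses one heavy coordinate

print(tail_l1(x, 2))
print(meets_l1_l1_guarantee(x, good, k=2, eps=0.5))
print(meets_l1_l1_guarantee(x, bad, k=2, eps=0.5))
```

Keeping exactly the k largest coordinates always passes the check, since its error equals the tail norm itself.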
On metric Ramsey-type phenomena
The main question studied in this article may be viewed as a nonlinear
analogue of Dvoretzky's theorem in Banach space theory or as part of Ramsey
theory in combinatorics. Given a finite metric space on n points, we seek its
subspace of largest cardinality which can be embedded with a given distortion
in Hilbert space. We provide nearly tight upper and lower bounds on the
cardinality of this subspace in terms of n and the desired distortion. Our main
theorem states that for any epsilon > 0, every n-point metric space contains a
subset of size at least n^{1-epsilon} which is embeddable in Hilbert space
with O(log(1/epsilon)/epsilon) distortion. The bound on the
distortion is tight up to the log(1/epsilon) factor. We further include a
comprehensive study of various other aspects of this problem.
Comment: 67 pages, published versio
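The notion of distortion used in this abstract can be illustrated numerically. The sketch below computes the distortion of one given map from a finite metric space into Euclidean space as the largest expansion times the largest contraction over all pairs; it does not search for the best embedding or for a large well-embeddable subset. The helper names and the 4-cycle example are assumptions made for illustration.

```python
import math
from itertools import combinations

def distortion(points, metric, embedding):
    """Distortion of `embedding` (point -> coordinate tuple) w.r.t. `metric`.

    Computed as (max expansion) * (max contraction) over all distinct
    pairs; assumes the embedding is injective and all distances are positive.
    """
    expansion = 0.0
    contraction = 0.0
    for p, q in combinations(points, 2):
        d_orig = metric(p, q)
        d_emb = math.dist(embedding[p], embedding[q])
        expansion = max(expansion, d_emb / d_orig)
        contraction = max(contraction, d_orig / d_emb)
    return expansion * contraction

def cycle_metric(p, q, n=4):
    """Shortest-path distance between vertices p and q on an n-cycle."""
    gap = abs(p - q)
    return min(gap, n - gap)

# Hypothetical example: the 4-cycle mapped to the corners of a unit square.
square = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
d = distortion(range(4), cycle_metric, square)
print(d)
```

In this example adjacent vertices keep distance 1 while opposite vertices shrink from 2 to the square's diagonal, so the computed value is the ratio of those two.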