34 research outputs found
Probabilistic Polynomials and Hamming Nearest Neighbors
We show how to compute any symmetric Boolean function on variables over
any field (as well as the integers) with a probabilistic polynomial of degree
and error at most . The degree
dependence on and is optimal, matching a lower bound of Razborov
(1987) and Smolensky (1987) for the MAJORITY function. The proof is
constructive: a low-degree polynomial can be efficiently sampled from the
distribution.
This polynomial construction is combined with other algebraic ideas to give
the first subquadratic time algorithm for computing a (worst-case) batch of
Hamming distances in superlogarithmic dimensions, exactly. To illustrate, let
. Suppose we are given a database
of vectors in and a collection of query vectors
in the same dimension. For all , we wish to compute a
with minimum Hamming distance from . We solve this problem in randomized time. Hence, the problem is in "truly subquadratic"
time for dimensions, and in subquadratic time for . We apply the algorithm to computing pairs with maximum
inner product, closest pair in for vectors with bounded integer
entries, and pairs with maximum Jaccard coefficients.Comment: 16 pages. To appear in 56th Annual IEEE Symposium on Foundations of
Computer Science (FOCS 2015
Hybrid LSH: Faster Near Neighbors Reporting in High-dimensional Space
We study the -near neighbors reporting problem (-NN), i.e., reporting
\emph{all} points in a high-dimensional point set that lie within a radius
of a given query point . Our approach builds upon on the
locality-sensitive hashing (LSH) framework due to its appealing asymptotic
sublinear query time for near neighbor search problems in high-dimensional
space. A bottleneck of the traditional LSH scheme for solving -NN is that
its performance is sensitive to data and query-dependent parameters. On
datasets whose data distributions have diverse local density patterns, LSH with
inappropriate tuning parameters can sometimes be outperformed by a simple
linear search.
In this paper, we introduce a hybrid search strategy between LSH-based search
and linear search for -NN in high-dimensional space. By integrating an
auxiliary data structure into LSH hash tables, we can efficiently estimate the
computational cost of LSH-based search for a given query regardless of the data
distribution. This means that we are able to choose the appropriate search
strategy between LSH-based search and linear search to achieve better
performance. Moreover, the integrated data structure is time efficient and fits
well with many recent state-of-the-art LSH-based approaches. Our experiments on
real-world datasets show that the hybrid search approach outperforms (or is
comparable to) both LSH-based search and linear search for a wide range of
search radii and data distributions in high-dimensional space.Comment: Accepted as a short paper in EDBT 201
Fine-Grained Complexity Theory: Conditional Lower Bounds for Computational Geometry
Fine-grained complexity theory is the area of theoretical computer sciencethat proves conditional lower bounds based on the Strong Exponential TimeHypothesis and similar conjectures. This area has been thriving in the lastdecade, leading to conditionally best-possible algorithms for a wide variety ofproblems on graphs, strings, numbers etc. This article is an introduction tofine-grained lower bounds in computational geometry, with a focus on lowerbounds for polynomial-time problems based on the Orthogonal Vectors Hypothesis.Specifically, we discuss conditional lower bounds for nearest neighbor searchunder the Euclidean distance and Fr\'echet distance.<br
Strong ETH Breaks With Merlin and Arthur: Short Non-Interactive Proofs of Batch Evaluation
We present an efficient proof system for Multipoint Arithmetic Circuit
Evaluation: for every arithmetic circuit of size and
degree over a field , and any inputs ,
the Prover sends the Verifier the values and a proof of length, and
the Verifier tosses coins and can check the proof in about time, with probability of error less than .
For small degree , this "Merlin-Arthur" proof system (a.k.a. MA-proof
system) runs in nearly-linear time, and has many applications. For example, we
obtain MA-proof systems that run in time (for various ) for the
Permanent, Circuit-SAT for all sublinear-depth circuits, counting
Hamiltonian cycles, and infeasibility of - linear programs. In general,
the value of any polynomial in Valiant's class can be certified
faster than "exhaustive summation" over all possible assignments. These results
strongly refute a Merlin-Arthur Strong ETH and Arthur-Merlin Strong ETH posed
by Russell Impagliazzo and others.
We also give a three-round (AMA) proof system for quantified Boolean formulas
running in time, nearly-linear time MA-proof systems for
counting orthogonal vectors in a collection and finding Closest Pairs in the
Hamming metric, and a MA-proof system running in -time for
counting -cliques in graphs.
We point to some potential future directions for refuting the
Nondeterministic Strong ETH.Comment: 17 page
Distributed PCP Theorems for Hardness of Approximation in P
We present a new distributed model of probabilistically checkable proofs
(PCP). A satisfying assignment to a CNF formula is
shared between two parties, where Alice knows , Bob knows
, and both parties know . The goal is to have
Alice and Bob jointly write a PCP that satisfies , while
exchanging little or no information. Unfortunately, this model as-is does not
allow for nontrivial query complexity. Instead, we focus on a non-deterministic
variant, where the players are helped by Merlin, a third party who knows all of
.
Using our framework, we obtain, for the first time, PCP-like reductions from
the Strong Exponential Time Hypothesis (SETH) to approximation problems in P.
In particular, under SETH we show that there are no truly-subquadratic
approximation algorithms for Bichromatic Maximum Inner Product over
{0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate
Regular Expression Matching, and Diameter in Product Metric. All our
inapproximability factors are nearly-tight. In particular, for the first two
problems we obtain nearly-polynomial factors of ; only
-factor lower bounds (under SETH) were known before