4,379 research outputs found
Hardness of Approximate Nearest Neighbor Search
We prove conditional near-quadratic running time lower bounds for approximate
Bichromatic Closest Pair with Euclidean, Manhattan, Hamming, or edit distance.
Specifically, unless the Strong Exponential Time Hypothesis (SETH) is false,
for every there exists a constant such that computing a
-approximation to the Bichromatic Closest Pair requires
time. In particular, this implies a near-linear query time for
Approximate Nearest Neighbor search with polynomial preprocessing time.
Our reduction uses the Distributed PCP framework of [ARW'17], but obtains
improved efficiency using Algebraic Geometry (AG) codes. Efficient PCPs from AG
codes have been constructed in other settings before [BKKMS'16, BCGRS'17], but
our construction is the first to yield new hardness results
Distributed PCP Theorems for Hardness of Approximation in P
We present a new distributed model of probabilistically checkable proofs
(PCP). A satisfying assignment to a CNF formula is
shared between two parties, where Alice knows , Bob knows
, and both parties know . The goal is to have
Alice and Bob jointly write a PCP that satisfies , while
exchanging little or no information. Unfortunately, this model as-is does not
allow for nontrivial query complexity. Instead, we focus on a non-deterministic
variant, where the players are helped by Merlin, a third party who knows all of
.
Using our framework, we obtain, for the first time, PCP-like reductions from
the Strong Exponential Time Hypothesis (SETH) to approximation problems in P.
In particular, under SETH we show that there are no truly-subquadratic
approximation algorithms for Bichromatic Maximum Inner Product over
{0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate
Regular Expression Matching, and Diameter in Product Metric. All our
inapproximability factors are nearly-tight. In particular, for the first two
problems we obtain nearly-polynomial factors of ; only
-factor lower bounds (under SETH) were known before
Lower Bounds for Oblivious Near-Neighbor Search
We prove an lower bound on the dynamic
cell-probe complexity of statistically
approximate-near-neighbor search () over the -dimensional
Hamming cube. For the natural setting of , our result
implies an lower bound, which is a quadratic
improvement over the highest (non-oblivious) cell-probe lower bound for
. This is the first super-logarithmic
lower bound for against general (non black-box) data structures.
We also show that any oblivious data structure for
decomposable search problems (like ) can be obliviously dynamized
with overhead in update and query time, strengthening a classic
result of Bentley and Saxe (Algorithmica, 1980).Comment: 28 page
SANNS: Scaling Up Secure Approximate k-Nearest Neighbors Search
The -Nearest Neighbor Search (-NNS) is the backbone of several
cloud-based services such as recommender systems, face recognition, and
database search on text and images. In these services, the client sends the
query to the cloud server and receives the response in which case the query and
response are revealed to the service provider. Such data disclosures are
unacceptable in several scenarios due to the sensitivity of data and/or privacy
laws.
In this paper, we introduce SANNS, a system for secure -NNS that keeps
client's query and the search result confidential. SANNS comprises two
protocols: an optimized linear scan and a protocol based on a novel sublinear
time clustering-based algorithm. We prove the security of both protocols in the
standard semi-honest model. The protocols are built upon several
state-of-the-art cryptographic primitives such as lattice-based additively
homomorphic encryption, distributed oblivious RAM, and garbled circuits. We
provide several contributions to each of these primitives which are applicable
to other secure computation tasks. Both of our protocols rely on a new circuit
for the approximate top- selection from numbers that is built from comparators.
We have implemented our proposed system and performed extensive experimental
results on four datasets in two different computation environments,
demonstrating more than faster response time compared to
optimally implemented protocols from the prior work. Moreover, SANNS is the
first work that scales to the database of 10 million entries, pushing the limit
by more than two orders of magnitude.Comment: 18 pages, to appear at USENIX Security Symposium 202
- …