2,058 research outputs found
Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors
[See the paper for the full abstract.]
We show tight upper and lower bounds for time-space trade-offs for the
-Approximate Near Neighbor Search problem. For the -dimensional Euclidean
space and -point datasets, we develop a data structure with space and query time for
every such that: \begin{equation} c^2 \sqrt{\rho_q} +
(c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}. \end{equation}
This is the first data structure that achieves sublinear query time and
near-linear space for every approximation factor , improving upon
[Kapralov, PODS 2015]. The data structure is a culmination of a long line of
work on the problem for all space regimes; it builds on Spherical
Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and
data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni,
Razenshteyn, STOC 2015].
Our matching lower bounds are of two types: conditional and unconditional.
First, we prove tightness of the whole above trade-off in a restricted model of
computation, which captures all known hashing-based approaches. We then show
unconditional cell-probe lower bounds for one and two probes that match the
above trade-off for , improving upon the best known lower bounds
from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first
space lower bound (for any static data structure) for two probes which is not
polynomially smaller than the one-probe bound. To show the result for two
probes, we establish and exploit a connection to locally-decodable codes.Comment: 62 pages, 5 figures; a merger of arXiv:1511.07527 [cs.DS] and
arXiv:1605.02701 [cs.DS], which subsumes both of the preprints. New version
contains more elaborated proofs and fixed some typo
Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing
We prove a tight lower bound for the exponent for data-dependent
Locality-Sensitive Hashing schemes, recently used to design efficient solutions
for the -approximate nearest neighbor search. In particular, our lower bound
matches the bound of for the space,
obtained via the recent algorithm from [Andoni-Razenshteyn, STOC'15].
In recent years it emerged that data-dependent hashing is strictly superior
to the classical Locality-Sensitive Hashing, when the hash function is
data-independent. In the latter setting, the best exponent has been already
known: for the space, the tight bound is , with the upper
bound from [Indyk-Motwani, STOC'98] and the matching lower bound from
[O'Donnell-Wu-Zhou, ITCS'11].
We prove that, even if the hashing is data-dependent, it must hold that
. To prove the result, we need to formalize the
exact notion of data-dependent hashing that also captures the complexity of the
hash functions (in addition to their collision properties). Without restricting
such complexity, we would allow for obviously infeasible solutions such as the
Voronoi diagram of a dataset. To preclude such solutions, we require our hash
functions to be succinct. This condition is satisfied by all the known
algorithmic results.Comment: 16 pages, no figure
Optimal lower bounds for locality sensitive hashing (except when q is tiny)
We study lower bounds for Locality Sensitive Hashing (LSH) in the strongest
setting: point sets in {0,1}^d under the Hamming distance. Recall that here H
is said to be an (r, cr, p, q)-sensitive hash family if all pairs x, y in
{0,1}^d with dist(x,y) at most r have probability at least p of collision under
a randomly chosen h in H, whereas all pairs x, y in {0,1}^d with dist(x,y) at
least cr have probability at most q of collision. Typically, one considers d
tending to infinity, with c fixed and q bounded away from 0.
For its applications to approximate nearest neighbor search in high
dimensions, the quality of an LSH family H is governed by how small its "rho
parameter" rho = ln(1/p)/ln(1/q) is as a function of the parameter c. The
seminal paper of Indyk and Motwani showed that for each c, the extremely simple
family H = {x -> x_i : i in d} achieves rho at most 1/c. The only known lower
bound, due to Motwani, Naor, and Panigrahy, is that rho must be at least .46/c
(minus o_d(1)).
In this paper we show an optimal lower bound: rho must be at least 1/c (minus
o_d(1)). This lower bound for Hamming space yields a lower bound of 1/c^2 for
Euclidean space (or the unit sphere) and 1/c for the Jaccard distance on sets;
both of these match known upper bounds. Our proof is simple; the essence is
that the noise stability of a boolean function at e^{-t} is a log-convex
function of t.Comment: 9 pages + abstract and reference
Tradeoffs for nearest neighbors on the sphere
We consider tradeoffs between the query and update complexities for the
(approximate) nearest neighbor problem on the sphere, extending the recent
spherical filters to sparse regimes and generalizing the scheme and analysis to
account for different tradeoffs. In a nutshell, for the sparse regime the
tradeoff between the query complexity and update complexity
for data sets of size is given by the following equation in
terms of the approximation factor and the exponents and :
For small , minimizing the time for updates leads to a linear
space complexity at the cost of a query time complexity .
Balancing the query and update costs leads to optimal complexities
, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner,
IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn,
STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A
subpolynomial query time complexity can be achieved at the cost of a
space complexity of the order , matching the bound
of [Andoni-Indyk-Patrascu, FOCS'06] and
[Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of
[Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98].
For large , minimizing the update complexity results in a query complexity
of , improving upon the related exponent for large of
[Kapralov, PODS'15] by a factor , and matching the bound
of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal
complexities , while a minimum query time complexity can be
achieved with update complexity , improving upon the
previous best exponents of Kapralov by a factor .Comment: 16 pages, 1 table, 2 figures. Mostly subsumed by arXiv:1608.03580
[cs.DS] (along with arXiv:1605.02701 [cs.DS]
Practical and Optimal LSH for Angular Distance
We show the existence of a Locality-Sensitive Hashing (LSH) family for the
angular distance that yields an approximate Near Neighbor Search algorithm with
the asymptotically optimal running time exponent. Unlike earlier algorithms
with this property (e.g., Spherical LSH [Andoni, Indyk, Nguyen, Razenshteyn
2014], [Andoni, Razenshteyn 2015]), our algorithm is also practical, improving
upon the well-studied hyperplane LSH [Charikar, 2002] in practice. We also
introduce a multiprobe version of this algorithm, and conduct experimental
evaluation on real and synthetic data sets.
We complement the above positive results with a fine-grained lower bound for
the quality of any LSH family for angular distance. Our lower bound implies
that the above LSH family exhibits a trade-off between evaluation time and
quality that is close to optimal for a natural class of LSH functions.Comment: 22 pages, an extended abstract is to appear in the proceedings of the
29th Annual Conference on Neural Information Processing Systems (NIPS 2015
- …