
Optimal lower bounds for locality sensitive hashing (except when q is tiny)

Author: O'Donnell Ryan
Wu Yi
Zhou Yuan
Publication date: 01/01/2009

We study lower bounds for Locality Sensitive Hashing (LSH) in the strongest setting: point sets in {0,1}^d under the Hamming distance. Recall that here H is said to be an (r, cr, p, q)-sensitive hash family if all pairs x, y in {0,1}^d with dist(x,y) at most r have probability at least p of collision under a randomly chosen h in H, whereas all pairs x, y in {0,1}^d with dist(x,y) at least cr have probability at most q of collision. Typically, one considers d tending to infinity, with c fixed and q bounded away from 0. For its applications to approximate nearest neighbor search in high dimensions, the quality of an LSH family H is governed by how small its "rho parameter" rho = ln(1/p)/ln(1/q) is as a function of the parameter c. The seminal paper of Indyk and Motwani showed that for each c, the extremely simple family H = {x -> x_i : i in [d]} achieves rho at most 1/c. The only known lower bound, due to Motwani, Naor, and Panigrahy, is that rho must be at least 0.46/c (minus o_d(1)). In this paper we show an optimal lower bound: rho must be at least 1/c (minus o_d(1)). This lower bound for Hamming space yields a lower bound of 1/c^2 for Euclidean space (or the unit sphere) and 1/c for the Jaccard distance on sets; both of these match known upper bounds. Our proof is simple; the essence is that the noise stability of a Boolean function at e^{-t} is a log-convex function of t.

Comment: 9 pages + abstract and references
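A minimal sketch (our own illustration, not code from the paper) of the bit-sampling family H = {x -> x_i : i in [d]}: two points at Hamming distance k collide with probability 1 - k/d, so p = 1 - r/d and q = 1 - cr/d. Because -ln(1 - x) is convex with value 0 at x = 0, the resulting rho never exceeds 1/c. The function names and the values d = 1000, r = 10, c = 2 are arbitrary assumptions chosen for illustration.

```python
import math
import random


def bit_sampling_collision_prob(d: int, dist: int) -> float:
    """Pr[h(x) = h(y)] for h(x) = x_i with i uniform in [d], when dist(x, y) = dist."""
    return 1 - dist / d


def rho(p: float, q: float) -> float:
    """LSH quality parameter rho = ln(1/p) / ln(1/q)."""
    return math.log(1 / p) / math.log(1 / q)


def empirical_collision_rate(x, y, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the collision probability of bit sampling."""
    d = len(x)
    hits = 0
    for _ in range(trials):
        i = rng.randrange(d)  # draw h = (x -> x_i) uniformly from H
        hits += x[i] == y[i]
    return hits / trials


if __name__ == "__main__":
    d, r, c = 1000, 10, 2
    p = bit_sampling_collision_prob(d, r)       # near pairs
    q = bit_sampling_collision_prob(d, c * r)   # far pairs
    print(f"rho = {rho(p, q):.4f}, 1/c = {1 / c:.4f}")
```

Running it prints a rho slightly below 1/c = 0.5, consistent with the Indyk-Motwani upper bound that the paper shows is tight.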

arXiv.org e-Print Archive

CiteSeerX

Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing

Author: Andoni Alexandr
Razenshteyn Ilya
Publication date: 01/01/2015

We prove a tight lower bound for the exponent $\rho$ for data-dependent Locality-Sensitive Hashing schemes, recently used to design efficient solutions for $c$-approximate nearest neighbor search. In particular, our lower bound matches the bound of $\rho\le \frac{1}{2c-1}+o(1)$ for the $\ell_1$ space, obtained via the recent algorithm from [Andoni-Razenshteyn, STOC'15]. In recent years it emerged that data-dependent hashing is strictly superior to classical Locality-Sensitive Hashing, in which the hash function is data-independent. In the latter setting, the best exponent was already known: for the $\ell_1$ space, the tight bound is $\rho=1/c$, with the upper bound from [Indyk-Motwani, STOC'98] and the matching lower bound from [O'Donnell-Wu-Zhou, ITCS'11]. We prove that, even if the hashing is data-dependent, it must hold that $\rho\ge \frac{1}{2c-1}-o(1)$. To prove the result, we need to formalize the exact notion of data-dependent hashing that also captures the complexity of the hash functions (in addition to their collision properties). Without restricting such complexity, we would allow for obviously infeasible solutions such as the Voronoi diagram of a dataset. To preclude such solutions, we require our hash functions to be succinct. This condition is satisfied by all the known algorithmic results.

Comment: 16 pages, no figures
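To make the gap between the two tight exponents concrete, here is a small sketch (our own arithmetic, not from the paper) comparing $1/c$ with $1/(2c-1)$ for the $\ell_1$/Hamming space. It drops the $o(1)$ terms and treats $n^\rho$ as a rough proxy for query cost; the dataset size $n = 10^6$ and the sampled values of $c$ are arbitrary assumptions.

```python
def rho_data_independent(c: float) -> float:
    """Tight exponent for classical (data-independent) LSH in Hamming / l1 space."""
    if c <= 1:
        raise ValueError("c must exceed 1")
    return 1 / c


def rho_data_dependent(c: float) -> float:
    """Tight exponent for (succinct) data-dependent hashing in Hamming / l1 space.

    The o(1) terms from the abstract are dropped.
    """
    if c <= 1:
        raise ValueError("c must exceed 1")
    return 1 / (2 * c - 1)


if __name__ == "__main__":
    n = 10**6  # illustrative dataset size
    for c in (1.5, 2.0, 3.0):
        a, b = rho_data_independent(c), rho_data_dependent(c)
        # Query cost scales roughly like n^rho once lower-order factors are ignored.
        print(f"c={c}: 1/c={a:.3f} (n^rho ~ {n ** a:,.0f}), "
              f"1/(2c-1)={b:.3f} (n^rho ~ {n ** b:,.0f})")
```

Since $2c - 1 > c$ exactly when $c > 1$, the data-dependent exponent is strictly smaller for every approximation factor above 1.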

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

Fast Cross-Polytope Locality-Sensitive Hashing

Author: Kennedy Christopher
Ward Rachel
Publication date: 20/09/2016

We provide a variant of cross-polytope locality-sensitive hashing with respect to angular distance which is provably optimal in asymptotic sensitivity and enjoys $\mathcal{O}(d \ln d)$ hash computation time. Building on a recent result (by Andoni, Indyk, Laarhoven, Razenshteyn, Schmidt, 2015), we show that optimal asymptotic sensitivity for cross-polytope LSH is retained even when the dense Gaussian matrix is replaced by a fast Johnson-Lindenstrauss transform followed by a discrete pseudo-rotation, reducing the hash computation time from $\mathcal{O}(d^2)$ to $\mathcal{O}(d \ln d)$. Moreover, our scheme achieves the optimal rate of convergence for sensitivity. By incorporating a low-randomness Johnson-Lindenstrauss transform, our scheme can be modified to require only $\mathcal{O}(\ln^9(d))$ random bits.

Comment: 14 pages, 6 figures
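A minimal sketch of cross-polytope hashing with a fast pseudo-rotation, to show where the $\mathcal{O}(d \ln d)$ cost comes from: instead of multiplying by a dense Gaussian matrix ($\mathcal{O}(d^2)$), it applies a few rounds of random sign flips followed by a normalized Walsh-Hadamard transform. This is an assumption-laden stand-in, not Kennedy and Ward's construction (which uses a fast Johnson-Lindenstrauss transform followed by a discrete pseudo-rotation); the names `fwht` and `CrossPolytopeHash`, the default `rounds=3`, and the power-of-two dimension are our own choices.

```python
import math
import random


def fwht(v):
    """In-place orthonormal fast Walsh-Hadamard transform, O(d log d) time.

    len(v) must be a power of two.
    """
    n = len(v)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    scale = 1 / math.sqrt(n)
    for i in range(n):
        v[i] *= scale
    return v


class CrossPolytopeHash:
    """Cross-polytope LSH with a fast pseudo-rotation (illustrative sketch).

    Each round applies random +/-1 sign flips and then a normalized Hadamard
    transform; both steps are orthogonal, so norms and angles are preserved
    up to floating-point error.
    """

    def __init__(self, d: int, rounds: int = 3, seed: int = 0):
        if d <= 0 or d & (d - 1):
            raise ValueError("d must be a power of two")
        rng = random.Random(seed)
        self.d = d
        self.signs = [[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(rounds)]

    def rotate(self, x):
        y = [float(t) for t in x]
        for s in self.signs:
            y = [yi * si for yi, si in zip(y, s)]
            fwht(y)
        return y

    def __call__(self, x) -> int:
        """Index of the closest of the 2d vertices {+e_i, -e_i} after rotation."""
        y = self.rotate(x)
        i = max(range(self.d), key=lambda k: abs(y[k]))
        return i if y[i] > 0 else i + self.d


if __name__ == "__main__":
    rng = random.Random(7)
    h = CrossPolytopeHash(64, seed=1)
    x = [rng.gauss(0.0, 1.0) for _ in range(64)]
    # Nearby points tend to share a bucket; positive scaling never changes it.
    x_near = [t + 0.05 * rng.gauss(0.0, 1.0) for t in x]
    print(h(x), h(x_near), h([2 * t for t in x]))
```

The hash depends only on the direction of `x`, which is why it suits angular distance: scaling by a positive constant leaves the bucket unchanged, and negating `x` moves it to the antipodal vertex.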

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Practical and Optimal LSH for Angular Distance

Author: Alexandr Andoni
Ilya Razenshteyn
Ludwig Schmidt
Piotr Indyk
Thijs Laarhoven
Tu Eindhoven
Publication date: 01/01/2015

We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [Andoni, Indyk, Nguyen, Razenshteyn 2014], [Andoni, Razenshteyn 2015]), our algorithm is also practical, improving upon the well-studied hyperplane LSH [Charikar, 2002] in practice. We also introduce a multiprobe version of this algorithm, and conduct an experimental evaluation on real and synthetic data sets. We complement the above positive results with a fine-grained lower bound for the quality of any LSH family for angular distance. Our lower bound implies that the above LSH family exhibits a trade-off between evaluation time and quality that is close to optimal for a natural class of LSH functions.

Comment: 22 pages, an extended abstract is to appear in the proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS 2015)
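The abstract measures its family against hyperplane LSH [Charikar, 2002]. As a reference point only (this is the baseline, not the paper's own family), a minimal sketch of the hyperplane hash: each bit records which side of a random Gaussian hyperplane a vector falls on, and two vectors at angle θ agree on a bit with probability 1 - θ/π. The function names, dimension `d = 8`, trial count, and seed are illustrative assumptions.

```python
import math
import random


def hyperplane_hash(x, planes):
    """Hyperplane (sign-random-projection) LSH: one bit per hyperplane normal g."""
    return tuple(int(sum(a * b for a, b in zip(x, g)) >= 0) for g in planes)


def hyperplane_collision_prob(theta: float) -> float:
    """Per-bit collision probability for two vectors at angle theta."""
    return 1 - theta / math.pi


def estimate_collision_prob(theta: float, d: int = 8, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate using fresh Gaussian hyperplanes in R^d."""
    rng = random.Random(seed)
    x = [1.0] + [0.0] * (d - 1)
    y = [math.cos(theta), math.sin(theta)] + [0.0] * (d - 2)  # angle theta to x
    hits = 0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        hits += hyperplane_hash(x, [g]) == hyperplane_hash(y, [g])
    return hits / trials


if __name__ == "__main__":
    for theta in (0.3, 1.0, 2.0):
        print(theta, hyperplane_collision_prob(theta), estimate_collision_prob(theta))
```

Concatenating k such bits gives collision probability (1 - θ/π)^k; the paper's point is that a cross-polytope family achieves a better exponent than this baseline while remaining practical.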

arXiv.org e-Print Archive

Hardness of Approximate Nearest Neighbor Search

Abboud Amir
Ahle Thomas Dybdahl
Alman Josh
Andoni Alexandr
Arya Sunil
Chan Timothy M.
Klauck Hartmut
Patrascu Mihai
Shamos Michael Ian
Publication date: 02/03/2018

We prove conditional near-quadratic running time lower bounds for approximate Bichromatic Closest Pair with Euclidean, Manhattan, Hamming, or edit distance. Specifically, unless the Strong Exponential Time Hypothesis (SETH) is false, for every $\delta>0$ there exists a constant $\epsilon>0$ such that computing a $(1+\epsilon)$-approximation to the Bichromatic Closest Pair requires $n^{2-\delta}$ time. In particular, this implies that Approximate Nearest Neighbor search with polynomial preprocessing time requires near-linear query time. Our reduction uses the Distributed PCP framework of [ARW'17], but obtains improved efficiency using Algebraic Geometry (AG) codes. Efficient PCPs from AG codes have been constructed in other settings before [BKKMS'16, BCGRS'17], but our construction is the first to yield new hardness results.
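For concreteness, a minimal sketch of the problem itself (our own illustration, not from the paper): exact Bichromatic Closest Pair by brute force, which takes $|R| \cdot |B|$ distance evaluations. The conditional lower bound says that, assuming SETH, even a $(1+\epsilon)$-approximation cannot in general be computed in $n^{2-\delta}$ time, so this quadratic scan is essentially the baseline to beat. Hamming distance on bit strings and the function names are illustrative assumptions.

```python
from itertools import product


def hamming(u: str, v: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(u, v))


def bichromatic_closest_pair(red, blue, dist=hamming):
    """Exact Bichromatic Closest Pair by brute force: |red| * |blue| evaluations.

    Returns the first (red, blue) pair that attains the minimum distance.
    """
    return min(product(red, blue), key=lambda pair: dist(*pair))


if __name__ == "__main__":
    red = ["0000", "1111"]
    blue = ["0111", "1000"]
    r, b = bichromatic_closest_pair(red, blue)
    print(r, b, hamming(r, b))
```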

arXiv.org e-Print Archive

Crossref

Lower Bounds on Time-Space Trade-Offs for Approximate Near Neighbors

Author: Andoni Alexandr
Laarhoven Thijs
Razenshteyn Ilya
Waingarten Erik
Publication date: 01/01/2016

We show tight lower bounds for the entire trade-off between space and query time for the Approximate Near Neighbor search problem. Our lower bounds hold in a restricted model of computation, which captures all hashing-based approaches. In particular, our lower bound matches the upper bound recently shown in [Laarhoven 2015] for the random instance on a Euclidean sphere (which we show in fact extends to the entire space $\mathbb{R}^d$ using the techniques from [Andoni, Razenshteyn 2015]). We also show tight, unconditional cell-probe lower bounds for one and two probes, improving upon the best known bounds from [Panigrahy, Talwar, Wieder 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than the one-probe bound. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.

Comment: 47 pages, 2 figures; v2: substantially revised introduction, lots of small corrections; subsumed by arXiv:1608.03580 [cs.DS] (along with arXiv:1511.07527 [cs.DS])

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors

Author: Andoni Alexandr
Klein Philip N.
Laarhoven Thijs
Razenshteyn Ilya
Waingarten Erik
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2016

[See the paper for the full abstract.] We show tight upper and lower bounds for time-space trade-offs for the $c$-Approximate Near Neighbor Search problem. For the $d$-dimensional Euclidean space and $n$-point datasets, we develop a data structure with space $n^{1 + \rho_u + o(1)} + O(dn)$ and query time $n^{\rho_q + o(1)} + d n^{o(1)}$ for every $\rho_u, \rho_q \geq 0$ such that:

$$c^2 \sqrt{\rho_q} + (c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}.$$

This is the first data structure that achieves sublinear query time and near-linear space for every approximation factor $c > 1$, improving upon [Kapralov, PODS 2015]. The data structure is a culmination of a long line of work on the problem for all space regimes; it builds on Spherical Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni, Razenshteyn, STOC 2015]. Our matching lower bounds are of two types: conditional and unconditional. First, we prove tightness of the whole above trade-off in a restricted model of computation, which captures all known hashing-based approaches. We then show unconditional cell-probe lower bounds for one and two probes that match the above trade-off for $\rho_q = 0$, improving upon the best known lower bounds from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than the one-probe bound. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.

Comment: 62 pages, 5 figures; a merger of arXiv:1511.07527 [cs.DS] and arXiv:1605.02701 [cs.DS], which subsumes both of the preprints. New version contains more elaborate proofs and fixes some typos
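The trade-off equation can be solved for the query exponent given a space exponent. A minimal sketch (our own rearrangement of the stated equation, with a hypothetical helper name) gives $\rho_q = \big(\sqrt{2c^2-1} - (c^2-1)\sqrt{\rho_u}\big)^2 / c^4$. For $c = 2$: near-linear space ($\rho_u = 0$) gives $\rho_q = 7/16 < 1$, i.e. sublinear query time; the balanced point $\rho_u = \rho_q$ gives $1/(2c^2-1) = 1/7$, the exponent of the Optimal Data-Dependent Hashing entry; and $\rho_q = 0$ requires $\rho_u = (2c^2-1)/(c^2-1)^2 = 7/9$.

```python
import math


def query_exponent(c: float, rho_u: float) -> float:
    """Solve c^2 sqrt(rho_q) + (c^2 - 1) sqrt(rho_u) = sqrt(2c^2 - 1) for rho_q."""
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    if rho_u < 0:
        raise ValueError("rho_u must be non-negative")
    s = math.sqrt(2 * c * c - 1) - (c * c - 1) * math.sqrt(rho_u)
    if s < -1e-12:  # tolerance absorbs rounding at the rho_q = 0 endpoint
        raise ValueError("rho_u exceeds (2c^2 - 1)/(c^2 - 1)^2, so rho_q would be negative")
    return (max(s, 0.0) / (c * c)) ** 2


if __name__ == "__main__":
    c = 2.0
    for rho_u in (0.0, 1 / 7, 0.5, 7 / 9):
        print(f"rho_u = {rho_u:.4f} -> rho_q = {query_exponent(c, rho_u):.4f}")
```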

arXiv.org e-Print Archive

Repository TU/e

Crossref

Pure OAI Repository

Optimal Data-Dependent Hashing for Approximate Near Neighbors

Author: Andoni A.
Johnson W. B.
N. R. Council
O'Donnell R.
Razenshteyn I.
Verma N.
Weiss Y.
Publication date: 15/07/2015

We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an $n$-point data set in a $d$-dimensional space, our data structure achieves query time $O(d n^{\rho+o(1)})$ and space $O(n^{1+\rho+o(1)} + dn)$, where $\rho=\tfrac{1}{2c^2-1}$ for the Euclidean space and approximation $c>1$. For the Hamming space, we obtain an exponent of $\rho=\tfrac{1}{2c-1}$. Our result completes the direction set forth in [AINR14], who gave a proof-of-concept that data-dependent hashing can outperform classical Locality Sensitive Hashing (LSH). In contrast to [AINR14], the new bound is not only optimal, but in fact improves over the best (optimal) LSH data structures [IM98,AI06] for all approximation factors $c>1$. From the technical perspective, we proceed by decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.

Comment: 36 pages, 5 figures, an extended abstract appeared in the proceedings of the 47th ACM Symposium on Theory of Computing (STOC 2015)
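The two exponents in this abstract are consistent with a simple observation (our own check, not taken from the paper): for 0/1 vectors the squared Euclidean distance equals the Hamming distance, so a Hamming instance with approximation $c$ is a Euclidean instance with approximation $\sqrt{c}$, and $1/(2(\sqrt{c})^2 - 1) = 1/(2c-1)$. The sketch below verifies this numerically and also confirms that $1/(2c^2-1) < 1/c^2$ for $c > 1$, i.e. the improvement over classical Euclidean LSH. Function names are illustrative assumptions.

```python
import math


def rho_euclidean(c: float) -> float:
    """Data-dependent exponent for Euclidean space, o(1) terms dropped."""
    return 1 / (2 * c * c - 1)


def rho_hamming_via_embedding(c: float) -> float:
    """Hamming approximation c corresponds to Euclidean approximation sqrt(c)."""
    return rho_euclidean(math.sqrt(c))


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


if __name__ == "__main__":
    for c in (1.5, 2.0, 4.0):
        print(f"c={c}: Euclidean 1/(2c^2-1)={rho_euclidean(c):.4f} vs 1/c^2={1 / c**2:.4f}; "
              f"Hamming via embedding={rho_hamming_via_embedding(c):.4f} vs 1/(2c-1)={1 / (2 * c - 1):.4f}")
```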

arXiv.org e-Print Archive

Crossref