2,316 research outputs found

    Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing

    Get PDF
    We prove a tight lower bound for the exponent ρ\rho for data-dependent Locality-Sensitive Hashing schemes, recently used to design efficient solutions for the cc-approximate nearest neighbor search. In particular, our lower bound matches the bound of ρ12c1+o(1)\rho\le \frac{1}{2c-1}+o(1) for the 1\ell_1 space, obtained via the recent algorithm from [Andoni-Razenshteyn, STOC'15]. In recent years it emerged that data-dependent hashing is strictly superior to the classical Locality-Sensitive Hashing, when the hash function is data-independent. In the latter setting, the best exponent has been already known: for the 1\ell_1 space, the tight bound is ρ=1/c\rho=1/c, with the upper bound from [Indyk-Motwani, STOC'98] and the matching lower bound from [O'Donnell-Wu-Zhou, ITCS'11]. We prove that, even if the hashing is data-dependent, it must hold that ρ12c1o(1)\rho\ge \frac{1}{2c-1}-o(1). To prove the result, we need to formalize the exact notion of data-dependent hashing that also captures the complexity of the hash functions (in addition to their collision properties). Without restricting such complexity, we would allow for obviously infeasible solutions such as the Voronoi diagram of a dataset. To preclude such solutions, we require our hash functions to be succinct. This condition is satisfied by all the known algorithmic results.Comment: 16 pages, no figure

    Tradeoffs for nearest neighbors on the sphere

    Get PDF
    We consider tradeoffs between the query and update complexities for the (approximate) nearest neighbor problem on the sphere, extending the recent spherical filters to sparse regimes and generalizing the scheme and analysis to account for different tradeoffs. In a nutshell, for the sparse regime the tradeoff between the query complexity nρqn^{\rho_q} and update complexity nρun^{\rho_u} for data sets of size nn is given by the following equation in terms of the approximation factor cc and the exponents ρq\rho_q and ρu\rho_u: c2ρq+(c21)ρu=2c21.c^2\sqrt{\rho_q}+(c^2-1)\sqrt{\rho_u}=\sqrt{2c^2-1}. For small c=1+ϵc=1+\epsilon, minimizing the time for updates leads to a linear space complexity at the cost of a query time complexity n14ϵ2n^{1-4\epsilon^2}. Balancing the query and update costs leads to optimal complexities n1/(2c21)n^{1/(2c^2-1)}, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner, IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn, STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A subpolynomial query time complexity no(1)n^{o(1)} can be achieved at the cost of a space complexity of the order n1/(4ϵ2)n^{1/(4\epsilon^2)}, matching the bound nΩ(1/ϵ2)n^{\Omega(1/\epsilon^2)} of [Andoni-Indyk-Patrascu, FOCS'06] and [Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of [Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98]. For large cc, minimizing the update complexity results in a query complexity of n2/c2+O(1/c4)n^{2/c^2+O(1/c^4)}, improving upon the related exponent for large cc of [Kapralov, PODS'15] by a factor 22, and matching the bound nΩ(1/c2)n^{\Omega(1/c^2)} of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal complexities n1/(2c21)n^{1/(2c^2-1)}, while a minimum query time complexity can be achieved with update complexity n2/c2+O(1/c4)n^{2/c^2+O(1/c^4)}, improving upon the previous best exponents of Kapralov by a factor 22.Comment: 16 pages, 1 table, 2 figures. Mostly subsumed by arXiv:1608.03580 [cs.DS] (along with arXiv:1605.02701 [cs.DS]

    Evaluation of Hashing Methods Performance on Binary Feature Descriptors

    Full text link
    In this paper we evaluate performance of data-dependent hashing methods on binary data. The goal is to find a hashing method that can effectively produce lower dimensional binary representation of 512-bit FREAK descriptors. A representative sample of recent unsupervised, semi-supervised and supervised hashing methods was experimentally evaluated on large datasets of labelled binary FREAK feature descriptors

    Optimal Data-Dependent Hashing for Approximate Near Neighbors

    Full text link
    We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an nn-point data set in a dd-dimensional space our data structure achieves query time O(dnρ+o(1))O(d n^{\rho+o(1)}) and space O(n1+ρ+o(1)+dn)O(n^{1+\rho+o(1)} + dn), where ρ=12c21\rho=\tfrac{1}{2c^2-1} for the Euclidean space and approximation c>1c>1. For the Hamming space, we obtain an exponent of ρ=12c1\rho=\tfrac{1}{2c-1}. Our result completes the direction set forth in [AINR14] who gave a proof-of-concept that data-dependent hashing can outperform classical Locality Sensitive Hashing (LSH). In contrast to [AINR14], the new bound is not only optimal, but in fact improves over the best (optimal) LSH data structures [IM98,AI06] for all approximation factors c>1c>1. From the technical perspective, we proceed by decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.Comment: 36 pages, 5 figures, an extended abstract appeared in the proceedings of the 47th ACM Symposium on Theory of Computing (STOC 2015

    Practical and Optimal LSH for Angular Distance

    Get PDF
    We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [Andoni, Indyk, Nguyen, Razenshteyn 2014], [Andoni, Razenshteyn 2015]), our algorithm is also practical, improving upon the well-studied hyperplane LSH [Charikar, 2002] in practice. We also introduce a multiprobe version of this algorithm, and conduct experimental evaluation on real and synthetic data sets. We complement the above positive results with a fine-grained lower bound for the quality of any LSH family for angular distance. Our lower bound implies that the above LSH family exhibits a trade-off between evaluation time and quality that is close to optimal for a natural class of LSH functions.Comment: 22 pages, an extended abstract is to appear in the proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS 2015
    corecore