634 research outputs found

    Practical and Optimal LSH for Angular Distance

    Get PDF
    We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [Andoni, Indyk, Nguyen, Razenshteyn 2014], [Andoni, Razenshteyn 2015]), our algorithm is also practical, improving upon the well-studied hyperplane LSH [Charikar, 2002] in practice. We also introduce a multiprobe version of this algorithm, and conduct experimental evaluation on real and synthetic data sets. We complement the above positive results with a fine-grained lower bound for the quality of any LSH family for angular distance. Our lower bound implies that the above LSH family exhibits a trade-off between evaluation time and quality that is close to optimal for a natural class of LSH functions.Comment: 22 pages, an extended abstract is to appear in the proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS 2015

    Tradeoffs for nearest neighbors on the sphere

    Get PDF
    We consider tradeoffs between the query and update complexities for the (approximate) nearest neighbor problem on the sphere, extending the recent spherical filters to sparse regimes and generalizing the scheme and analysis to account for different tradeoffs. In a nutshell, for the sparse regime the tradeoff between the query complexity nρqn^{\rho_q} and update complexity nρun^{\rho_u} for data sets of size nn is given by the following equation in terms of the approximation factor cc and the exponents ρq\rho_q and ρu\rho_u: c2ρq+(c21)ρu=2c21.c^2\sqrt{\rho_q}+(c^2-1)\sqrt{\rho_u}=\sqrt{2c^2-1}. For small c=1+ϵc=1+\epsilon, minimizing the time for updates leads to a linear space complexity at the cost of a query time complexity n14ϵ2n^{1-4\epsilon^2}. Balancing the query and update costs leads to optimal complexities n1/(2c21)n^{1/(2c^2-1)}, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner, IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn, STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A subpolynomial query time complexity no(1)n^{o(1)} can be achieved at the cost of a space complexity of the order n1/(4ϵ2)n^{1/(4\epsilon^2)}, matching the bound nΩ(1/ϵ2)n^{\Omega(1/\epsilon^2)} of [Andoni-Indyk-Patrascu, FOCS'06] and [Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of [Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98]. For large cc, minimizing the update complexity results in a query complexity of n2/c2+O(1/c4)n^{2/c^2+O(1/c^4)}, improving upon the related exponent for large cc of [Kapralov, PODS'15] by a factor 22, and matching the bound nΩ(1/c2)n^{\Omega(1/c^2)} of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal complexities n1/(2c21)n^{1/(2c^2-1)}, while a minimum query time complexity can be achieved with update complexity n2/c2+O(1/c4)n^{2/c^2+O(1/c^4)}, improving upon the previous best exponents of Kapralov by a factor 22.Comment: 16 pages, 1 table, 2 figures. Mostly subsumed by arXiv:1608.03580 [cs.DS] (along with arXiv:1605.02701 [cs.DS]

    Fast Cross-Polytope Locality-Sensitive Hashing

    Get PDF
    We provide a variant of cross-polytope locality sensitive hashing with respect to angular distance which is provably optimal in asymptotic sensitivity and enjoys O(dlnd)\mathcal{O}(d \ln d ) hash computation time. Building on a recent result (by Andoni, Indyk, Laarhoven, Razenshteyn, Schmidt, 2015), we show that optimal asymptotic sensitivity for cross-polytope LSH is retained even when the dense Gaussian matrix is replaced by a fast Johnson-Lindenstrauss transform followed by discrete pseudo-rotation, reducing the hash computation time from O(d2)\mathcal{O}(d^2) to O(dlnd)\mathcal{O}(d \ln d ). Moreover, our scheme achieves the optimal rate of convergence for sensitivity. By incorporating a low-randomness Johnson-Lindenstrauss transform, our scheme can be modified to require only O(ln9(d))\mathcal{O}(\ln^9(d)) random bitsComment: 14 pages, 6 figure
    corecore