113 research outputs found

    Hypercube LSH for Approximate near Neighbors

    A celebrated technique for finding near neighbors for the angular distance involves using a set of random hyperplanes to partition the space into hash regions [Charikar, STOC 2002]. Experiments later showed that using a set of orthogonal hyperplanes, thereby partitioning the space into the Voronoi regions induced by a hypercube, leads to even better results [Terasawa and Tanaka, WADS 2007]. However, no theoretical explanation for this improvement was ever given, and it remained unclear how the resulting hypercube hash method scales in high dimensions. In this work, we provide explicit asymptotics for the collision probabilities when using hypercubes to partition the space. For instance, two near-orthogonal vectors are expected to collide with probability (1/pi)^d in dimension d, compared to (1/2)^d when using random hyperplanes. Vectors at angle pi/3 collide with probability (sqrt[3]/pi)^d, compared to (2/3)^d for random hyperplanes, and near-parallel vectors collide with similar asymptotic probabilities in both cases. For c-approximate nearest neighbor searching, this translates to a decrease in the exponent rho of locality-sensitive hashing (LSH) methods of a factor up to log2(pi) ~ 1.652 compared to hyperplane LSH. For c = 2, we obtain rho ~ 0.302 for hypercube LSH, improving upon the rho ~ 0.377 for hyperplane LSH. We further describe how to use hypercube LSH in practice, and we consider an example application in the area of lattice algorithms.
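
    The per-dimension collision bases quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes d independent random hyperplanes, so that the hyperplane collision probability for angle theta is (1 - theta/pi)^d, and it copies the hypercube bases directly from the abstract rather than deriving them. The names `hyperplane_base`, `decay_ratio`, and `CASES` are illustrative and do not come from the paper.

```python
import math

def hyperplane_base(theta: float) -> float:
    """Probability that one random hyperplane does not separate two vectors at angle theta."""
    return 1.0 - theta / math.pi

# (label, angle, per-dimension hypercube base as quoted in the abstract)
CASES = [
    ("near-orthogonal", math.pi / 2, 1.0 / math.pi),
    ("angle pi/3", math.pi / 3, math.sqrt(3.0) / math.pi),
]

def decay_ratio(theta: float, hypercube_base: float) -> float:
    """Ratio of log collision probabilities: hypercube versus d random hyperplanes."""
    return math.log(hypercube_base) / math.log(hyperplane_base(theta))

for name, theta, cube in CASES:
    print(f"{name}: hyperplane base {hyperplane_base(theta):.4f}, "
          f"hypercube base {cube:.4f}, ratio {decay_ratio(theta, cube):.4f}")
```

    For near-orthogonal vectors the ratio is exactly log2(pi), which is the "factor up to log2(pi)" mentioned in the abstract. At angle pi/3 the ratio is smaller but still above 1.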

    Faster tuple lattice sieving using spherical locality-sensitive filters

    To overcome the large memory requirement of classical lattice sieving algorithms for solving hard lattice problems, Bai-Laarhoven-Stehlé [ANTS 2016] studied tuple lattice sieving, where tuples instead of pairs of lattice vectors are combined to form shorter vectors. Herold-Kirshanova [PKC 2017] recently improved upon their results for arbitrary tuple sizes, for example showing that a triple sieve can solve the shortest vector problem (SVP) in dimension $d$ in time $2^{0.3717d + o(d)}$, using a technique similar to locality-sensitive hashing for finding nearest neighbors. In this work, we generalize the spherical locality-sensitive filters of Becker-Ducas-Gama-Laarhoven [SODA 2016] to obtain space-time tradeoffs for near neighbor searching on dense data sets, and we apply these techniques to tuple lattice sieving to obtain even better time complexities. For instance, our triple sieve heuristically solves SVP in time $2^{0.3588d + o(d)}$. For practical sieves based on Micciancio-Voulgaris' GaussSieve [SODA 2010], this shows that a triple sieve uses less space and less time than the current best near-linear space double sieve. Comment: 12 pages + references, 2 figures. Subsumed/merged into Cryptology ePrint Archive 2017/228, available at https://ia.cr/2017/122
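
    To get a feel for the gap between the two triple-sieve exponents quoted in this abstract, the sketch below compares only the leading terms and ignores the o(d) contributions entirely, so the numbers are indicative rather than concrete running times. The function and constant names are illustrative, and the dimensions are arbitrary examples.

```python
def log2_time(exponent_constant: float, d: int) -> float:
    """Leading term of log2(time) for a sieve running in time 2^(c*d + o(d)); o(d) is ignored."""
    return exponent_constant * d

HEROLD_KIRSHANOVA = 0.3717  # triple sieve exponent attributed to Herold-Kirshanova
THIS_WORK = 0.3588          # triple sieve exponent claimed in this abstract

for d in (80, 100, 120):  # example dimensions, chosen arbitrarily
    saved_bits = log2_time(HEROLD_KIRSHANOVA, d) - log2_time(THIS_WORK, d)
    print(f"d={d}: leading-term saving of about 2^{saved_bits:.2f} operations")
```

    The saving grows linearly in the exponent, as 0.0129 bits per dimension.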

    A directed isoperimetric inequality with application to Bregman near neighbor lower bounds

    Bregman divergences $D_\phi$ are a class of divergences parametrized by a convex function $\phi$ and include well known distance functions like $\ell_2^2$ and the Kullback-Leibler divergence. There has been extensive research on algorithms for problems like clustering and near neighbor search with respect to Bregman divergences; in all cases, the algorithms depend not just on the data size $n$ and dimensionality $d$, but also on a structure constant $\mu \ge 1$ that depends solely on $\phi$ and can grow without bound independently. In this paper, we provide the first evidence that this dependence on $\mu$ might be intrinsic. We focus on the problem of approximate near neighbor search for Bregman divergences. We show that under the cell probe model, any non-adaptive data structure (like locality-sensitive hashing) for $c$-approximate near-neighbor search that admits $r$ probes must use space $\Omega(n^{1 + \frac{\mu}{cr}})$. In contrast, for LSH under $\ell_1$ the best bound is $\Omega(n^{1+\frac{1}{cr}})$. Our new tool is a directed variant of the standard boolean noise operator. We show that a generalization of the Bonami-Beckner hypercontractivity inequality exists "in expectation" or upon restriction to certain subsets of the Hamming cube, and that this is sufficient to prove the desired isoperimetric inequality that we use in our data structure lower bound. We also present a structural result reducing the Hamming cube to a Bregman cube. This structure allows us to obtain lower bounds for problems under Bregman divergences from their $\ell_1$ analog. In particular, we get a (weaker) lower bound for approximate near neighbor search of the form $\Omega(n^{1 + \frac{1}{cr}})$ for an $r$-query non-adaptive data structure, and new cell probe lower bounds for a number of other near neighbor questions in Bregman space. Comment: 27 pages
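
    As a reminder of what "parametrized by a convex function" means here, the sketch below uses the standard definition D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>, which is textbook material and not specific to this paper. It checks the two special cases named in the abstract: the squared Euclidean norm gives the squared l2 distance, and negative entropy gives the Kullback-Leibler divergence when both vectors are probability distributions. All function names and the sample vectors are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]

def bregman(phi: Callable[[Vector], float],
            grad_phi: Callable[[Vector], List[float]],
            x: Vector, y: Vector) -> float:
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

def sq_norm(v: Vector) -> float:
    return sum(t * t for t in v)

def sq_norm_grad(v: Vector) -> List[float]:
    return [2.0 * t for t in v]

def neg_entropy(v: Vector) -> float:
    return sum(t * math.log(t) for t in v)  # requires strictly positive entries

def neg_entropy_grad(v: Vector) -> List[float]:
    return [math.log(t) + 1.0 for t in v]

# Two example probability vectors (strictly positive, summing to 1).
x = [0.2, 0.3, 0.5]
y = [0.4, 0.4, 0.2]

squared_l2 = bregman(sq_norm, sq_norm_grad, x, y)
kl = bregman(neg_entropy, neg_entropy_grad, x, y)
print(squared_l2, kl)
```

    Note that the divergence is not symmetric in general: swapping `x` and `y` changes the Kullback-Leibler value, though not the squared l2 value.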

    Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing

    We prove a tight lower bound for the exponent $\rho$ for data-dependent Locality-Sensitive Hashing schemes, recently used to design efficient solutions for the $c$-approximate nearest neighbor search. In particular, our lower bound matches the bound of $\rho \le \frac{1}{2c-1} + o(1)$ for the $\ell_1$ space, obtained via the recent algorithm from [Andoni-Razenshteyn, STOC'15]. In recent years it emerged that data-dependent hashing is strictly superior to the classical Locality-Sensitive Hashing, when the hash function is data-independent. In the latter setting, the best exponent has been already known: for the $\ell_1$ space, the tight bound is $\rho = 1/c$, with the upper bound from [Indyk-Motwani, STOC'98] and the matching lower bound from [O'Donnell-Wu-Zhou, ITCS'11]. We prove that, even if the hashing is data-dependent, it must hold that $\rho \ge \frac{1}{2c-1} - o(1)$. To prove the result, we need to formalize the exact notion of data-dependent hashing that also captures the complexity of the hash functions (in addition to their collision properties). Without restricting such complexity, we would allow for obviously infeasible solutions such as the Voronoi diagram of a dataset. To preclude such solutions, we require our hash functions to be succinct. This condition is satisfied by all the known algorithmic results. Comment: 16 pages, no figures
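
    The two exponents for the l1 space in this abstract are easy to compare numerically. The sketch below drops the o(1) terms and uses the usual LSH accounting, in which query time scales roughly as n^rho. That accounting is a standard assumption and is not stated in the abstract itself. The function names and the chosen values of c and n are illustrative.

```python
def rho_data_independent(c: float) -> float:
    """Tight exponent for classical (data-independent) LSH in l1: rho = 1/c."""
    return 1.0 / c

def rho_data_dependent(c: float) -> float:
    """Tight exponent for data-dependent LSH in l1, o(1) term dropped: rho = 1/(2c - 1)."""
    return 1.0 / (2.0 * c - 1.0)

n = 10 ** 6  # example data set size, chosen arbitrarily
for c in (1.5, 2.0, 4.0):
    r_ind, r_dep = rho_data_independent(c), rho_data_dependent(c)
    print(f"c={c}: rho {r_ind:.3f} vs {r_dep:.3f}, "
          f"n^rho about {n ** r_ind:.0f} vs {n ** r_dep:.0f}")
```

    For every c > 1 the data-dependent exponent is strictly smaller, and the two coincide only in the limit c = 1.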

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
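
    As one concrete instance of the first category, a hash function designed without looking at the data, the sketch below implements sign random projection (random hyperplane hashing for angular distance). This example is chosen here for illustration and is not taken from the survey. `make_hyperplane_hash` and its parameters are hypothetical names.

```python
import random
from typing import Callable, Sequence, Tuple

def make_hyperplane_hash(dim: int, num_bits: int,
                         seed: int = 0) -> Callable[[Sequence[float]], Tuple[int, ...]]:
    """Build a hash mapping a vector to num_bits sign bits of random projections.

    The hyperplane normals are drawn from a Gaussian before any data is seen,
    which is what makes the scheme data-independent.
    """
    rng = random.Random(seed)
    normals = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

    def hash_vector(v: Sequence[float]) -> Tuple[int, ...]:
        # One bit per hyperplane: which side of the hyperplane v lies on.
        return tuple(1 if sum(a * b for a, b in zip(normal, v)) >= 0 else 0
                     for normal in normals)

    return hash_vector

h = make_hyperplane_hash(dim=5, num_bits=16, seed=1)
v = [1.0, -2.0, 0.5, 3.0, -1.5]
print(h(v))
```

    Vectors at a small angle agree on most bits, so items can be bucketed by (parts of) this code. A learning-to-hash method would instead fit the projections to the data set.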