8 research outputs found
Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search
The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a
general technique for constructing a data structure to answer approximate near
neighbor queries by using a distribution over locality-sensitive
hash functions that partition space. For a collection of points, after
preprocessing, the query time is dominated by evaluations
of hash functions from and hash table lookups and
distance computations where is determined by the
locality-sensitivity properties of . It follows from a recent
result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive
hash functions can be reduced to , leaving the query time to be
dominated by distance computations and
additional word-RAM operations. We state this result as a general framework and
provide a simpler analysis showing that the number of lookups and distance
computations closely match the Indyk-Motwani framework, making it a viable
replacement in practice. Using ideas from another locality-sensitive hashing
framework by Andoni and Indyk (SODA 2006) we are able to reduce the number of
additional word-RAM operations to .Comment: 15 pages, 3 figure
Practical and Optimal LSH for Angular Distance
We show the existence of a Locality-Sensitive Hashing (LSH) family for the
angular distance that yields an approximate Near Neighbor Search algorithm with
the asymptotically optimal running time exponent. Unlike earlier algorithms
with this property (e.g., Spherical LSH [Andoni, Indyk, Nguyen, Razenshteyn
2014], [Andoni, Razenshteyn 2015]), our algorithm is also practical, improving
upon the well-studied hyperplane LSH [Charikar, 2002] in practice. We also
introduce a multiprobe version of this algorithm, and conduct experimental
evaluation on real and synthetic data sets.
We complement the above positive results with a fine-grained lower bound for
the quality of any LSH family for angular distance. Our lower bound implies
that the above LSH family exhibits a trade-off between evaluation time and
quality that is close to optimal for a natural class of LSH functions.Comment: 22 pages, an extended abstract is to appear in the proceedings of the
29th Annual Conference on Neural Information Processing Systems (NIPS 2015
Business Analytics and Mathematical Sciences
Hashing techniques have been intensively investigated in the design of highly efficient search engines for largescale computer vision applications. Compared with prior approximate nearest neighbor search approaches like treebased indexing, hashing-based search schemes have prominent advantages in terms of both storage and computational efficiencies. Moreover, the procedure of devising hash functions can be easily incorporated into sophisticated machine learning tools, leading to data-dependent and task-specific compact hash codes. Therefore, a number of learning paradigms, ranging from unsupervised to supervised, have been applied to compose appropriate hash functions. However, most of the existing hash function learning methods either treat hash function design as a classification problem or generate binary codes to satisfy pairwise supervision, and have not yet directly optimized the search accuracy. In this paper, we propose to leverage listwise supervision into a principled hash function learning framework. In particular, the ranking information is represented by a set of rank triplets that can be used to assess the quality of ranking. Simple linear projection-based hash functions are solved efficiently through maximizing the ranking quality over the training data. We carry out experiments on large image datasets with size up to one million and compare with the state-of-the-art hashing techniques. The extensive results corroborate that our learned hash codes via listwise supervision can provide superior search accuracy without incurring heavy computational overhead. 1
A reliable order-statistics-based approximate nearest neighbor search algorithm
We propose a new algorithm for fast approximate nearest neighbor search based
on the properties of ordered vectors. Data vectors are classified based on the
index and sign of their largest components, thereby partitioning the space in a
number of cones centered in the origin. The query is itself classified, and the
search starts from the selected cone and proceeds to neighboring ones. Overall,
the proposed algorithm corresponds to locality sensitive hashing in the space
of directions, with hashing based on the order of components. Thanks to the
statistical features emerging through ordering, it deals very well with the
challenging case of unstructured data, and is a valuable building block for
more complex techniques dealing with structured data. Experiments on both
simulated and real-world data prove the proposed algorithm to provide a
state-of-the-art performance
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is a problem of pursuing the data
items whose distances to a query item are the smallest from a large database.
Various methods have been developed to address this problem, and recently a lot
of efforts have been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work locality sensitive hashing. We divide the hashing
algorithms two main categories: locality sensitive hashing, which designs hash
functions without exploring the data distribution and learning to hash, which
learns hash functions according the data distribution, and review them from
various aspects, including hash function design and distance measure and search
scheme in the hash coding space