479 research outputs found
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is a problem of pursuing the data
items whose distances to a query item are the smallest from a large database.
Various methods have been developed to address this problem, and recently a lot
of efforts have been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work locality sensitive hashing. We divide the hashing
algorithms two main categories: locality sensitive hashing, which designs hash
functions without exploring the data distribution and learning to hash, which
learns hash functions according the data distribution, and review them from
various aspects, including hash function design and distance measure and search
scheme in the hash coding space
Efficient end-to-end learning for quantizable representations
Embedding representation learning via neural networks is at the core
foundation of modern similarity based search. While much effort has been put in
developing algorithms for learning binary hamming code representations for
search efficiency, this still requires a linear scan of the entire dataset per
each query and trades off the search accuracy through binarization. To this
end, we consider the problem of directly learning a quantizable embedding
representation and the sparse binary hash code end-to-end which can be used to
construct an efficient hash table not only providing significant search
reduction in the number of data but also achieving the state of the art search
accuracy outperforming previous state of the art deep metric learning methods.
We also show that finding the optimal sparse binary hash code in a mini-batch
can be computed exactly in polynomial time by solving a minimum cost flow
problem. Our results on Cifar-100 and on ImageNet datasets show the state of
the art search accuracy in precision@k and NMI metrics while providing up to
98X and 478X search speedup respectively over exhaustive linear search. The
source code is available at
https://github.com/maestrojeong/Deep-Hash-Table-ICML18Comment: Accepted and to appear at ICML 2018. Camera ready versio
- β¦