976 research outputs found
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is a problem of pursuing the data
items whose distances to a query item are the smallest from a large database.
Various methods have been developed to address this problem, and recently a lot
of efforts have been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work locality sensitive hashing. We divide the hashing
algorithms two main categories: locality sensitive hashing, which designs hash
functions without exploring the data distribution and learning to hash, which
learns hash functions according the data distribution, and review them from
various aspects, including hash function design and distance measure and search
scheme in the hash coding space
Hashing with binary autoencoders
An attractive approach for fast search in image databases is binary hashing,
where each high-dimensional, real-valued image is mapped onto a
low-dimensional, binary vector and the search is done in this binary space.
Finding the optimal hash function is difficult because it involves binary
constraints, and most approaches approximate the optimization by relaxing the
constraints and then binarizing the result. Here, we focus on the binary
autoencoder model, which seeks to reconstruct an image from the binary code
produced by the hash function. We show that the optimization can be simplified
with the method of auxiliary coordinates. This reformulates the optimization as
alternating two easier steps: one that learns the encoder and decoder
separately, and one that optimizes the code for each image. Image retrieval
experiments, using precision/recall and a measure of code utilization, show the
resulting hash function outperforms or is competitive with state-of-the-art
methods for binary hashing.Comment: 22 pages, 11 figure
- …