5,029 research outputs found
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is a problem of pursuing the data
items whose distances to a query item are the smallest from a large database.
Various methods have been developed to address this problem, and recently a lot
of efforts have been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work locality sensitive hashing. We divide the hashing
algorithms two main categories: locality sensitive hashing, which designs hash
functions without exploring the data distribution and learning to hash, which
learns hash functions according the data distribution, and review them from
various aspects, including hash function design and distance measure and search
scheme in the hash coding space
An Efficient Approximate kNN Graph Method for Diffusion on Image Retrieval
The application of the diffusion in many computer vision and artificial
intelligence projects has been shown to give excellent improvements in
performance. One of the main bottlenecks of this technique is the quadratic
growth of the kNN graph size due to the high-quantity of new connections
between nodes in the graph, resulting in long computation times. Several
strategies have been proposed to address this, but none are effective and
efficient. Our novel technique, based on LSH projections, obtains the same
performance as the exact kNN graph after diffusion, but in less time
(approximately 18 times faster on a dataset of a hundred thousand images). The
proposed method was validated and compared with other state-of-the-art on
several public image datasets, including Oxford5k, Paris6k, and Oxford105k
Fast Supervised Hashing with Decision Trees for High-Dimensional Data
Supervised hashing aims to map the original features to compact binary codes
that are able to preserve label based similarity in the Hamming space.
Non-linear hash functions have demonstrated the advantage over linear ones due
to their powerful generalization capability. In the literature, kernel
functions are typically used to achieve non-linearity in hashing, which achieve
encouraging retrieval performance at the price of slow evaluation and training
time. Here we propose to use boosted decision trees for achieving non-linearity
in hashing, which are fast to train and evaluate, hence more suitable for
hashing with high dimensional data. In our approach, we first propose
sub-modular formulations for the hashing binary code inference problem and an
efficient GraphCut based block search method for solving large-scale inference.
Then we learn hash functions by training boosted decision trees to fit the
binary codes. Experiments demonstrate that our proposed method significantly
outperforms most state-of-the-art methods in retrieval precision and training
time. Especially for high-dimensional data, our method is orders of magnitude
faster than many methods in terms of training time.Comment: Appearing in Proc. IEEE Conf. Computer Vision and Pattern
Recognition, 2014, Ohio, US
- …