Learning Binary Hash Codes for Large-Scale Image Search
Abstract: Algorithms to rapidly search massive image or video collections are critical for many vision applications, including visual search, content-based retrieval, and non-parametric models for object recognition. Recent work shows that learned binary projections are a powerful way to index large collections according to their content. The basic idea is to formulate the projections so as to approximately preserve a given similarity function of interest. Having done so, one can then search the data efficiently using hash tables, or by exploring the Hamming ball volume around a novel query. Both enable sub-linear time retrieval with respect to the database size. Further, depending on the design of the projections, in some cases it is possible to bound the number of database examples that must be searched in order to achieve a given level of accuracy. This chapter overviews data structures for fast search with binary codes, and then describes several supervised and unsupervised strategies for generating the codes. In particular, we review supervised methods that integrate metric learning, boosting, and neural networks into the hash key construction, and unsupervised methods based on spectral analysis or kernelized random projections that compute affinity-preserving binary codes. Whether learning from explicit semantic supervision or exploiting the structure among unlabeled data, these methods make scalable retrieval possible for a variety of robust visual similarity measures. We focus on defining the algorithms, and illustrate the main points with results using millions of images.
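The Hamming-ball search mentioned in the abstract above can be illustrated with a minimal sketch. It assumes binary codes are stored as Python integers and indexed in a dictionary; the names `build_table`, `hamming_ball`, and `query` are hypothetical and do not come from the chapter, and how the codes are learned is not addressed here.

```python
from collections import defaultdict
from itertools import combinations


def build_table(codes):
    """Map each integer-encoded binary code to the ids of the items that share it."""
    table = defaultdict(list)
    for item_id, code in enumerate(codes):
        table[code].append(item_id)
    return table


def hamming_ball(code, n_bits, radius):
    """Yield every n_bits-wide code whose Hamming distance to `code` is <= radius."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p  # flip bit p
            yield flipped


def query(table, code, n_bits, radius):
    """Collect the ids stored under any code inside the Hamming ball of the query."""
    hits = []
    for probe in hamming_ball(code, n_bits, radius):
        hits.extend(table.get(probe, []))  # .get avoids creating empty buckets
    return sorted(hits)


# Toy database of four 4-bit codes (illustrative values only).
codes = [0b0000, 0b0001, 0b0111, 0b1111]
table = build_table(codes)
neighbours = query(table, 0b0011, n_bits=4, radius=1)
```

The number of probes grows as the sum of binomial coefficients C(n_bits, r) for r up to the radius, so this style of lookup is only practical for small radii; the database itself is never scanned linearly.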
Simultaneous Feature Learning and Hash Coding with Deep Neural Networks
Similarity-preserving hashing is a widely-used method for nearest neighbour
search in large-scale image retrieval tasks. For most existing hashing methods,
an image is first encoded as a vector of hand-engineered visual features,
followed by another separate projection or quantization step that generates
binary codes. However, such visual feature vectors may not be optimally
compatible with the coding process, thus producing sub-optimal hashing codes.
In this paper, we propose a deep architecture for supervised hashing, in which
images are mapped into binary codes via carefully designed deep neural
networks. The pipeline of the proposed deep architecture consists of three
building blocks: 1) a sub-network with a stack of convolution layers to produce
the effective intermediate image features; 2) a divide-and-encode module to
divide the intermediate image features into multiple branches, each encoded
into one hash bit; and 3) a triplet ranking loss designed to characterize that
one image is more similar to the second image than to the third one. Extensive
evaluations on several benchmark image datasets show that the proposed
simultaneous feature learning and hash coding pipeline brings substantial
improvements over other state-of-the-art supervised or unsupervised hashing
methods.
Comment: This paper has been accepted to IEEE International Conference on
Pattern Recognition and Computer Vision (CVPR), 201
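The triplet ranking idea in the abstract above (a query image should be closer to a similar image than to a dissimilar one) can be sketched as a hinge-style loss. The exact form used in the paper is not given in the abstract, so the squared Euclidean distance, the `margin` parameter, and the function name `triplet_ranking_loss` are assumptions made for illustration.

```python
def triplet_ranking_loss(query, similar, dissimilar, margin=1.0):
    """Hinge-style triplet loss on (possibly relaxed) hash codes.

    The loss is zero when the dissimilar code is at least `margin` farther
    from the query than the similar code, measured in squared Euclidean
    distance. For codes in {0, 1} this distance equals the Hamming distance.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return max(0.0, margin + sq_dist(query, similar) - sq_dist(query, dissimilar))


# Toy 4-bit codes (illustrative values only).
q = [1, 0, 1, 0]
pos = [1, 0, 1, 1]   # differs from q in 1 bit
neg = [0, 1, 0, 1]   # differs from q in 4 bits

well_ordered = triplet_ranking_loss(q, pos, neg)   # ranking already satisfied
violated = triplet_ranking_loss(q, neg, pos)       # roles swapped, ranking violated
```

In a deep hashing pipeline a loss of this kind would typically be applied to real-valued network outputs during training and the outputs thresholded to bits afterwards; that detail is likewise an assumption here rather than something the abstract states.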
Learning Binary Code Representations for Effective and Efficient Image Retrieval
The size of online image datasets is constantly increasing. Considering an image dataset with millions of images, image retrieval becomes a seemingly intractable problem for exhaustive similarity search algorithms. Hashing methods, which encode high-dimensional descriptors into compact binary strings, have become very popular because of their high efficiency in search and storage capacity.
In the first part, we propose a multimodal retrieval method based on latent feature models. The procedure consists of a nonparametric Bayesian framework for learning underlying semantically meaningful abstract features in a multimodal dataset, a probabilistic retrieval model that allows cross-modal queries and an extension model for relevance feedback.
In the second part, we focus on supervised hashing with kernels. We describe a flexible hashing procedure that treats binary codes and pairwise semantic similarity as latent and observed variables, respectively, in a probabilistic model based on Gaussian processes for binary classification. We present a scalable inference algorithm with the sparse pseudo-input Gaussian process (SPGP) model and distributed computing.
In the last part, we define an incremental hashing strategy for dynamic databases where new images are added to the databases frequently. The method is based on a two-stage classification framework using binary and multi-class SVMs. The proposed method also enforces balance in binary codes by an imbalance penalty to obtain higher-quality binary codes. We learn hash functions by an efficient algorithm in which the NP-hard problem of finding optimal binary codes is solved via cyclic coordinate descent and SVMs are trained in a parallelized incremental manner. For modifications like adding images from an unseen class, we propose an incremental procedure for effective and efficient updates to the previous hash functions. Experiments on three large-scale image datasets demonstrate that the incremental strategy is capable of efficiently updating hash functions to reach the same retrieval performance as hashing from scratch.
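One way to picture the imbalance penalty mentioned above is as a measure of how far each bit is from being on for half of the items. The abstract does not give the penalty's formula, so the sketch below is an assumed form: codes take values in {-1, +1}, and the penalty is the sum over bit positions of the squared mean of that bit. The name `imbalance_penalty` is hypothetical.

```python
def imbalance_penalty(codes):
    """Sum over bit positions of the squared mean bit value.

    `codes` is a list of equal-length lists with entries in {-1, +1}.
    A perfectly balanced bit (half +1, half -1) contributes 0; a bit that
    is constant across all items contributes 1.
    """
    n_items = len(codes)
    n_bits = len(codes[0])
    penalty = 0.0
    for b in range(n_bits):
        mean_b = sum(code[b] for code in codes) / n_items
        penalty += mean_b ** 2
    return penalty


balanced = imbalance_penalty([[1, -1], [-1, 1]])   # both bits split evenly
skewed = imbalance_penalty([[1, 1], [1, -1]])      # first bit is constant
```

Balanced bits carry the most information per bit and spread items evenly across hash buckets, which is the usual motivation for penalising imbalance when codes are learned.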