
    Optimized Projection for Hashing

    Hashing, which seeks binary codes to represent data, has drawn increasing research interest in recent years. Most existing Hashing methods follow a projection-quantization framework, which first projects high-dimensional data into a compact low-dimensional space and then quantizes the compact data into binary codes. The projection step plays a key role in Hashing and has received considerable attention. Previous works have shown that a good projection should simultaneously 1) preserve important information in the original data, and 2) lead to a compact representation with low quantization error. However, they adopted a greedy two-step strategy that considers the two properties separately. In this paper, we empirically show that such a two-step strategy results in a sub-optimal solution, because the optimal solution to 1) limits the feasible set for the solution to 2). We put forward a novel projection learning method for Hashing, dubbed Optimized Projection (OPH). Specifically, we propose to learn the projection in a unified formulation that finds a good trade-off between the two properties, so that the overall performance can be optimized. We give a general framework in which OPH can be incorporated with different Hashing methods for different situations, and we introduce an effective gradient-based optimization algorithm for OPH. We carried out extensive experiments on Hashing-based Approximate Nearest Neighbor search and Content-based Data Retrieval on six benchmark datasets. The results show that OPH significantly outperforms several state-of-the-art related Hashing methods.
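
    As a concrete illustration of the projection-quantization framework the abstract refers to, the following is a minimal sketch (assuming NumPy) of the generic two-step baseline: a PCA projection followed by sign quantization. The function names and the choice of PCA are illustrative assumptions, not the OPH method itself.

        import numpy as np

        def learn_pca_projection(X, n_bits):
            # Step 1 (projection): keep the top n_bits principal directions,
            # which preserve the most variance in the original data.
            Xc = X - X.mean(axis=0)
            _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
            return Vt[:n_bits].T                      # shape (dim, n_bits)

        def encode(X, W, mean):
            # Step 2 (quantization): binarize the projected data by sign.
            return ((X - mean) @ W >= 0).astype(np.uint8)

        # Usage: 64-bit codes for 128-dimensional data.
        X = np.random.randn(1000, 128)
        W = learn_pca_projection(X, 64)
        codes = encode(X, W, X.mean(axis=0))

    The point the paper makes is that learning the projection for property 1) first and only then quantizing, as in this sketch, fixes the feasible set before quantization error is considered; OPH instead learns the projection under both objectives jointly.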

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are smallest. Various methods have been developed to address this problem, and recently considerable effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measures, and search schemes in the hash coding space.
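
    To make the first category concrete, below is a minimal sketch (assuming NumPy) of a classic data-independent hash family: random-hyperplane hashing for cosine similarity. The class name and parameters are illustrative and not taken from the survey.

        import numpy as np

        class RandomHyperplaneLSH:
            def __init__(self, dim, n_bits, seed=0):
                rng = np.random.default_rng(seed)
                # The hash functions are fixed random hyperplanes, chosen
                # without looking at the data distribution.
                self.planes = rng.standard_normal((dim, n_bits))

            def hash(self, x):
                # Each bit records which side of a hyperplane x falls on;
                # similar vectors agree on many bits with high probability.
                return (x @ self.planes >= 0).astype(np.uint8)

        def hamming(a, b):
            # Search in the hash coding space compares codes by Hamming distance.
            return int(np.count_nonzero(a != b))

    A learning-to-hash method would instead fit the hyperplanes (or a more general hash function) to the data distribution.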

    Scalable Image Retrieval by Sparse Product Quantization

    Fast Approximate Nearest Neighbor (ANN) search for high-dimensional feature indexing and retrieval is the crux of large-scale image retrieval. A recent promising technique is Product Quantization, which indexes high-dimensional image features by decomposing the feature space into a Cartesian product of low-dimensional subspaces and quantizing each of them separately. Despite the promising results reported, this quantization approach follows the typical hard assignment of traditional quantization methods, which may result in large quantization errors and thus inferior search performance. Unlike existing approaches, in this paper we propose a novel approach called Sparse Product Quantization (SPQ) that encodes high-dimensional feature vectors into sparse representations. We optimize the sparse representations of the feature vectors by minimizing their quantization errors, so that the resulting representations remain essentially close to the original data in practice. Experiments show that the proposed SPQ technique is not only able to compress data, but is also an effective encoding technique. We obtain state-of-the-art results for ANN search on four public image datasets, and the promising content-based image retrieval results further validate the efficacy of our proposed method.
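
    For context, here is a minimal sketch (assuming NumPy) of standard product quantization with the hard assignment the abstract contrasts SPQ against; the sparse-assignment optimization of SPQ itself is not specified in the abstract, so it is not reproduced here.

        import numpy as np

        def pq_train(X, n_subspaces, n_centroids, n_iter=20):
            # One k-means codebook per subspace (plain Lloyd iterations);
            # the feature dimension must divide evenly into n_subspaces.
            codebooks = []
            for S in np.split(X, n_subspaces, axis=1):
                C = S[np.random.choice(len(S), n_centroids, replace=False)]
                for _ in range(n_iter):
                    d = ((S[:, None, :] - C[None]) ** 2).sum(-1)
                    assign = d.argmin(1)
                    for k in range(n_centroids):
                        if (assign == k).any():
                            C[k] = S[assign == k].mean(0)
                codebooks.append(C)
            return codebooks

        def pq_encode(X, codebooks):
            # Hard assignment: exactly one codeword index per subspace.
            subs = np.split(X, len(codebooks), axis=1)
            return np.stack([((S[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
                             for S, C in zip(subs, codebooks)], axis=1)

    SPQ replaces the single hard index per subspace with an optimized sparse combination, reducing the quantization error this scheme incurs.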

    Optimized Cartesian K-Means

    Product quantization-based approaches are effective for encoding high-dimensional data points for approximate nearest neighbor search. The space is decomposed into a Cartesian product of low-dimensional subspaces, each of which generates a sub codebook. Data points are encoded as compact binary codes using these sub codebooks, and the distance between two data points can be approximated efficiently from their codes using precomputed lookup tables. Traditionally, to encode the subvector of a data point in a subspace, only one sub codeword in the corresponding sub codebook is selected, which may impose strict restrictions on the search accuracy. In this paper, we propose a novel approach, named Optimized Cartesian K-Means (OCKM), to better encode the data points for more accurate approximate nearest neighbor search. In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace. Each sub codeword stems from a different sub codebook in each subspace, and the sub codebooks are generated optimally with regard to minimizing the distortion errors. The high-dimensional data point is then encoded as the concatenation of the indices of the multiple sub codewords from all the subspaces. This provides more flexibility and lower distortion errors than traditional methods. Experimental results on standard real-life datasets demonstrate the superiority of OCKM over state-of-the-art approaches for approximate nearest neighbor search.
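
    The multiple-codeword encoding can be sketched as follows (assuming NumPy). OCKM generates the sub codebooks jointly and optimally; this illustrative snippet instead selects codewords greedily on the residual, so it shows only the shape of the encoding, not the paper's optimization.

        import numpy as np

        def encode_subvector(x, sub_codebooks):
            # Approximate the subvector x by a SUM of codewords, one drawn
            # from each of the subspace's sub codebooks (e.g. two of them),
            # rather than by a single codeword as in traditional PQ.
            indices, residual = [], x.astype(float)
            for C in sub_codebooks:
                d = ((residual[None, :] - C) ** 2).sum(-1)
                k = int(d.argmin())           # codeword nearest the residual
                indices.append(k)
                residual -= C[k]
            return indices                    # concatenated over subspaces

    As with standard product quantization, query-to-point distances can then be approximated by summing precomputed query-to-codeword lookup tables over the stored indices.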