1,630 research outputs found
Locality-Sensitive Hashing with Margin Based Feature Selection
We propose a learning method with feature selection for Locality-Sensitive
Hashing. Locality-Sensitive Hashing converts feature vectors into bit arrays.
These bit arrays can be used to perform similarity searches and personal
authentication. The proposed method uses bit arrays longer than those used in
the end for similarity and other searches and by learning selects the bits that
will be used. We demonstrated this method can effectively perform optimization
for cases such as fingerprint images with a large number of labels and
extremely few data that share the same labels, as well as verifying that it is
also effective for natural images, handwritten digits, and speech features.Comment: 9 pages, 6 figures, 3 table
HDIdx: High-Dimensional Indexing for Efficient Approximate Nearest Neighbor Search
Fast Nearest Neighbor (NN) search is a fundamental challenge in large-scale
data processing and analytics, particularly for analyzing multimedia contents
which are often of high dimensionality. Instead of using exact NN search,
extensive research efforts have been focusing on approximate NN search
algorithms. In this work, we present "HDIdx", an efficient high-dimensional
indexing library for fast approximate NN search, which is open-source and
written in Python. It offers a family of state-of-the-art algorithms that
convert input high-dimensional vectors into compact binary codes, making them
very efficient and scalable for NN search with very low space complexity
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is a problem of pursuing the data
items whose distances to a query item are the smallest from a large database.
Various methods have been developed to address this problem, and recently a lot
of efforts have been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work locality sensitive hashing. We divide the hashing
algorithms two main categories: locality sensitive hashing, which designs hash
functions without exploring the data distribution and learning to hash, which
learns hash functions according the data distribution, and review them from
various aspects, including hash function design and distance measure and search
scheme in the hash coding space
Fast Exact Search in Hamming Space with Multi-Index Hashing
There is growing interest in representing image data and feature descriptors
using compact binary codes for fast near neighbor search. Although binary codes
are motivated by their use as direct indices (addresses) into a hash table,
codes longer than 32 bits are not being used as such, as it was thought to be
ineffective. We introduce a rigorous way to build multiple hash tables on
binary code substrings that enables exact k-nearest neighbor search in Hamming
space. The approach is storage efficient and straightforward to implement.
Theoretical analysis shows that the algorithm exhibits sub-linear run-time
behavior for uniformly distributed codes. Empirical results show dramatic
speedups over a linear scan baseline for datasets of up to one billion codes of
64, 128, or 256 bits
Hyperplane Arrangements and Locality-Sensitive Hashing with Lift
Locality-sensitive hashing converts high-dimensional feature vectors, such as
image and speech, into bit arrays and allows high-speed similarity calculation
with the Hamming distance. There is a hashing scheme that maps feature vectors
to bit arrays depending on the signs of the inner products between feature
vectors and the normal vectors of hyperplanes placed in the feature space. This
hashing can be seen as a discretization of the feature space by hyperplanes. If
labels for data are given, one can determine the hyperplanes by using learning
algorithms. However, many proposed learning methods do not consider the
hyperplanes' offsets. Not doing so decreases the number of partitioned regions,
and the correlation between Hamming distances and Euclidean distances becomes
small. In this paper, we propose a lift map that converts learning algorithms
without the offsets to the ones that take into account the offsets. With this
method, the learning methods without the offsets give the discretizations of
spaces as if it takes into account the offsets. For the proposed method, we
input several high-dimensional feature data sets and studied the relationship
between the statistical characteristics of data, the number of hyperplanes, and
the effect of the proposed method.Comment: 9 pages, 7 figure
- …