Locality-Sensitive Hashing of Curves
We study data structures for storing a set of polygonal curves in
such that, given a query curve, we can efficiently retrieve similar curves from
the set, where similarity is measured using the discrete Fr\'echet distance or
the dynamic time warping distance. To this end we devise the first
locality-sensitive hashing schemes for these distance measures. A major
challenge is posed by the fact that these distance measures internally optimize
the alignment between the curves. We give solutions for different types of
alignments including constrained and unconstrained versions. For unconstrained
alignments, we improve over a result by Indyk from 2002 for short curves. Let
be the number of input curves and let be the maximum complexity of a
curve in the input. In the particular case where , for some fixed , our solutions imply an approximate near-neighbor
data structure for the discrete Fréchet distance that uses space in
and achieves query time in and
constant approximation factor. Furthermore, our solutions provide a trade-off
between approximation quality and computational performance: for any parameter
, we can give a data structure that uses space in , answers queries in time and achieves
approximation factor in .
Comment: Proc. of 33rd International Symposium on Computational Geometry
(SoCG), 201
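The abstract above measures similarity with the discrete Fréchet distance, which optimizes over alignments of the two curves. As a minimal sketch of that distance only (not of the hashing schemes the paper proposes), the standard dynamic program can be written as follows. The function name `discrete_frechet` and the representation of curves as lists of coordinate tuples are assumptions made for illustration.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two non-empty point sequences.

    p and q are lists of equal-dimension coordinate tuples (an assumed
    representation of polygonal curves by their vertices).
    """
    n, m = len(p), len(q)
    # dp[i][j] = discrete Fréchet distance between p[:i+1] and q[:j+1]
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                best_prev = 0.0
            elif i == 0:
                best_prev = dp[0][j - 1]
            elif j == 0:
                best_prev = dp[i - 1][0]
            else:
                # The alignment may advance on p, on q, or on both.
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
            # The cost of an alignment is its largest matched distance.
            dp[i][j] = max(d, best_prev)
    return dp[n - 1][m - 1]
```

This runs in time proportional to the product of the two curve lengths. Replacing the outer `max` with a sum of matched distances would give a dynamic-time-warping-style cost instead.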
Locality-Sensitive Hashing with Margin Based Feature Selection
We propose a learning method with feature selection for Locality-Sensitive
Hashing. Locality-Sensitive Hashing converts feature vectors into bit arrays.
These bit arrays can be used to perform similarity searches and personal
authentication. The proposed method generates bit arrays longer than those
ultimately used for similarity search and other tasks, and selects the bits to
keep by learning. We demonstrate that this method optimizes effectively in
cases such as fingerprint images, which have a large number of labels and
extremely few samples sharing each label, and we verify that it is
also effective for natural images, handwritten digits, and speech features.
Comment: 9 pages, 6 figures, 3 tables
Improved Asymmetric Locality Sensitive Hashing (ALSH) for Maximum Inner Product Search (MIPS)
Recently it was shown that the problem of Maximum Inner Product Search (MIPS)
can be solved efficiently and admits provably sub-linear hashing algorithms. Asymmetric
transformations before hashing were the key to solving MIPS, which was otherwise
hard. In the prior work, the authors use asymmetric transformations that
convert the problem of approximate MIPS into the problem of approximate near
neighbor search, which can be efficiently solved using hashing. In this work, we
provide a different transformation which converts the problem of approximate
MIPS into the problem of approximate cosine similarity search which can be
efficiently solved using signed random projections. Theoretical analysis shows
that the new scheme is significantly better than the original scheme for MIPS.
Experimental evaluations strongly support the theoretical findings.
Comment: arXiv admin note: text overlap with arXiv:1405.586
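To illustrate the general idea of reducing MIPS to cosine similarity search, here is a minimal sketch. It does not reproduce the transformation from the paper above; it uses a simpler, commonly described augmentation that appends one coordinate to each data vector, under the assumption that all data vectors have been scaled to norm at most 1. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def augment_data(x):
    """Append sqrt(1 - ||x||^2) so the result has unit norm.

    Assumes ||x|| <= 1 (scale the whole data set beforehand if needed).
    """
    slack = max(0.0, 1.0 - dot(x, x))  # guard against rounding below zero
    return list(x) + [math.sqrt(slack)]


def augment_query(q):
    """Append 0 so the extra data coordinate does not affect inner products."""
    return list(q) + [0.0]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def srp_bits(v, planes):
    """Signed random projections: one bit per hyperplane normal in `planes`."""
    return [1 if dot(v, w) >= 0 else 0 for w in planes]
```

Because every augmented data vector has unit norm, the cosine between `augment_query(q)` and `augment_data(x)` equals the inner product of `q` and `x` divided by the norm of `q`, so ranking by cosine matches ranking by inner product. The transformation is asymmetric in that data and queries are augmented differently, and `srp_bits` can then be applied to both sides with the same set of random normals.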
Hyperplane Arrangements and Locality-Sensitive Hashing with Lift
Locality-sensitive hashing converts high-dimensional feature vectors, such as
image and speech features, into bit arrays and allows high-speed similarity calculation
with the Hamming distance. There is a hashing scheme that maps feature vectors
to bit arrays depending on the signs of the inner products between feature
vectors and the normal vectors of hyperplanes placed in the feature space. This
hashing can be seen as a discretization of the feature space by hyperplanes. If
labels for data are given, one can determine the hyperplanes by using learning
algorithms. However, many proposed learning methods do not consider the
hyperplanes' offsets. Not doing so decreases the number of partitioned regions,
and the correlation between Hamming distances and Euclidean distances becomes
small. In this paper, we propose a lift map that converts learning algorithms
that ignore offsets into ones that take them into account. With this
method, learning methods without offsets discretize the
space as if they took the offsets into account. We applied the proposed method to
several high-dimensional feature data sets and studied the relationship
between the statistical characteristics of the data, the number of hyperplanes, and
the effect of the proposed method.
Comment: 9 pages, 7 figures
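The role of hyperplane offsets described above can be sketched in a few lines. The example assumes one simple form of lifting, appending a constant coordinate 1 to each feature vector, so that a hyperplane through the origin of the lifted space acts as a hyperplane with an offset in the original space. Whether this matches the lift map defined in the paper is an assumption, and the function names are hypothetical.

```python
def sign_bits(x, normals):
    """One bit per hyperplane: the sign of the inner product with its normal."""
    return [1 if sum(a * b for a, b in zip(x, w)) >= 0 else 0 for w in normals]


def lift(x):
    """Assumed lift: append a constant 1 so a normal (w, b) computes w.x + b."""
    return list(x) + [1.0]


def hamming(a, b):
    """Number of positions where two bit arrays differ."""
    return sum(u != v for u, v in zip(a, b))
```

For the one-dimensional points 0.2 and 0.8, a single hyperplane through the origin with normal `[1.0]` assigns both the same bit. After lifting, the normal `[1.0, -0.5]` places the boundary at 0.5 and separates them, which is the kind of extra partitioning that offsets make possible.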
Bilinear Random Projections for Locality-Sensitive Binary Codes
Locality-sensitive hashing (LSH) is a popular data-independent indexing
method for approximate similarity search, where random projections followed by
quantization hash the points from the database so as to ensure that the
probability of collision is much higher for objects that are close to each
other than for those that are far apart. Most high-dimensional visual
descriptors for images exhibit a natural matrix structure. When visual
descriptors are represented by high-dimensional feature vectors and long binary
codes are assigned, a random projection matrix is expensive
in both space and time. In this paper we analyze a bilinear random projection
method where feature matrices are transformed to binary codes by two smaller
random projection matrices. We base our theoretical analysis on extending
Raginsky and Lazebnik's result where random Fourier features are composed with
random binary quantizers to form locality sensitive binary codes. To this end,
we answer the following two questions: (1) whether a bilinear random projection
also yields similarity-preserving binary codes; (2) whether a bilinear random
projection yields performance gain or loss, compared to a large linear
projection. Regarding the first question, we present upper and lower bounds on
the expected Hamming distance between binary codes produced by bilinear random
projections. Regarding the second question, we analyze upper and lower
bounds on the covariance between two bits of the binary codes, showing that the
correlation between two bits is small. Numerical experiments on the MNIST and
Flickr45K datasets confirm the validity of our method.
Comment: 11 pages, 23 figures, CVPR-201
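The storage argument behind bilinear projections can be made concrete with a small sketch. It assumes the simplest variant, one bit per pair of random vectors taken as the sign of u^T X v, and leaves out the random Fourier features and random quantizers that the paper's analysis builds on. All names are hypothetical.

```python
import random


def bilinear_form(X, u, v):
    """Compute u^T X v for a matrix X given as a list of rows."""
    return sum(u[i] * X[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


def bilinear_code(X, U, V):
    """One bit per (u, v) pair: the sign of the bilinear form."""
    return [1 if bilinear_form(X, u, v) >= 0 else 0 for u, v in zip(U, V)]


def random_vectors(count, dim, rng):
    """Gaussian random projection vectors (an assumed choice of distribution)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


def example_code(X, bits, seed=0):
    """Draw `bits` random (u, v) pairs and hash the matrix X."""
    rng = random.Random(seed)
    U = random_vectors(bits, len(X), rng)
    V = random_vectors(bits, len(X[0]), rng)
    return bilinear_code(X, U, V)
```

For a d1 x d2 feature matrix, each bit here stores d1 + d2 random numbers, whereas a linear projection of the flattened matrix stores d1 * d2 per bit. The value u^T X v equals the inner product of the flattened matrix with the Kronecker product of u and v, so each bilinear bit is a linear projection with a structured, rank-one direction.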