Optimal Data-Dependent Hashing for Approximate Near Neighbors
We show an optimal data-dependent hashing scheme for the approximate near
neighbor problem. For an $n$-point data set in a $d$-dimensional space our data
structure achieves query time $O(d \cdot n^{\rho+o(1)})$ and space $O(n^{1+\rho+o(1)} + d \cdot n)$, where $\rho = \frac{1}{2c^2-1}$ for the Euclidean space and
approximation $c > 1$. For the Hamming space, we obtain an exponent of
$\rho = \frac{1}{2c-1}$.
Our result completes the direction set forth in [AINR14], who gave a
proof-of-concept that data-dependent hashing can outperform classical
Locality-Sensitive Hashing (LSH). In contrast to [AINR14], the new bound is not only
optimal, but in fact improves over the best (optimal) LSH data structures
[IM98,AI06] for all approximation factors $c > 1$.
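To make the improvement concrete, the short sketch below compares query-time exponents $\rho$ (query time roughly $n^\rho$, ignoring $o(1)$ terms). It assumes the exponents stated for this result, $\frac{1}{2c^2-1}$ (Euclidean) and $\frac{1}{2c-1}$ (Hamming), against the classical data-independent LSH exponents $\frac{1}{c^2}$ for Euclidean space [AI06] and $\frac{1}{c}$ for bit sampling in Hamming space [IM98]. Since $\frac{1}{2c^2-1} < \frac{1}{c^2} \iff c^2 > 1$ and $\frac{1}{2c-1} < \frac{1}{c} \iff c > 1$, the data-dependent exponent is smaller for every $c > 1$. The function names are illustrative only.

```python
# Query-time exponents rho (query time ~ n^rho), ignoring o(1) terms.


def rho_data_dependent_euclidean(c):
    """Data-dependent exponent 1/(2c^2 - 1) for Euclidean space, approximation c > 1."""
    return 1.0 / (2.0 * c * c - 1.0)


def rho_lsh_euclidean(c):
    """Optimal data-independent LSH exponent 1/c^2 in Euclidean space [AI06]."""
    return 1.0 / (c * c)


def rho_data_dependent_hamming(c):
    """Data-dependent exponent 1/(2c - 1) for Hamming space."""
    return 1.0 / (2.0 * c - 1.0)


def rho_lsh_hamming(c):
    """Bit-sampling LSH exponent 1/c in Hamming space [IM98]."""
    return 1.0 / c


if __name__ == "__main__":
    for c in (1.5, 2.0, 3.0):
        print(f"c={c}: Euclidean {rho_data_dependent_euclidean(c):.4f} vs "
              f"{rho_lsh_euclidean(c):.4f}, Hamming "
              f"{rho_data_dependent_hamming(c):.4f} vs {rho_lsh_hamming(c):.4f}")
```

For $c = 2$, for example, this gives $1/7 \approx 0.143$ versus $1/4$ in Euclidean space and $1/3$ versus $1/2$ in Hamming space.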
From the technical perspective, we proceed by decomposing an arbitrary
dataset into several subsets that are, in a certain sense, pseudo-random.
Comment: 36 pages, 5 figures, an extended abstract appeared in the proceedings
of the 47th ACM Symposium on Theory of Computing (STOC 2015).
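For contrast with the data-dependent approach, the sketch below shows the classical data-independent baseline in Hamming space: bit-sampling LSH in the style of [IM98]. Each hash function concatenates `k` randomly sampled coordinates, so points at small Hamming distance are likely to share a bucket in at least one table. The class and parameter names (`BitSamplingLSH`, `k`, `num_tables`, `max_dist`) are hypothetical, and tuning `k` and `num_tables` to reach the $n^{\rho}$ query time is not shown.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of coordinates where bit vectors a and b differ."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Data-independent LSH for Hamming space via bit sampling (illustrative names).

    Each of `num_tables` hash functions concatenates `k` coordinates sampled
    uniformly at random (with replacement); nearby points agree on the sampled
    bits with high probability, so they tend to land in the same bucket.
    """

    def __init__(self, dim, k, num_tables, seed=0):
        rng = random.Random(seed)
        self.coords = [[rng.randrange(dim) for _ in range(k)]
                       for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def _key(self, t, x):
        # Bucket key in table t: the values of x at that table's sampled coordinates.
        return tuple(x[i] for i in self.coords[t])

    def insert(self, x):
        idx = len(self.points)
        self.points.append(list(x))
        for t, table in enumerate(self.tables):
            table[self._key(t, x)].append(idx)

    def query(self, q, max_dist):
        """Return the index of some stored point that shares a bucket with q and
        lies within Hamming distance `max_dist` of q, or None."""
        for t, table in enumerate(self.tables):
            for idx in table.get(self._key(t, q), []):
                if hamming(self.points[idx], q) <= max_dist:
                    return idx
        return None


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.randint(0, 1) for _ in range(64)] for _ in range(200)]
    index = BitSamplingLSH(dim=64, k=8, num_tables=10)
    for x in data:
        index.insert(x)
    q = list(data[17])
    q[0] ^= 1  # flip one bit to get a nearby query
    print(index.query(q, max_dist=6))
```

The hash functions are chosen without looking at the data; the data-dependent scheme described above instead adapts its partitioning to the dataset.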
Efficient learning of neighbor representations for boundary trees and forests
We introduce a semiparametric approach to neighbor-based classification. We
build on the recently proposed Boundary Trees algorithm by Mathy et al. (2015),
which enables fast neighbor-based classification, regression, and retrieval in
large datasets. While boundary trees use a Euclidean measure of similarity,
the Differentiable Boundary Tree algorithm by Zoran et al. (2017) was introduced
to learn low-dimensional representations of complex input data, on which
semantic similarity can be calculated to train boundary trees. As its authors
point out, the differentiable boundary tree approach has a few
limitations that prevent it from scaling to large datasets. In this paper, we
introduce Differentiable Boundary Sets, an algorithm that overcomes the
computational issues of the differentiable boundary tree scheme and also
improves its classification accuracy and data representability. Our algorithm
is efficiently implementable with existing tools and offers a significant
reduction in training time. We test and compare the algorithms on the well-known
MNIST handwritten digits dataset and the newer Fashion-MNIST dataset by
Xiao et al. (2017).
Comment: 9 pages, 2 figures
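Since boundary trees are the building block here, a minimal sketch of a single boundary tree classifier with Euclidean distance may help. It follows our reading of the Mathy et al. (2015) description: a query walks greedily from the root to whichever of the current node's children (or the node itself, if it still has room for children) is closest, and training stores an example only when the node reached has the wrong label. The class name and the `max_children` parameter are illustrative assumptions; the differentiable variants discussed above replace the raw Euclidean distance with a learned representation, which this sketch does not attempt.

```python
import math


class BoundaryTree:
    """Minimal boundary tree classifier (illustrative sketch, Euclidean distance).

    Every node stores one (features, label) example. Queries descend greedily
    from the root; training stores only examples the tree currently misclassifies.
    """

    def __init__(self, x, y, max_children=50):
        if max_children < 1:
            raise ValueError("max_children must be at least 1")
        self.nodes = [(tuple(x), y)]   # node 0 is the root
        self.children = [[]]           # children[i] lists the child indices of node i
        self.max_children = max_children

    def _closest(self, x):
        """Greedy descent: move to the closest candidate until the current node wins."""
        v = 0
        while True:
            candidates = list(self.children[v])
            # A node with room for more children competes with its children;
            # a full node forces the walk to continue into one of them.
            if len(self.children[v]) < self.max_children:
                candidates.append(v)
            best = min(candidates, key=lambda i: math.dist(self.nodes[i][0], x))
            if best == v:
                return v
            v = best

    def train(self, x, y):
        """Attach (x, y) as a child of the reached node only if its label is wrong."""
        v = self._closest(x)
        if self.nodes[v][1] != y:
            self.nodes.append((tuple(x), y))
            self.children.append([])
            self.children[v].append(len(self.nodes) - 1)

    def predict(self, x):
        return self.nodes[self._closest(x)][1]


if __name__ == "__main__":
    tree = BoundaryTree((-1.0, 0.0), 0)
    for point, label in [((1.0, 0.0), 1), ((-2.0, 1.0), 0), ((2.0, -1.0), 1)]:
        tree.train(point, label)
    print(len(tree.nodes))            # 2: only (1.0, 0.0) was misclassified and stored
    print(tree.predict((0.9, 0.3)))   # 1
```

Storing only misclassified examples keeps the tree concentrated near the class boundary, which is what makes the greedy descent cheap compared with scanning every training point.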
When Hashing Met Matching: Efficient Spatio-Temporal Search for Ridesharing
Carpooling, or sharing a ride with other passengers, holds immense potential
for urban transportation. Ridesharing platforms enable such sharing of rides
using real-time data. Finding ride matches in real time at urban scale is a
difficult combinatorial optimization task, and mostly heuristic approaches are
applied. In this work, we mathematically model the problem as that of finding
near-neighbors and devise a novel efficient spatio-temporal search algorithm
based on the theory of locality sensitive hashing for Maximum Inner Product
Search (MIPS). The proposed algorithm can find near-optimal potential
matches for every ride from a pool of $n$ rides in time $O(n^{1+\rho} \log n)$ and space $O(n^{1+\rho} \log n)$ for a small $\rho < 1$. Our
algorithm can be extended in several useful and interesting ways, increasing its
practical appeal. Experiments with large NY yellow taxi trip datasets show that
our algorithm consistently outperforms state-of-the-art heuristic methods,
thereby demonstrating its practical applicability.
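The abstract does not say which LSH construction for MIPS the paper builds on, so the following is only a generic sketch of one well-known approach, not the authors' algorithm. It uses the norm-augmentation reduction of Neyshabur and Srebro (often called Simple-LSH), which turns inner-product search into cosine-similarity search, combined with sign-random-projection hashing (SimHash). How rides are encoded as vectors is not modeled, and all names and parameters (`MipsIndex`, `num_bits`, `num_tables`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def transform_data(xs):
    """Scale all vectors into the unit ball, then append sqrt(1 - ||x||^2)
    so every transformed data vector has norm exactly 1."""
    max_norm = max(math.sqrt(dot(x, x)) for x in xs)
    out = []
    for x in xs:
        scaled = [v / max_norm for v in x]
        out.append(scaled + [math.sqrt(max(0.0, 1.0 - dot(scaled, scaled)))])
    return out


def transform_query(q):
    """Normalize the (assumed nonzero) query and append a 0 coordinate."""
    norm = math.sqrt(dot(q, q))
    return [v / norm for v in q] + [0.0]


class SimHashTable:
    """One hash table keyed by the signs of `num_bits` random Gaussian projections."""

    def __init__(self, dim, num_bits, rng):
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.buckets = defaultdict(list)

    def key(self, z):
        return tuple(dot(p, z) >= 0.0 for p in self.planes)


class MipsIndex:
    """Approximate maximum inner product search: MIPS-to-cosine reduction + SimHash."""

    def __init__(self, xs, num_bits=4, num_tables=8, seed=0):
        rng = random.Random(seed)
        self.xs = [list(x) for x in xs]
        data = transform_data(self.xs)
        self.tables = [SimHashTable(len(data[0]), num_bits, rng)
                       for _ in range(num_tables)]
        for i, z in enumerate(data):
            for table in self.tables:
                table.buckets[table.key(z)].append(i)

    def query(self, q):
        """Return the index of the colliding candidate with the largest exact
        inner product with q, or None if no stored vector collides with q."""
        zq = transform_query(q)
        candidates = set()
        for table in self.tables:
            candidates.update(table.buckets.get(table.key(zq), []))
        if not candidates:
            return None
        return max(candidates, key=lambda i: dot(self.xs[i], q))


if __name__ == "__main__":
    index = MipsIndex([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [-1.0, -1.0]])
    print(index.query([1.0, 1.0]))
```

After the transform, every data vector has unit norm and its inner product with the transformed query equals $x \cdot q / (M \lVert q \rVert)$, where $M$ is the largest data norm. For a fixed query, ranking by cosine similarity is therefore the same as ranking by inner product. The candidates retrieved from colliding buckets are then re-ranked by their exact inner product.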