714 research outputs found
Improved Asymmetric Locality Sensitive Hashing (ALSH) for Maximum Inner Product Search (MIPS)
Recently it was shown that the problem of Maximum Inner Product Search (MIPS)
is efficient and it admits provably sub-linear hashing algorithms. Asymmetric
transformations before hashing were the key in solving MIPS which was otherwise
hard. In the prior work, the authors use asymmetric transformations which
convert the problem of approximate MIPS into the problem of approximate near
neighbor search which can be efficiently solved using hashing. In this work, we
provide a different transformation which converts the problem of approximate
MIPS into the problem of approximate cosine similarity search which can be
efficiently solved using signed random projections. Theoretical analysis show
that the new scheme is significantly better than the original scheme for MIPS.
Experimental evaluations strongly support the theoretical findings.Comment: arXiv admin note: text overlap with arXiv:1405.586
On Symmetric and Asymmetric LSHs for Inner Product Search
We consider the problem of designing locality sensitive hashes (LSH) for
inner product similarity, and of the power of asymmetric hashes in this
context. Shrivastava and Li argue that there is no symmetric LSH for the
problem and propose an asymmetric LSH based on different mappings for query and
database points. However, we show there does exist a simple symmetric LSH that
enjoys stronger guarantees and better empirical performance than the asymmetric
LSH they suggest. We also show a variant of the settings where asymmetry is
in-fact needed, but there a different asymmetric LSH is required.Comment: 11 pages, 3 figures, In Proceedings of The 32nd International
Conference on Machine Learning (ICML
Tradeoffs for nearest neighbors on the sphere
We consider tradeoffs between the query and update complexities for the
(approximate) nearest neighbor problem on the sphere, extending the recent
spherical filters to sparse regimes and generalizing the scheme and analysis to
account for different tradeoffs. In a nutshell, for the sparse regime the
tradeoff between the query complexity and update complexity
for data sets of size is given by the following equation in
terms of the approximation factor and the exponents and :
For small , minimizing the time for updates leads to a linear
space complexity at the cost of a query time complexity .
Balancing the query and update costs leads to optimal complexities
, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner,
IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn,
STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A
subpolynomial query time complexity can be achieved at the cost of a
space complexity of the order , matching the bound
of [Andoni-Indyk-Patrascu, FOCS'06] and
[Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of
[Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98].
For large , minimizing the update complexity results in a query complexity
of , improving upon the related exponent for large of
[Kapralov, PODS'15] by a factor , and matching the bound
of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal
complexities , while a minimum query time complexity can be
achieved with update complexity , improving upon the
previous best exponents of Kapralov by a factor .Comment: 16 pages, 1 table, 2 figures. Mostly subsumed by arXiv:1608.03580
[cs.DS] (along with arXiv:1605.02701 [cs.DS]
When Hashing Met Matching: Efficient Spatio-Temporal Search for Ridesharing
Carpooling, or sharing a ride with other passengers, holds immense potential
for urban transportation. Ridesharing platforms enable such sharing of rides
using real-time data. Finding ride matches in real-time at urban scale is a
difficult combinatorial optimization task and mostly heuristic approaches are
applied. In this work, we mathematically model the problem as that of finding
near-neighbors and devise a novel efficient spatio-temporal search algorithm
based on the theory of locality sensitive hashing for Maximum Inner Product
Search (MIPS). The proposed algorithm can find near-optimal potential
matches for every ride from a pool of rides in time and space for a small . Our
algorithm can be extended in several useful and interesting ways increasing its
practical appeal. Experiments with large NY yellow taxi trip datasets show that
our algorithm consistently outperforms state-of-the-art heuristic methods
thereby proving its practical applicability
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is a problem of pursuing the data
items whose distances to a query item are the smallest from a large database.
Various methods have been developed to address this problem, and recently a lot
of efforts have been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work locality sensitive hashing. We divide the hashing
algorithms two main categories: locality sensitive hashing, which designs hash
functions without exploring the data distribution and learning to hash, which
learns hash functions according the data distribution, and review them from
various aspects, including hash function design and distance measure and search
scheme in the hash coding space
Scalable and Sustainable Deep Learning via Randomized Hashing
Current deep learning architectures are growing larger in order to learn from
complex datasets. These architectures require giant matrix multiplication
operations to train millions of parameters. Conversely, there is another
growing trend to bring deep learning to low-power, embedded devices. The matrix
operations, associated with both training and testing of deep networks, are
very expensive from a computational and energy standpoint. We present a novel
hashing based technique to drastically reduce the amount of computation needed
to train and test deep networks. Our approach combines recent ideas from
adaptive dropouts and randomized hashing for maximum inner product search to
select the nodes with the highest activation efficiently. Our new algorithm for
deep learning reduces the overall computational cost of forward and
back-propagation by operating on significantly fewer (sparse) nodes. As a
consequence, our algorithm uses only 5% of the total multiplications, while
keeping on average within 1% of the accuracy of the original model. A unique
property of the proposed hashing based back-propagation is that the updates are
always sparse. Due to the sparse gradient updates, our algorithm is ideally
suited for asynchronous and parallel training leading to near linear speedup
with increasing number of cores. We demonstrate the scalability and
sustainability (energy efficiency) of our proposed algorithm via rigorous
experimental evaluations on several real datasets
- …