17,982 research outputs found
Maximum Inner-Product Search using Tree Data-structures
The problem of {\em efficiently} finding the best match for a query in a
given set with respect to the Euclidean distance or the cosine similarity has
been extensively studied in literature. However, a closely related problem of
efficiently finding the best match with respect to the inner product has never
been explored in the general setting to the best of our knowledge. In this
paper we consider this general problem and contrast it with the existing
best-match algorithms. First, we propose a general branch-and-bound algorithm
using a tree data structure. Subsequently, we present a dual-tree algorithm for
the case where there are multiple queries. Finally we present a new data
structure for increasing the efficiency of the dual-tree algorithm. These
branch-and-bound algorithms involve novel bounds suited for the purpose of
best-matching with inner products. We evaluate our proposed algorithms on a
variety of data sets from various applications, and exhibit up to five orders
of magnitude improvement in query time over the naive search technique.Comment: Under submission in KDD 201
Tradeoffs for nearest neighbors on the sphere
We consider tradeoffs between the query and update complexities for the
(approximate) nearest neighbor problem on the sphere, extending the recent
spherical filters to sparse regimes and generalizing the scheme and analysis to
account for different tradeoffs. In a nutshell, for the sparse regime the
tradeoff between the query complexity and update complexity
for data sets of size is given by the following equation in
terms of the approximation factor and the exponents and :
For small , minimizing the time for updates leads to a linear
space complexity at the cost of a query time complexity .
Balancing the query and update costs leads to optimal complexities
, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner,
IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn,
STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A
subpolynomial query time complexity can be achieved at the cost of a
space complexity of the order , matching the bound
of [Andoni-Indyk-Patrascu, FOCS'06] and
[Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of
[Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98].
For large , minimizing the update complexity results in a query complexity
of , improving upon the related exponent for large of
[Kapralov, PODS'15] by a factor , and matching the bound
of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal
complexities , while a minimum query time complexity can be
achieved with update complexity , improving upon the
previous best exponents of Kapralov by a factor .Comment: 16 pages, 1 table, 2 figures. Mostly subsumed by arXiv:1608.03580
[cs.DS] (along with arXiv:1605.02701 [cs.DS]
Scalable Image Retrieval by Sparse Product Quantization
Fast Approximate Nearest Neighbor (ANN) search technique for high-dimensional
feature indexing and retrieval is the crux of large-scale image retrieval. A
recent promising technique is Product Quantization, which attempts to index
high-dimensional image features by decomposing the feature space into a
Cartesian product of low dimensional subspaces and quantizing each of them
separately. Despite the promising results reported, their quantization approach
follows the typical hard assignment of traditional quantization methods, which
may result in large quantization errors and thus inferior search performance.
Unlike the existing approaches, in this paper, we propose a novel approach
called Sparse Product Quantization (SPQ) to encoding the high-dimensional
feature vectors into sparse representation. We optimize the sparse
representations of the feature vectors by minimizing their quantization errors,
making the resulting representation is essentially close to the original data
in practice. Experiments show that the proposed SPQ technique is not only able
to compress data, but also an effective encoding technique. We obtain
state-of-the-art results for ANN search on four public image datasets and the
promising results of content-based image retrieval further validate the
efficacy of our proposed method.Comment: 12 page
An Efficient Index for Visual Search in Appearance-based SLAM
Vector-quantization can be a computationally expensive step in visual
bag-of-words (BoW) search when the vocabulary is large. A BoW-based appearance
SLAM needs to tackle this problem for an efficient real-time operation. We
propose an effective method to speed up the vector-quantization process in
BoW-based visual SLAM. We employ a graph-based nearest neighbor search (GNNS)
algorithm to this aim, and experimentally show that it can outperform the
state-of-the-art. The graph-based search structure used in GNNS can efficiently
be integrated into the BoW model and the SLAM framework. The graph-based index,
which is a k-NN graph, is built over the vocabulary words and can be extracted
from the BoW's vocabulary construction procedure, by adding one iteration to
the k-means clustering, which adds small extra cost. Moreover, exploiting the
fact that images acquired for appearance-based SLAM are sequential, GNNS search
can be initiated judiciously which helps increase the speedup of the
quantization process considerably
Faster tuple lattice sieving using spherical locality-sensitive filters
To overcome the large memory requirement of classical lattice sieving
algorithms for solving hard lattice problems, Bai-Laarhoven-Stehl\'{e} [ANTS
2016] studied tuple lattice sieving, where tuples instead of pairs of lattice
vectors are combined to form shorter vectors. Herold-Kirshanova [PKC 2017]
recently improved upon their results for arbitrary tuple sizes, for example
showing that a triple sieve can solve the shortest vector problem (SVP) in
dimension in time , using a technique similar to
locality-sensitive hashing for finding nearest neighbors.
In this work, we generalize the spherical locality-sensitive filters of
Becker-Ducas-Gama-Laarhoven [SODA 2016] to obtain space-time tradeoffs for near
neighbor searching on dense data sets, and we apply these techniques to tuple
lattice sieving to obtain even better time complexities. For instance, our
triple sieve heuristically solves SVP in time . For
practical sieves based on Micciancio-Voulgaris' GaussSieve [SODA 2010], this
shows that a triple sieve uses less space and less time than the current best
near-linear space double sieve.Comment: 12 pages + references, 2 figures. Subsumed/merged into Cryptology
ePrint Archive 2017/228, available at https://ia.cr/2017/122
Reverse k Nearest Neighbor Search over Trajectories
GPS enables mobile devices to continuously provide new opportunities to
improve our daily lives. For example, the data collected in applications
created by Uber or Public Transport Authorities can be used to plan
transportation routes, estimate capacities, and proactively identify low
coverage areas. In this paper, we study a new kind of query-Reverse k Nearest
Neighbor Search over Trajectories (RkNNT), which can be used for route planning
and capacity estimation. Given a set of existing routes DR, a set of passenger
transitions DT, and a query route Q, a RkNNT query returns all transitions that
take Q as one of its k nearest travel routes. To solve the problem, we first
develop an index to handle dynamic trajectory updates, so that the most
up-to-date transition data are available for answering a RkNNT query. Then we
introduce a filter refinement framework for processing RkNNT queries using the
proposed indexes. Next, we show how to use RkNNT to solve the optimal route
planning problem MaxRkNNT (MinRkNNT), which is to search for the optimal route
from a start location to an end location that could attract the maximum (or
minimum) number of passengers based on a pre-defined travel distance threshold.
Experiments on real datasets demonstrate the efficiency and scalability of our
approaches. To the best of our best knowledge, this is the first work to study
the RkNNT problem for route planning.Comment: 12 page
- …