16,809 research outputs found
Analysis of approximate nearest neighbor searching with clustered point sets
We present an empirical analysis of data structures for approximate nearest
neighbor searching. We compare the well-known optimized kd-tree splitting
method against two alternative splitting methods. The first, called the
sliding-midpoint method, which attempts to balance the goals of producing
subdivision cells of bounded aspect ratio, while not producing any empty cells.
The second, called the minimum-ambiguity method is a query-based approach. In
addition to the data points, it is also given a training set of query points
for preprocessing. It employs a simple greedy algorithm to select the splitting
plane that minimizes the average amount of ambiguity in the choice of the
nearest neighbor for the training points. We provide an empirical analysis
comparing these two methods against the optimized kd-tree construction for a
number of synthetically generated data and query sets. We demonstrate that for
clustered data and query sets, these algorithms can provide significant
improvements over the standard kd-tree construction for approximate nearest
neighbor searching.Comment: 20 pages, 8 figures. Presented at ALENEX '99, Baltimore, MD, Jan
15-16, 199
Maximum Inner-Product Search using Tree Data-structures
The problem of {\em efficiently} finding the best match for a query in a
given set with respect to the Euclidean distance or the cosine similarity has
been extensively studied in literature. However, a closely related problem of
efficiently finding the best match with respect to the inner product has never
been explored in the general setting to the best of our knowledge. In this
paper we consider this general problem and contrast it with the existing
best-match algorithms. First, we propose a general branch-and-bound algorithm
using a tree data structure. Subsequently, we present a dual-tree algorithm for
the case where there are multiple queries. Finally we present a new data
structure for increasing the efficiency of the dual-tree algorithm. These
branch-and-bound algorithms involve novel bounds suited for the purpose of
best-matching with inner products. We evaluate our proposed algorithms on a
variety of data sets from various applications, and exhibit up to five orders
of magnitude improvement in query time over the naive search technique.Comment: Under submission in KDD 201
The Power of Dynamic Distance Oracles: Efficient Dynamic Algorithms for the Steiner Tree
In this paper we study the Steiner tree problem over a dynamic set of
terminals. We consider the model where we are given an -vertex graph
with positive real edge weights, and our goal is to maintain a tree
which is a good approximation of the minimum Steiner tree spanning a terminal
set , which changes over time. The changes applied to the
terminal set are either terminal additions (incremental scenario), terminal
removals (decremental scenario), or both (fully dynamic scenario). Our task
here is twofold. We want to support updates in sublinear time, and keep
the approximation factor of the algorithm as small as possible. We show that we
can maintain a -approximate Steiner tree of a general graph in
time per terminal addition or removal. Here,
denotes the stretch of the metric induced by . For planar graphs we achieve
the same running time and the approximation ratio of .
Moreover, we show faster algorithms for incremental and decremental scenarios.
Finally, we show that if we allow higher approximation ratio, even more
efficient algorithms are possible. In particular we show a polylogarithmic time
-approximate algorithm for planar graphs.
One of the main building blocks of our algorithms are dynamic distance
oracles for vertex-labeled graphs, which are of independent interest. We also
improve and use the online algorithms for the Steiner tree problem.Comment: Full version of the paper accepted to STOC'1
Linear-Space Approximate Distance Oracles for Planar, Bounded-Genus, and Minor-Free Graphs
A (1 + eps)-approximate distance oracle for a graph is a data structure that
supports approximate point-to-point shortest-path-distance queries. The most
relevant measures for a distance-oracle construction are: space, query time,
and preprocessing time. There are strong distance-oracle constructions known
for planar graphs (Thorup, JACM'04) and, subsequently, minor-excluded graphs
(Abraham and Gavoille, PODC'06). However, these require Omega(eps^{-1} n lg n)
space for n-node graphs. We argue that a very low space requirement is
essential. Since modern computer architectures involve hierarchical memory
(caches, primary memory, secondary memory), a high memory requirement in effect
may greatly increase the actual running time. Moreover, we would like data
structures that can be deployed on small mobile devices, such as handhelds,
which have relatively small primary memory. In this paper, for planar graphs,
bounded-genus graphs, and minor-excluded graphs we give distance-oracle
constructions that require only O(n) space. The big O hides only a fixed
constant, independent of \epsilon and independent of genus or size of an
excluded minor. The preprocessing times for our distance oracle are also faster
than those for the previously known constructions. For planar graphs, the
preprocessing time is O(n lg^2 n). However, our constructions have slower query
times. For planar graphs, the query time is O(eps^{-2} lg^2 n). For our
linear-space results, we can in fact ensure, for any delta > 0, that the space
required is only 1 + delta times the space required just to represent the graph
itself
On trip planning queries in spatial databases
In this paper we discuss a new type of query in Spatial Databases, called Trip Planning Query (TPQ). Given a set of points P in space, where each point belongs to a category, and given two points s and e, TPQ asks for the best trip that starts at s, passes through exactly one point from each category, and ends at e. An example of a TPQ is when a user wants to visit a set of different places and at the same time minimize the total travelling cost, e.g. what is the shortest travelling plan for me to visit an automobile shop, a CVS pharmacy outlet, and a Best Buy shop along my trip from A to B? The trip planning query is an extension of the well-known TSP problem and therefore is NP-hard. The difficulty of this query lies in the existence of multiple choices for each category. In this paper, we first study fast approximation algorithms for the trip planning query in a metric space, assuming that the data set fits in main memory, and give the theory analysis of their approximation bounds. Then, the trip planning query is examined for data sets that do not fit in main memory and must be stored on disk. For the disk-resident data, we consider two cases. In one case, we assume that the points are located in Euclidean space and indexed with an Rtree. In the other case, we consider the problem of points that lie on the edges of a spatial network (e.g. road network) and the distance between two points is defined using the shortest distance over the network. Finally, we give an experimental evaluation of the proposed algorithms using synthetic data sets generated on real road networks
- …