Analysis of approximate nearest neighbor searching with clustered point sets
We present an empirical analysis of data structures for approximate nearest
neighbor searching. We compare the well-known optimized kd-tree splitting
method against two alternative splitting methods. The first, called the
sliding-midpoint method, attempts to balance the goals of producing
subdivision cells of bounded aspect ratio while not producing any empty cells.
The second, called the minimum-ambiguity method, is a query-based approach. In
addition to the data points, it is also given a training set of query points
for preprocessing. It employs a simple greedy algorithm to select the splitting
plane that minimizes the average amount of ambiguity in the choice of the
nearest neighbor for the training points. We provide an empirical analysis
comparing these two methods against the optimized kd-tree construction for a
number of synthetically generated data and query sets. We demonstrate that for
clustered data and query sets, these algorithms can provide significant
improvements over the standard kd-tree construction for approximate nearest
neighbor searching.
Comment: 20 pages, 8 figures. Presented at ALENEX '99, Baltimore, MD, Jan
15-16, 199
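
The sliding-midpoint idea described in the abstract above can be sketched as a single split step. This is a minimal illustration, not the authors' implementation: the function name `sliding_midpoint_split`, the choice of the longest cell side as the splitting dimension, and the tie handling are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sliding_midpoint_split(
    points: Sequence[Point], lo: Sequence[float], hi: Sequence[float]
) -> Tuple[int, float, List[Point], List[Point]]:
    """Split the cell [lo, hi] once and return (dim, cut, left, right).

    Assumption: the cut is tried at the midpoint of the longest side.
    If every point falls on one side, the plane slides toward the points
    until it touches the first one, so neither child is empty.
    """
    if len(points) < 2:
        raise ValueError("need at least two points to split a cell")

    # Longest side of the cell (ties go to the lowest dimension index).
    dim = max(range(len(lo)), key=lambda d: hi[d] - lo[d])
    cut = (lo[dim] + hi[dim]) / 2.0

    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]

    if not left:
        # All points lie at or above the midpoint: slide the plane up
        # to the smallest coordinate and give that point to the left child.
        right.sort(key=lambda p: p[dim])
        cut = right[0][dim]
        left, right = right[:1], right[1:]
    elif not right:
        # All points lie below the midpoint: slide the plane down
        # to the largest coordinate and give that point to the right child.
        left.sort(key=lambda p: p[dim])
        cut = left[-1][dim]
        left, right = left[:-1], left[-1:]

    return dim, cut, left, right
```

Applied recursively to each child cell, a step like this would build the whole tree. Sliding the plane keeps cells from becoming empty at the cost of occasionally producing a thin cell.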
Fully dynamic maintenance of Euclidean minimum spanning trees and maxima of decomposable functions
We maintain the minimum spanning tree of a point set in the plane, subject to point insertions and deletions, in time $O(n^{1/2} \log^2 n)$ per update operation. We reduce the problem to maintaining bichromatic closest pairs, which we solve in time $O(n^{\varepsilon})$ per update. Our algorithm uses a novel construction, the ordered nearest neighbors of a sequence of points. Any point set or bichromatic point set can be ordered so that this graph is a simple path. Our results generalize to higher dimensions, and to fully dynamic algorithms for maintaining maxima of decomposable functions, including the diameter of a point set and the bichromatic farthest pair.
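
To make the two bichromatic problems named in this abstract concrete, here is a brute-force sketch that recomputes each answer from scratch. It is only a correctness baseline under the assumption of small inputs; it is not the dynamic data structure the abstract describes, and the function names are hypothetical.

```python
import math
from itertools import product
from typing import Sequence, Tuple

Point = Tuple[float, float]


def bichromatic_closest_pair(
    red: Sequence[Point], blue: Sequence[Point]
) -> Tuple[Point, Point]:
    """Return the (red, blue) pair at minimum Euclidean distance.

    Brute force over all |red| * |blue| pairs.
    """
    return min(product(red, blue), key=lambda rb: math.dist(rb[0], rb[1]))


def bichromatic_farthest_pair(
    red: Sequence[Point], blue: Sequence[Point]
) -> Tuple[Point, Point]:
    """Return the (red, blue) pair at maximum Euclidean distance."""
    return max(product(red, blue), key=lambda rb: math.dist(rb[0], rb[1]))
```

A baseline like this is mainly useful for checking a faster structure after each insertion or deletion.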
Fully dynamic maintenance of Euclidean minimum spanning trees
We maintain the minimum spanning tree of a point set in the plane, subject to point insertions and deletions, in time $O(n^{5/6} \log^{1/2} n)$ per update operation. No nontrivial dynamic geometric minimum spanning tree algorithm was previously known. We reduce the problem to maintaining bichromatic closest pairs, which we also solve in the same time bounds. Our algorithm uses a novel construction, the ordered nearest neighbors of a sequence of points. Any point set or bichromatic point set can be ordered so that this graph is a simple path.
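
For reference, the static version of the problem can be solved by recomputing the tree after every update. The sketch below uses Prim's algorithm on the complete Euclidean graph in quadratic time; it is a swapped-in textbook method shown for comparison, not the dynamic algorithm of the abstract, and the name `euclidean_mst` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def euclidean_mst(points: Sequence[Point]) -> List[Tuple[int, int]]:
    """Return MST edges as (parent_index, child_index) pairs.

    Prim's algorithm over all pairwise distances, O(n^2) time.
    """
    n = len(points)
    if n == 0:
        return []

    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n       # tree endpoint realizing that cost
    best[0] = 0.0
    edges: List[Tuple[int, int]] = []

    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax connection costs of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u

    return edges
```

Rebuilding this way costs quadratic time per insertion or deletion, which is the kind of cost a sublinear update bound is meant to avoid.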
On trip planning queries in spatial databases
In this paper we discuss a new type of query in spatial databases, called the Trip Planning Query (TPQ). Given a set of points P in space, where each point belongs to a category, and given two points s and e, TPQ asks for the best trip that starts at s, passes through exactly one point from each category, and ends at e. An example of a TPQ is when a user wants to visit a set of different places and at the same time minimize the total travelling cost, e.g. what is the shortest travelling plan for me to visit an automobile shop, a CVS pharmacy outlet, and a Best Buy shop along my trip from A to B? The trip planning query is an extension of the well-known traveling salesman problem (TSP) and is therefore NP-hard. The difficulty of this query lies in the existence of multiple choices for each category. In this paper, we first study fast approximation algorithms for the trip planning query in a metric space, assuming that the data set fits in main memory, and give a theoretical analysis of their approximation bounds. Then, the trip planning query is examined for data sets that do not fit in main memory and must be stored on disk. For the disk-resident data, we consider two cases. In one case, we assume that the points are located in Euclidean space and indexed with an R-tree. In the other case, we consider the problem of points that lie on the edges of a spatial network (e.g. a road network), where the distance between two points is defined as the shortest distance over the network. Finally, we give an experimental evaluation of the proposed algorithms using synthetic data sets generated on real road networks.
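
The query definition above can be illustrated with a simple greedy heuristic for the in-memory Euclidean case: from the current position, move to the nearest point belonging to any category not yet visited, then finish at e. This is a sketch under assumptions, not necessarily one of the paper's algorithms, and it carries no approximation guarantee here; the name `greedy_trip` and the dictionary input format are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def greedy_trip(
    start: Point, end: Point, points_by_category: Dict[str, Sequence[Point]]
) -> Tuple[List[Point], float]:
    """Return (route, length) visiting one point per category.

    Greedy rule: always go to the closest point of an unvisited category.
    """
    if any(len(pts) == 0 for pts in points_by_category.values()):
        raise ValueError("every category needs at least one point")

    remaining = dict(points_by_category)
    route = [start]
    current = start

    while remaining:
        # Nearest candidate over all categories still to be visited.
        category, nxt = min(
            ((c, p) for c, pts in remaining.items() for p in pts),
            key=lambda cp: math.dist(current, cp[1]),
        )
        route.append(nxt)
        current = nxt
        del remaining[category]

    route.append(end)
    length = sum(math.dist(a, b) for a, b in zip(route, route[1:]))
    return route, length
```

For example, `greedy_trip((0, 0), (10, 0), {"shop": [(2, 0), (8, 5)], "pharmacy": [(4, 0), (1, 5)]})` visits the shop at (2, 0) and then the pharmacy at (4, 0) before ending at (10, 0). On a road network, `math.dist` would have to be replaced by a shortest-path distance.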