On trip planning queries in spatial databases
In this paper we discuss a new type of query in spatial databases, called the Trip Planning Query (TPQ). Given a set of points P in space, where each point belongs to a category, and given two points s and e, TPQ asks for the best trip that starts at s, passes through exactly one point from each category, and ends at e. An example of a TPQ is when a user wants to visit a set of different places and at the same time minimize the total travelling cost, e.g. what is the shortest travelling plan for me to visit an automobile shop, a CVS pharmacy outlet, and a Best Buy shop along my trip from A to B? The trip planning query is an extension of the well-known traveling salesman problem (TSP) and is therefore NP-hard. The difficulty of this query lies in the existence of multiple choices for each category. In this paper, we first study fast approximation algorithms for the trip planning query in a metric space, assuming that the data set fits in main memory, and give a theoretical analysis of their approximation bounds. Then, the trip planning query is examined for data sets that do not fit in main memory and must be stored on disk. For the disk-resident data, we consider two cases. In one case, we assume that the points are located in Euclidean space and indexed with an R-tree. In the other case, we consider points that lie on the edges of a spatial network (e.g. a road network), where the distance between two points is defined as the shortest distance over the network. Finally, we give an experimental evaluation of the proposed algorithms using synthetic data sets generated on real road networks.
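To make the query definition concrete, the following is a minimal sketch of one simple greedy heuristic for a TPQ in the Euclidean plane: from the current location, move to the closest point of any category not yet visited, then finish at e. It is an illustration only and is not claimed to match the algorithms analyzed in the paper. The function name `nearest_neighbor_trip` and the dictionary layout `points_by_category` are assumptions made for this example, and every category is assumed to contain at least one point.

```python
import math


def nearest_neighbor_trip(start, end, points_by_category):
    """Greedy TPQ heuristic (illustrative sketch, not an optimal solver).

    start, end: coordinate tuples for s and e.
    points_by_category: dict mapping a category name to a non-empty
    list of coordinate tuples (hypothetical layout for this example).
    Returns (trip, total_cost) where trip is the ordered list of stops.
    """
    current = start
    trip = [start]
    remaining = dict(points_by_category)  # categories still to be visited
    total = 0.0
    while remaining:
        # Closest point over all categories that have not been covered yet.
        category, point = min(
            ((c, p) for c, pts in remaining.items() for p in pts),
            key=lambda cp: math.dist(current, cp[1]),
        )
        total += math.dist(current, point)
        trip.append(point)
        current = point
        del remaining[category]  # exactly one point per category
    total += math.dist(current, end)
    trip.append(end)
    return trip, total
```

Because the heuristic commits to the locally closest point, it can return a trip that is longer than the optimum; it only shows how "one point from each category" constrains the route.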
Lower Bounds for Oblivious Near-Neighbor Search
We prove an lower bound on the dynamic
cell-probe complexity of statistically
approximate-near-neighbor search () over the -dimensional
Hamming cube. For the natural setting of , our result
implies an lower bound, which is a quadratic
improvement over the highest (non-oblivious) cell-probe lower bound for
. This is the first super-logarithmic
lower bound for against general (non black-box) data structures.
We also show that any oblivious data structure for
decomposable search problems (like ) can be obliviously dynamized
with overhead in update and query time, strengthening a classic
result of Bentley and Saxe (Algorithmica, 1980).
Comment: 28 pages
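As background for the dynamization result, here is a minimal sketch of the classic Bentley-Saxe logarithmic method, which turns a static structure for a decomposable search problem into an insert-only dynamic one. This is the standard non-oblivious technique, not the oblivious construction of the paper. The class name `LogarithmicMethod` and the callbacks `build`, `query`, and `combine` are hypothetical names chosen for this example.

```python
class LogarithmicMethod:
    """Insert-only dynamization of a static structure (illustrative sketch).

    build(items)        -> static structure over a list of items
    query(structure, q) -> partial answer from one structure
    combine(a, b)       -> merged answer (the problem must be decomposable)
    """

    def __init__(self, build, query, combine):
        self.build = build
        self.query = query
        self.combine = combine
        # levels[i] is None or (items, structure) holding exactly 2**i items.
        self.levels = []

    def insert(self, item):
        carry = [item]
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:
                # Empty slot: rebuild one static structure from the carry.
                self.levels[i] = (carry, self.build(carry))
                return
            # Occupied slot: merge it into the carry, like binary addition.
            carry = carry + self.levels[i][0]
            self.levels[i] = None
            i += 1

    def search(self, q):
        answer = None
        for level in self.levels:
            if level is None:
                continue
            partial = self.query(level[1], q)
            answer = partial if answer is None else self.combine(answer, partial)
        return answer
```

For 1-D nearest-neighbor search one could pass `build=sorted`, `query=lambda s, q: min((abs(x - q), x) for x in s)`, and `combine=min`; a query then touches one static structure per occupied level, i.e. at most about log2(n) + 1 of them.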
HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces
Nearest neighbor searching of large databases in high-dimensional spaces is
inherently difficult due to the curse of dimensionality. A flavor of
approximation is, therefore, necessary to practically solve the problem of
nearest neighbor search. In this paper, we propose a novel yet simple indexing
scheme, HD-Index, to solve the problem of approximate k-nearest neighbor
queries in massive high-dimensional databases. HD-Index consists of a set of
novel hierarchical structures called RDB-trees built on Hilbert keys of
database objects. The leaves of the RDB-trees store distances of database
objects to reference objects, thereby allowing efficient pruning using distance
filters. In addition to the triangle inequality, we also use the Ptolemaic inequality
to produce better lower bounds. Experiments on massive (up to billion scale)
high-dimensional (up to 1000+) datasets show that HD-Index is effective,
efficient, and scalable.
Comment: PVLDB 11(8):906-919, 201
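The distance-filter idea can be sketched as follows: if distances from every object to a few reference objects are stored, the triangle inequality gives the lower bound |d(q, r) - d(o, r)| <= d(q, o), so many exact distance computations can be skipped. This sketch shows only that filter in a flat in-memory form and returns exact neighbors; it does not reproduce RDB-trees, Hilbert keys, the Ptolemaic bound, or the approximate search of HD-Index. The names `triangle_lower_bound` and `filtered_knn` are assumptions made for this example.

```python
import math


def triangle_lower_bound(q_to_refs, o_to_refs):
    """Largest |d(q, r) - d(o, r)| over all reference objects r."""
    return max(abs(dq - do) for dq, do in zip(q_to_refs, o_to_refs))


def filtered_knn(query, objects, refs, k):
    """k nearest neighbors using reference-object pruning (sketch).

    Returns (best, exact_computations), where best is a sorted list of
    (distance, object_index) pairs.
    """
    # In a real index this table is precomputed and stored with the data.
    table = [[math.dist(o, r) for r in refs] for o in objects]
    q_to_refs = [math.dist(query, r) for r in refs]
    bounds = [triangle_lower_bound(q_to_refs, row) for row in table]

    # Visit candidates in order of increasing lower bound.
    order = sorted(range(len(objects)), key=lambda i: bounds[i])
    best = []
    exact_computations = 0
    for i in order:
        # Every remaining candidate has a lower bound at least this large,
        # so none can beat the current k-th best distance.
        if len(best) == k and bounds[i] >= best[-1][0]:
            break
        best.append((math.dist(query, objects[i]), i))
        exact_computations += 1
        best.sort()
        best = best[:k]
    return best, exact_computations
```

The returned `exact_computations` count gives a rough sense of how much work the filter saves on a given data set; the saving depends strongly on how the reference objects are chosen.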
Efficient Analysis in Multimedia Databases
The rapid progress of digital technology has led to a situation
where computers have become ubiquitous tools. Now we can find them
in almost every environment, be it industrial or even private. With
ever-increasing performance, computers have assumed more and more vital
tasks in engineering, climate and environmental research, medicine
and the content industry. Previously, these tasks could only be
accomplished by spending enormous amounts of time and money. By
using digital sensor devices, like earth observation satellites,
genome sequencers or video cameras, the amount and complexity of
data with a spatial or temporal relation has grown enormously. This
has led to new challenges for the data analysis and requires the use
of modern multimedia databases.
This thesis aims at developing efficient techniques for the analysis
of complex multimedia objects such as CAD data, time series and
videos. It is assumed that the data is modeled by commonly used
representations. For example, CAD data is represented as a set of
voxels, while audio and video data are represented as multi-represented,
multi-dimensional time series.
The main part of this thesis focuses on finding efficient methods
for collision queries of complex spatial objects. One way to speed
up those queries is to employ a cost-based decomposition,
which uses interval groups to approximate a spatial object. For
example, this technique can be used for the Digital Mock-Up (DMU)
process, which helps engineers to ensure short product cycles. This
thesis defines and discusses a new similarity measure for time
series called threshold-similarity. Two time series are
considered similar if they exhibit a similar behavior regarding the
transgression of a given threshold value. Another part of the thesis
is concerned with the efficient calculation of reverse
k-nearest neighbor (RkNN) queries in general metric spaces
using conservative and progressive approximations. The aim of such
RkNN queries is to determine the impact of single objects on the
whole database. At the end, the thesis deals with video
retrieval and hierarchical genre classification of music
using multiple representations. The practical relevance of the
discussed genre classification approach is highlighted with a
prototype tool that helps the user to organize large music
collections.
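For readers unfamiliar with reverse k-nearest neighbor queries, the following brute-force sketch states what an RkNN query returns: every database object that would count the query among its own k nearest neighbors. It serves only as a definition in code and uses none of the conservative or progressive approximations developed in the thesis. The function name `reverse_knn` and the tie-handling rule (the query wins ties) are assumptions made for this example.

```python
import math


def reverse_knn(query, objects, k, dist=math.dist):
    """Brute-force RkNN in a metric space (illustrative sketch).

    An object o is returned if fewer than k other database objects are
    strictly closer to o than the query is.
    """
    result = []
    for i, o in enumerate(objects):
        d_q = dist(o, query)
        closer = sum(
            1
            for j, other in enumerate(objects)
            if j != i and dist(o, other) < d_q
        )
        if closer < k:
            result.append(o)
    return result
```

This takes a quadratic number of distance computations per query, which is the cost that approximation-based pruning aims to avoid. Any metric can be supplied through the `dist` parameter.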
Both the efficiency and the effectiveness of the presented
techniques are thoroughly analyzed. The benefits over traditional
approaches are shown by evaluating the new methods on real-world
test datasets.