A storage and access architecture for efficient query processing in spatial database systems
Due to the high complexity of objects and queries and also due to extremely
large data volumes, geographic database systems impose stringent requirements on their
storage and access architecture with respect to efficient query processing. Performance
improving concepts such as spatial storage and access structures, approximations, object
decompositions and multi-phase query processing have been suggested and analyzed as
single building blocks. In this paper, we describe a storage and access architecture which
is composed of the above building blocks in a modular fashion. Additionally, we incorporate
into our architecture a new ingredient, the scene organization, for efficiently
supporting set-oriented access of large-area region queries. An experimental performance
comparison demonstrates that the concept of scene organization leads to considerable
performance improvements for large-area region queries by a factor of up to 150
Multi-Step Processing of Spatial Joins
Spatial joins are one of the most important operations for combining spatial objects of several relations. In this paper, spatial join processing is studied in detail for extended spatial objects in two-dimensional data space. We present an approach for spatial join processing that is based on three steps. First, a spatial join is performed on the minimum bounding rectangles of the objects, returning a set of candidates. Various approaches for accelerating this step of join processing were examined at last year's conference [BKS 93a]. In this paper, we focus on the problem of how to compute the answers from the set of candidates, which is handled by
the following two steps. First of all, sophisticated approximations
are used to identify answers as well as to filter out false hits from
the set of candidates. For this purpose, we investigate various types
of conservative and progressive approximations. In the last step, the
exact geometry of the remaining candidates has to be tested against
the join predicate. The time required for computing spatial join
predicates can essentially be reduced when objects are adequately
organized in main memory. In our approach, objects are first decomposed
into simple components which are exclusively organized
by a main-memory resident spatial data structure. Overall, we
present a complete approach of spatial join processing on complex
spatial objects. The performance of the individual steps of our approach
is evaluated with data sets from real cartographic applications.
The results show that our approach reduces the total execution
time of the spatial join by factors
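The filter-and-refine idea behind the first and last steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes objects are circles given as hypothetical (x, y, radius) tuples so that the exact geometry test stays short, it uses a nested loop instead of a spatial access structure for the MBR join, and it omits the intermediate approximation step entirely.

```python
import math
from itertools import product

def mbr(circle):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a circle (x, y, r)."""
    x, y, r = circle
    return (x - r, y - r, x + r, y + r)

def mbrs_intersect(a, b):
    """True if two axis-aligned rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def mbr_join(rel_r, rel_s):
    """Filter step: index pairs whose MBRs intersect (nested loop, for illustration only)."""
    return [
        (i, j)
        for (i, r), (j, s) in product(enumerate(rel_r), enumerate(rel_s))
        if mbrs_intersect(mbr(r), mbr(s))
    ]

def exact_intersect(c1, c2):
    """Refinement step: exact intersection test on the circle geometry."""
    return math.hypot(c1[0] - c2[0], c1[1] - c2[1]) <= c1[2] + c2[2]

def spatial_join(rel_r, rel_s):
    """Run the filter step, then test only the candidates against the exact predicate."""
    candidates = mbr_join(rel_r, rel_s)
    return [(i, j) for i, j in candidates if exact_intersect(rel_r[i], rel_s[j])]

R = [(0.0, 0.0, 1.0)]
S = [(1.9, 1.9, 1.0), (1.5, 0.0, 1.0), (5.0, 5.0, 1.0)]
candidates = mbr_join(R, S)   # the pair (0, 0) is a false hit: the MBRs overlap, the circles do not
answers = spatial_join(R, S)
```

In this toy data the filter step keeps two of the three pairs, and the exact test discards one of them as a false hit. Better approximations than the MBR would aim to discard such pairs before the exact geometry is touched.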
Efficient Processing of Spatial Joins Using R-Trees
Abstract: In this paper, we show that spatial joins are very suitable to be processed on a parallel hardware platform. The parallel system is equipped with a so-called shared virtual memory which is well-suited for the design and implementation of parallel spatial join algorithms. We start with an algorithm that consists of three phases: task creation, task assignment and parallel task execution. In order to reduce CPU and I/O cost, the three phases are processed in a fashion that preserves spatial locality. Dynamic load balancing is achieved by splitting tasks into smaller ones and reassigning some of the smaller tasks to idle processors. In an experimental performance comparison, we identify the advantages and disadvantages of several variants of our algorithm. The most efficient one shows an almost optimal speed-up under the assumption that the number of disks is sufficiently large. Topics: spatial database systems, parallel database systems
Querying Probabilistic Neighborhoods in Spatial Data Sets Efficiently
In this paper we define the notion
of a probabilistic neighborhood in spatial data: Let a set $P$ of $n$ points in
$\mathbb{R}^d$, a query point $q \in \mathbb{R}^d$, a distance metric $\mathrm{dist}$,
and a monotonically decreasing function $f : \mathbb{R}^+ \to [0,1]$ be
given. Then a point $p \in P$ belongs to the probabilistic neighborhood $N(q, f)$ of $q$ with respect to $f$ with probability $f(\mathrm{dist}(p,q))$. We envision
applications in facility location, sensor networks, and other scenarios where a
connection between two entities becomes less likely with increasing distance. A
straightforward query algorithm would determine a probabilistic neighborhood in
$\Theta(n \cdot d)$ time by probing each point in $P$.
To answer the query in sublinear time for the planar case, we augment a
quadtree suitably and design a corresponding query algorithm. Our theoretical
analysis shows that, for certain distributions of planar $P$, our algorithm
answers a query in $O((|N(q,f)| + \sqrt{n}) \log n)$ time with high probability
(whp). This matches up to a logarithmic factor the cost induced by
quadtree-based algorithms for deterministic queries and is asymptotically
faster than the straightforward approach whenever $|N(q,f)| \in o(n / \log n)$.
As practical proofs of concept we use two applications, one in the Euclidean
and one in the hyperbolic plane. In particular, our results yield the first
generator for random hyperbolic graphs with arbitrary temperatures in
subquadratic time. Moreover, our experimental data show the usefulness of our
algorithm even if the point distribution is unknown or not uniform: The running
time savings over the pairwise probing approach constitute at least one order
of magnitude already for a modest number of points and queries.
Comment: The final publication is available at Springer via
http://dx.doi.org/10.1007/978-3-319-44543-4_3
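The straightforward query described in this abstract, which probes every point once, can be sketched as follows. This is only the baseline, not the quadtree-based algorithm of the paper; the function name, the exponential decay chosen for f, and the fixed seed are assumptions made for illustration.

```python
import math
import random

def probabilistic_neighborhood(points, q, f, dist=math.dist, rng=None):
    """Baseline query: include each point p independently with probability f(dist(p, q)).

    Every point is probed once, so the running time is linear in the number of points.
    """
    rng = rng or random.Random()
    return [p for p in points if rng.random() < f(dist(p, q))]

# Hypothetical example: the inclusion probability decays exponentially with Euclidean distance.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
query = (0.0, 0.0)
sample = probabilistic_neighborhood(
    points, query, lambda d: math.exp(-d), rng=random.Random(42)
)
```

Because the result is random, repeated calls with different seeds generally return different neighborhoods; points close to the query appear in most of them and distant points only rarely.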
Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
Clustering non-Euclidean data is difficult, and one of the most used
algorithms besides hierarchical clustering is the popular algorithm
Partitioning Around Medoids (PAM), also simply referred to as k-medoids. In
Euclidean geometry the mean, as used in k-means, is a good estimator for the
cluster center, but this does not hold for arbitrary dissimilarities. PAM uses
the medoid instead, the object with the smallest dissimilarity to all others in
the cluster. This notion of centrality can be used with any (dis-)similarity,
and thus is of high relevance to many domains such as biology that require the
use of Jaccard, Gower, or more complex distances.
A key issue with PAM is its high run time cost. We propose modifications to
the PAM algorithm that achieve an O(k)-fold speedup in the second phase (SWAP) of
the algorithm while still finding the same results as the original PAM
algorithm. If we slightly relax the choice of swaps performed (at comparable
quality), we can further accelerate the algorithm by performing up to k swaps
in each iteration. With the substantially faster SWAP, we can now also explore
alternative strategies for choosing the initial medoids. We also show how the
CLARA and CLARANS algorithms benefit from these modifications. They can easily be
combined with earlier approaches to use PAM and CLARA on big data (some of
which use PAM as a subroutine, hence can immediately benefit from these
improvements), where the performance with high k becomes increasingly
important.
In experiments on real data with k=100, we observed a 200-fold speedup
compared to the original PAM SWAP algorithm, making PAM applicable to larger
data sets as long as we can afford to compute a distance matrix, and in
particular to higher k (at k=2, the new SWAP was only 1.5 times faster, as the
speedup is expected to increase with k)
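The medoid definition used in this abstract can be made concrete with a short sketch. It only shows how a medoid is chosen under an arbitrary dissimilarity (Jaccard distance on sets here); it is not the accelerated SWAP procedure the paper proposes, and the function names and sample data are assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def medoid(cluster, dissim):
    """Object of the cluster with the smallest total dissimilarity to all members."""
    return min(cluster, key=lambda c: sum(dissim(c, o) for o in cluster))

def total_deviation(cluster, center, dissim):
    """Sum of dissimilarities from every member to the chosen center."""
    return sum(dissim(center, o) for o in cluster)

cluster = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("xyz")]
center = medoid(cluster, jaccard_distance)
cost = total_deviation(cluster, center, jaccard_distance)
```

No mean exists for such set-valued objects, but the medoid is still well defined because it is always one of the input objects. Computing it this way needs all pairwise dissimilarities within the cluster, which is one reason the run time of PAM matters.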
How Do You Like Me in This: User Embodiment Preferences for Companion Agents
We investigate the relationship between the embodiment of an artificial companion and user perception and interaction with it. In a Wizard of Oz study, 42 users interacted with one of two embodiments: a physical robot or a virtual agent on a screen through a role-play of secretarial tasks in an office, with the companion providing essential assistance. Findings showed that participants in both condition groups when given the choice would prefer to interact with the robot companion, mainly for its greater physical or social presence. Subjects also found the robot less annoying and talked to it more naturally. However, this preference for the robotic embodiment is not reflected in the users' actual rating of the companion or their interaction with it. We reflect on this contradiction and conclude that in a task-based context a user focuses much more on a companion's behaviour than its embodiment. This underlines the feasibility of our efforts in creating companions that migrate between embodiments while maintaining a consistent identity from the user's point of view
Query processing of spatial objects: Complexity versus Redundancy
The management of complex spatial objects in applications, such as geography and cartography,
imposes stringent new requirements on spatial database systems, in particular on efficient
query processing. As shown before, the performance of spatial query processing can be improved
by decomposing complex spatial objects into simple components. Up to now, only decomposition
techniques generating a linear number of very simple components, e.g. triangles or trapezoids, have
been considered. In this paper, we will investigate the natural trade-off between the complexity of
the components and the redundancy, i.e. the number of components, with respect to its effect on
efficient query processing. In particular, we present two new decomposition methods generating
a better balance between the complexity and the number of components than previously known
techniques. We compare these new decomposition methods to the traditional undecomposed representation
as well as to the well-known decomposition into convex polygons with respect to their
performance in spatial query processing. This comparison points out that for a wide range of query
selectivity the new decomposition techniques clearly outperform both the undecomposed representation
and the convex decomposition method. More important than the absolute gain in performance
by a factor of up to an order of magnitude is the robust performance of our new decomposition
techniques over the whole range of query selectivity