Parallel In-Memory Evaluation of Spatial Joins
The spatial join is a popular operation in spatial database systems and its
evaluation is a well-studied problem. As main memories become bigger and faster
and commodity hardware supports parallel processing, there is a need to revamp
classic join algorithms which have been designed for I/O-bound processing. In
view of this, we study the in-memory and parallel evaluation of spatial joins,
by re-designing a classic partitioning-based algorithm to consider alternative
approaches for space partitioning. Our study shows that, compared to a
straightforward implementation of the algorithm, our tuning can improve
performance significantly. We also show how to select appropriate partitioning
parameters based on data statistics, in order to tune the algorithm for the
given join inputs. Our parallel implementation scales gracefully with the
number of threads, reducing the cost of the join to at most one second even for
join inputs with tens of millions of rectangles. Comment: Extended version of the SIGSPATIAL'19 paper under the same title.
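Below is a minimal sketch of a partitioning-based in-memory spatial join. It assumes a uniform grid over the data space and uses the reference-point method to avoid duplicate results. The grid, the cell size `cell`, and the function names are illustrative assumptions, not the paper's tuned partitioning scheme. The per-cell loop is written sequentially, but cells are independent, which is what makes a multi-threaded version straightforward.

```python
from collections import defaultdict
from math import floor

# Hypothetical representation: a rectangle is (xmin, ymin, xmax, ymax).
Rect = tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """Closed-interval overlap test on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells_of(r: Rect, cell: float) -> list[tuple[int, int]]:
    """All cells of a uniform grid (side length `cell`) that r overlaps."""
    i0, i1 = floor(r[0] / cell), floor(r[2] / cell)
    j0, j1 = floor(r[1] / cell), floor(r[3] / cell)
    return [(i, j) for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)]


def grid_join(R: list[Rect], S: list[Rect], cell: float) -> list[tuple[int, int]]:
    """Return index pairs (i, j) such that R[i] and S[j] intersect."""
    # Partitioning phase: replicate every rectangle into each cell it overlaps.
    part_r: dict[tuple[int, int], list[int]] = defaultdict(list)
    part_s: dict[tuple[int, int], list[int]] = defaultdict(list)
    for i, r in enumerate(R):
        for c in cells_of(r, cell):
            part_r[c].append(i)
    for j, s in enumerate(S):
        for c in cells_of(s, cell):
            part_s[c].append(j)

    # Join phase: cells are independent, so a parallel version could hand
    # each cell (or batch of cells) to a different thread.
    result = []
    for c, r_ids in part_r.items():
        for i in r_ids:
            for j in part_s.get(c, []):
                a, b = R[i], S[j]
                if not intersects(a, b):
                    continue
                # Reference-point test: report the pair only in the cell holding
                # the lower-left corner of the intersection, so a pair replicated
                # into several common cells is produced exactly once.
                ref = (floor(max(a[0], b[0]) / cell), floor(max(a[1], b[1]) / cell))
                if ref == c:
                    result.append((i, j))
    return result
```

The cell size is the kind of partitioning parameter the abstract proposes to choose from data statistics. Smaller cells replicate more rectangles, while larger cells compare more pairs per cell.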
Load-balanced Range Query Workload Partitioning for Compressed Spatial Hierarchical Bitmap (cSHB) Indexes
Spatial databases store geometric objects such as points, lines, and polygons. Querying such complex spatial objects is challenging. Index structures are used to speed up lookups of the stored objects, but traditional index structures do not perform well for spatial databases. Significant research has addressed ingesting, indexing, and querying spatial objects for different types of spatial queries, such as range, nearest-neighbor, and join queries. The Compressed Spatial Hierarchical Bitmap (cSHB) index is one such indexing and querying approach; it supports spatial range query workloads (sets of queries). cSHB indexes, like many other approaches, lack parallel computation. The massive amount of spatial data requires a great deal of computation, and traditional methods are insufficient to address it. Other existing parallel processing approaches lack load balancing of parallel tasks, which leads to resource-overload bottlenecks.
In this thesis, I propose two novel spatial partitioning techniques, Max Containment Clustering and Max Containment Clustering with Separation, to create load-balanced partitions of a range query workload. Each partition takes a similar amount of time to process its spatial queries, and the partitions reduce response latency by minimizing disk access cost and optimizing bitmap operations. The resulting partitions are processed in parallel using cSHB indexes. The proposed techniques exploit the block-based organization of bitmaps in the cSHB index and improve its performance when processing a range query workload. Dissertation/Thesis. Masters Thesis, Computer Science, 201
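The Max Containment Clustering techniques are not described in enough detail here to reproduce. For illustration only, the sketch below substitutes a generic greedy heuristic, longest-processing-time-first (LPT) assignment, to show what a load-balanced partitioning of a range query workload means. The per-query cost estimates and the function name `balance_queries` are hypothetical. One plausible cost would be the number of cSHB bitmap blocks a query touches, but that is an assumption.

```python
import heapq


def balance_queries(costs: dict[str, float], k: int) -> list[list[str]]:
    """Greedy LPT assignment of range queries to k partitions.

    `costs` maps a query id to an estimated processing cost (hypothetical;
    e.g. the number of cSHB bitmap blocks the query range touches).
    """
    # Min-heap of (current load, partition id): the top is the least-loaded partition.
    heap = [(0.0, p) for p in range(k)]
    heapq.heapify(heap)
    parts: list[list[str]] = [[] for _ in range(k)]
    # Most expensive queries first, each onto the currently least-loaded partition.
    for q in sorted(costs, key=costs.get, reverse=True):
        load, p = heapq.heappop(heap)
        parts[p].append(q)
        heapq.heappush(heap, (load + costs[q], p))
    return parts
```

LPT is only a heuristic. With costs 8, 7, 6, 5 and 4 on two partitions it yields loads of 17 and 13, whereas a 15/15 split is achievable, so a workload-aware method can do better.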
An R*-Tree Based Semi-Dynamic Clustering Method for the Efficient Processing of Spatial Join in a Shared-Nothing Parallel Database System
The growing importance of geospatial databases has made it essential to perform complex spatial queries efficiently. To achieve acceptable performance levels, database systems have increasingly been required to make use of parallelism. The spatial join is a computationally expensive operator, so an efficient implementation of it is desirable. The work presented in this document attempts to improve the performance of spatial join queries by distributing the data set across several nodes of a cluster and executing queries across these nodes in parallel. This document discusses a new parallel algorithm that implements the spatial join efficiently. This algorithm is compared to an existing parallel spatial-join algorithm, the clone join. Both algorithms have been implemented on a Beowulf cluster and compared using real datasets. An extensive experimental analysis reveals that the proposed algorithm performs better both in declustering time and in the execution time of the join query.
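This abstract does not specify the R*-tree based semi-dynamic clustering method. The sketch below therefore substitutes a simpler technique, Hilbert-curve declustering, to show the general shape of a shared-nothing parallel spatial join. One input is split into spatially coherent chunks, one per node. Each object of the other input is then shipped to every node whose chunk bounding box it overlaps. All names, the grid resolution `order`, and the use of a Hilbert curve are assumptions made for illustration.

```python
# Hypothetical representation: a rectangle is (xmin, ymin, xmax, ymax).
Rect = tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve filling a 2**order square."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def decluster(R: list[Rect], S: list[Rect], nodes: int, extent: Rect, order: int = 8):
    """Split R into `nodes` Hilbert-ordered chunks; send each S object to every
    node whose chunk bounding box it overlaps."""
    x0, y0, x1, y1 = extent
    n = 1 << order

    def key(r: Rect) -> int:
        # Map the MBR centre onto the 2**order grid, then onto the Hilbert curve.
        gx = min(n - 1, int(((r[0] + r[2]) / 2 - x0) / (x1 - x0) * n))
        gy = min(n - 1, int(((r[1] + r[3]) / 2 - y0) / (y1 - y0) * n))
        return hilbert_index(order, gx, gy)

    ordered = sorted(range(len(R)), key=lambda i: key(R[i]))
    size = -(-len(ordered) // nodes)  # ceiling division: near-equal chunks
    r_parts = [ordered[p * size:(p + 1) * size] for p in range(nodes)]
    s_parts = []
    for ids in r_parts:
        if not ids:
            s_parts.append([])
            continue
        box = (min(R[i][0] for i in ids), min(R[i][1] for i in ids),
               max(R[i][2] for i in ids), max(R[i][3] for i in ids))
        s_parts.append([j for j, s in enumerate(S) if intersects(box, s)])
    return r_parts, s_parts


def parallel_join(R: list[Rect], S: list[Rect], nodes: int, extent: Rect) -> list[tuple[int, int]]:
    r_parts, s_parts = decluster(R, S, nodes, extent)
    result = []
    # Each iteration stands in for one node joining its local fragments.
    for r_ids, s_ids in zip(r_parts, s_parts):
        result.extend((i, j) for i in r_ids for j in s_ids if intersects(R[i], S[j]))
    return result
```

Because each R object is stored on exactly one node, no duplicate elimination is needed. The price is that S objects near chunk borders are replicated to several nodes.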
Quantification of 3D spatial correlations between state variables and distances to the grain boundary network in full-field crystal plasticity spectral method simulations
Deformation microstructure heterogeneities play a pivotal role during
dislocation patterning and interface network restructuring. Thus, they indirectly
affect whether and how an alloy recrystallizes. Given this relevance, it has
become common practice to study the evolution of deformation microstructure
heterogeneities with 3D experiments and full-field crystal plasticity computer
simulations including tools such as the spectral method.
Quantifying material point to grain or phase boundary distances, though, is a
practical challenge with spectral method crystal plasticity models because
these discretize the material volume rather than explicitly meshing the grain and
phase boundary interface network. This limitation calls for the development of
interface reconstruction algorithms which enable us to develop specific data
post-processing protocols to quantify spatial correlations between state
variable values at each material point and the points' corresponding distance
to the closest grain or phase boundary.
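One simple distancing approach can be sketched as follows. It assumes the simulation output is a periodic voxel grid of grain IDs, since spectral solvers typically assume periodic boundary conditions. A voxel counts as a boundary voxel if a face neighbour belongs to another grain. Each voxel's distance is then taken to the nearest boundary voxel centre by brute force. This is not one of the paper's grain reconstruction or distancing methods, and the function names are hypothetical. Measuring to voxel centres rather than to a reconstructed interface introduces an error of roughly half a voxel spacing, which is the kind of limitation interface reconstruction addresses.

```python
from itertools import product
from math import inf, sqrt

# Hypothetical input: grains[x][y][z] is the grain ID of each voxel of a
# periodic grid, one value per material point.
FACE_NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def boundary_voxels(grains):
    """Voxels with at least one face neighbour in a different grain (periodic wrap)."""
    nx, ny, nz = len(grains), len(grains[0]), len(grains[0][0])
    found = []
    for x, y, z in product(range(nx), range(ny), range(nz)):
        g = grains[x][y][z]
        if any(grains[(x + dx) % nx][(y + dy) % ny][(z + dz) % nz] != g
               for dx, dy, dz in FACE_NEIGHBOURS):
            found.append((x, y, z))
    return found


def distance_to_boundary(grains, spacing=1.0):
    """Euclidean distance from each voxel centre to the nearest boundary voxel
    centre, using the minimum-image convention (brute force, O(N * B))."""
    dims = (len(grains), len(grains[0]), len(grains[0][0]))
    bnd = boundary_voxels(grains)

    def wrap(d: int, n: int) -> int:
        # Shortest separation along one periodic axis.
        d = abs(d) % n
        return min(d, n - d)

    dist = {}
    for p in product(*(range(n) for n in dims)):
        best = inf
        for b in bnd:
            best = min(best, sqrt(sum(wrap(p[a] - b[a], dims[a]) ** 2 for a in range(3))))
        dist[p] = best * spacing
    return dist
```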
This work contributes to advancing such post-processing routines. Specifically,
two grain reconstruction methods and three distancing methods are developed to solve
the above challenge. The individual strengths and limitations of these methods,
as well as the efficiency of their parallel implementation, are assessed with an
exemplary large-scale DAMASK crystal plasticity study. We apply the new tool to
assess the evolution of subtle stress and disorientation gradients towards
grain boundaries. Comment: Manuscript submitted to Modelling and Simulation in Materials Science
and Engineering
Distributed spatial query processing and optimization
x, 76 leaves ; 29 cm
Applications exist today that require the management of distributed spatial data. Since
spatial data is more complex than non-spatial data, performing queries on it requires more
local processing (i.e., CPU and I/O) time. Also, due to geographical distribution, data
transmission costs must be considered. To reduce these costs, one can employ a distributed
spatial semijoin as it eliminates unnecessary objects before their transmission to other sites
and the query site.
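The semijoin principle described above can be sketched by assuming that objects are approximated by their minimum bounding rectangles (MBRs). The sending site ships only MBRs, and the receiving site returns only objects whose MBRs overlap at least one of them. This is a basic, uncompressed MBR filter for illustration, not either of the representations proposed in the thesis. The function names and data layout are hypothetical.

```python
# Hypothetical representation: objects are (object_id, mbr) pairs, and an MBR is
# (xmin, ymin, xmax, ymax). Only MBRs travel between sites, not full geometries.
Rect = tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def enclosing_mbr(rects: list[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def spatial_semijoin(shipped_mbrs: list[Rect], local_objects: list[tuple[str, Rect]]) -> list[str]:
    """Run at the receiving site: keep only local objects that may join.

    MBR filtering is conservative: every true join partner survives, but the
    survivors still need an exact geometric test after transmission."""
    if not shipped_mbrs:
        return []
    overall = enclosing_mbr(shipped_mbrs)  # one cheap test rules out distant objects
    return [oid for oid, box in local_objects
            if intersects(overall, box) and any(intersects(m, box) for m in shipped_mbrs)]
```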
Most existing work proposes different representations of the distributed spatial semijoin
between two sites only, with very few works exploring its use for processing a query
involving more than two sites. In this thesis, we propose both new approaches for representing
the spatial semijoin in a distributed setting, and their use for processing a distributed
query involving any number of sites. Two strategies are proposed for compactly representing
the spatial semijoin that reduce both the data transmission and local processing
(CPU+I/O) costs when applied in a distributed spatial query. A Global Encompassing Minimum
Bounding Rectangle (GEMBR) is utilized, which is partitioned, mapped and applied
in two different ways to approximate the objects in a spatial joining attribute. The first uses
partition indices, while the second uses a bit array representation. Each spatial semijoin is
then applied in a multi-site distributed spatial query processing strategy. In addition, the
two-site spatial semijoin is extended to handle multiple sites so that we have a benchmark
strategy for comparison purposes.
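The two GEMBR-based representations can be illustrated with a sketch that splits the GEMBR into a g x g grid and maps each MBR to the cells it overlaps. The sending site then ships either the set of occupied cell ids, in the spirit of the partition-index approach, or a Bloom filter over those ids, in the spirit of the bit array approach. The grid layout, the SHA-256 double hashing, the function names, and the parameters g, m and k are assumptions; the thesis's exact partitioning and mapping may differ.

```python
import hashlib
from math import floor

# Hypothetical representation: an MBR is (xmin, ymin, xmax, ymax); the GEMBR is
# split into a g x g grid whose cells are numbered i * g + j.
Rect = tuple[float, float, float, float]


def cell_ids(r: Rect, gembr: Rect, g: int) -> set[int]:
    """Ids of the GEMBR grid cells that r overlaps (clamped to the grid)."""
    x0, y0, x1, y1 = gembr
    w, h = (x1 - x0) / g, (y1 - y0) / g

    def clamp(v: int) -> int:
        return max(0, min(g - 1, v))

    i0, i1 = clamp(floor((r[0] - x0) / w)), clamp(floor((r[2] - x0) / w))
    j0, j1 = clamp(floor((r[1] - y0) / h)), clamp(floor((r[3] - y0) / h))
    return {i * g + j for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)}


class BloomFilter:
    """m-bit Bloom filter; k bit positions derived by double hashing of SHA-256."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: int) -> list[int]:
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: int) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: int) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))


def partition_semijoin(a_mbrs, b_objects, gembr, g):
    """Ship the exact set of occupied cell ids (partition-index style)."""
    occupied = set().union(*(cell_ids(r, gembr, g) for r in a_mbrs))
    return [oid for oid, r in b_objects if cell_ids(r, gembr, g) & occupied]


def bloom_semijoin(a_mbrs, b_objects, gembr, g, m=256, k=3):
    """Ship an m-bit Bloom filter over the occupied cell ids instead."""
    bf = BloomFilter(m, k)
    for r in a_mbrs:
        for c in cell_ids(r, gembr, g):
            bf.add(c)
    # No false negatives, so every object sharing a cell with site A survives;
    # hash collisions may let extra objects through (false positives).
    return [oid for oid, r in b_objects if any(c in bf for c in cell_ids(r, gembr, g))]
```

Both filters are conservative, so neither drops a true join partner. The Bloom filter trades extra false positives for a message of fixed size (m bits).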
We have tested the query processing algorithms for four sites, which are a part of an
actual working distributed system. The algorithms are compared with respect to data transmission
cost, CPU time, I/O time and false positive results. The algorithms are superior in
many cases with respect to these criteria. The bit array representation, called the Bloom Filter Based Spatial Semijoin (BFSJ), is evaluated with respect to different filter factors, and the optimized algorithms are found to perform significantly better than the Distributed
Naïve Spatial Semijoin strategy on synthetic data. In addition, the Partition
and Mapping Based Spatial Semijoin (PMSJ) is 1.38 times faster than BFSJ with respect
to processing cost, while BFSJ has a transmission cost gain of 1.12 over PMSJ. Both
algorithms are 18 times faster and have six times lower transmission cost than the Distributed
Naïve Spatial Semijoin (NSPJ). Finally, the false positive percentage is observed to increase as the
number of hash functions and the filter factor increase.
Efficient Parallel and Distributed Algorithms for GIS Polygon Overlay Processing
Polygon clipping is one of the more complex operations in computational geometry. It is used in Geographic Information Systems (GIS), Computer Graphics, and VLSI CAD. For two polygons with n and m vertices, the number of intersections can be O(nm). In this dissertation, we present the first output-sensitive CREW PRAM algorithm, which can perform polygon clipping in O(log n) time using O(n + k + k′) processors, where n is the number of vertices, k is the number of intersections, and k′ is the number of additional temporary vertices introduced by partitioning the polygons. The current best algorithm, by Karinthi, Srinivas, and Almasi, does not handle self-intersecting polygons, is not output-sensitive, and must employ O(n^2) processors to achieve O(log n) time. The second parallel algorithm is an output-sensitive PRAM algorithm based on the Greiner-Hormann algorithm, with O(log n) time complexity using O(n + k) processors. This is cost-optimal when compared to the time complexity of the best-known sequential plane-sweep based algorithm for polygon clipping. For self-intersecting polygons, the time complexity is O(((n + k) log n log log n)/p) using p processors.
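The quantity k in these bounds is the number of edge intersections between the two input polygons. A brute-force sketch that counts them by testing all n·m edge pairs makes that parameter concrete. It is a sequential O(nm) baseline, not the CREW PRAM or Greiner-Hormann-based algorithms above. It also ignores degenerate cases (touching vertices, collinear overlaps) that a real clipper must handle. The function names are hypothetical.

```python
# Hypothetical representation: a polygon is a list of (x, y) vertices in order;
# the closing edge from the last vertex back to the first is implicit.
Point = tuple[float, float]


def orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def proper_crossing(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if the segments cross at a single interior point of both.
    Touching endpoints and collinear overlaps are deliberately ignored."""
    return (orient(q1, q2, p1) * orient(q1, q2, p2) < 0
            and orient(p1, p2, q1) * orient(p1, p2, q2) < 0)


def count_edge_intersections(P: list[Point], Q: list[Point]) -> int:
    """Brute-force O(n * m) count of proper edge crossings between P and Q."""
    k = 0
    for i in range(len(P)):
        p1, p2 = P[i], P[(i + 1) % len(P)]
        for j in range(len(Q)):
            if proper_crossing(p1, p2, Q[j], Q[(j + 1) % len(Q)]):
                k += 1
    return k
```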
In addition to these parallel algorithms, the other main contributions of this dissertation are (1) multi-core and many-core implementations for clipping a pair of polygons and (2) MPI-GIS and Hadoop Topology Suite for distributed polygon overlay using a cluster of nodes. Nvidia GPUs and CUDA are used for the many-core implementation. The MPI-based system achieves a 44x speedup while processing about 600K polygons in two real-world GIS shapefiles, (1) USA Detailed Water Bodies and (2) USA Block Group Boundaries, within 20 seconds on a 32-node (8 cores each) IBM iDataPlex cluster interconnected by InfiniBand technology.
Design and performance evaluation of indexing methods for dynamic attributes in mobile database management systems
Ankara: Department of Computer Engineering and Information Science and the Institute of Engineering and Science of Bilkent University, 1997. Thesis (Master's) -- Bilkent University, 1997. Includes bibliographical references (leaves 99-104). Tayeb, Jamel. M.S.