11 research outputs found

    Parallel In-Memory Evaluation of Spatial Joins

    Full text link
    The spatial join is a popular operation in spatial database systems and its evaluation is a well-studied problem. As main memories become bigger and faster and commodity hardware supports parallel processing, there is a need to revamp classic join algorithms which have been designed for I/O-bound processing. In view of this, we study the in-memory and parallel evaluation of spatial joins, by re-designing a classic partitioning-based algorithm to consider alternative approaches for space partitioning. Our study shows that, compared to a straightforward implementation of the algorithm, our tuning can improve performance significantly. We also show how to select appropriate partitioning parameters based on data statistics, in order to tune the algorithm for the given join inputs. Our parallel implementation scales gracefully with the number of threads reducing the cost of the join to at most one second even for join inputs with tens of millions of rectangles.Comment: Extended version of the SIGSPATIAL'19 paper under the same titl

    Load-balanced Range Query Workload Partitioning for Compressed Spatial Hierarchical Bitmap (cSHB) Indexes

    Get PDF
    abstract: The spatial databases are used to store geometric objects such as points, lines, polygons. Querying such complex spatial objects becomes a challenging task. Index structures are used to improve the lookup performance of the stored objects in the databases, but traditional index structures cannot perform well in case of spatial databases. A significant amount of research is made to ingest, index and query the spatial objects based on different types of spatial queries, such as range, nearest neighbor, and join queries. Compressed Spatial Bitmap Index (cSHB) structure is one such example of indexing and querying approach that supports spatial range query workloads (set of queries). cSHB indexes and many other approaches lack parallel computation. The massive amount of spatial data requires a lot of computation and traditional methods are insufficient to address these issues. Other existing parallel processing approaches lack in load-balancing of parallel tasks which leads to resource overloading bottlenecks. In this thesis, I propose novel spatial partitioning techniques, Max Containment Clustering and Max Containment Clustering with Separation, to create load-balanced partitions of a range query workload. Each partition takes a similar amount of time to process the spatial queries and reduces the response latency by minimizing the disk access cost and optimizing the bitmap operations. The partitions created are processed in parallel using cSHB indexes. The proposed techniques utilize the block-based organization of bitmaps in the cSHB index and improve the performance of the cSHB index for processing a range query workload.Dissertation/ThesisMasters Thesis Computer Science 201

    An R*-Tree Based Semi-Dynamic Clustering Method for the Efficient Processing of Spatial Join in a Shared-Nothing Parallel Database System

    Get PDF
    The growing importance of geospatial databases has made it essential to perform complex spatial queries efficiently. To achieve acceptable performance levels, database systems have been increasingly required to make use of parallelism. The spatial join is a computationally expensive operator. Efficient implementation of the join operator is, thus, desirable. The work presented in this document attempts to improve the performance of spatial join queries by distributing the data set across several nodes of a cluster and executing queries across these nodes in parallel. This document discusses a new parallel algorithm that implements the spatial join in an efficient manner. This algorithm is compared to an existing parallel spatial-join algorithm, the clone join. Both algorithms have been implemented on a Beowulf cluster and compared using real datasets. An extensive experimental analysis reveals that the proposed algorithm exhibits superior performance both in declustering time as well as in the execution time of the join query

    Quantification of 3D spatial correlations between state variables and distances to the grain boundary network in full-field crystal plasticity spectral method simulations

    No full text
    Deformation microstructure heterogeneities play a pivotal role during dislocation patterning and interface network restructuring. Thus, they affect indirectly how an alloy recrystallizes if at all. Given this relevance, it has become common practice to study the evolution of deformation microstructure heterogeneities with 3D experiments and full-field crystal plasticity computer simulations including tools such as the spectral method. Quantifying material point to grain or phase boundary distances, though, is a practical challenge with spectral method crystal plasticity models because these discretize the material volume rather than mesh explicitly the grain and phase boundary interface network. This limitation calls for the development of interface reconstruction algorithms which enable us to develop specific data post-processing protocols to quantify spatial correlations between state variable values at each material point and the points' corresponding distance to the closest grain or phase boundary. This work contributes to advance such post-processing routines. Specifically, two grain reconstruction and three distancing methods are developed to solve above challenge. The individual strengths and limitations of these methods surplus the efficiency of their parallel implementation is assessed with an exemplary DAMASK large scale crystal plasticity study. We apply the new tool to assess the evolution of subtle stress and disorientation gradients towards grain boundaries.Comment: Manuscript submitted to Modelling and Simulation in Materials Science and Engineerin

    Distributed spatial query processing and optimization

    Get PDF
    x, 76 leaves ; 29 cmApplications exist today that require the management of distributed spatial data. Since spatial data is more complex than non-spatial data, performing queries on it requires more local processing (i.e. CPU and I/O) time. Also, due to geographical distribution, data transmission costs must be considered. To reduce these costs, one can employ a distributed spatial semijoin as it eliminates unnecessary objects before their transmission to other sites and the query site. Most existing work propose different representations of the distributed spatial semijoin between two sites only, with very few works exploring its use for processing a query involving more than two sites. In this thesis, we propose both new approaches for representing the spatial semijoin in a distributed setting, and their use for processing a distributed query consisting of any number of sites. Two strategies are proposed for compactly representing the spatial semijoin that reduce both the data transmission and local processing (CPU+I/O) costs when applied in a distributed spatial query. A Global Encompassing Minimum Bounding Rectangle (GEMBR) is utilized, which is partitioned, mapped and applied in two different ways to approximate the objects in a spatial joining attribute. The first is partition indices, while the second is a bit array representation. Then each spatial semijoin is applied in a multi-site distributed spatial query processing strategy. In addition, the two-site spatial semijoin is extended to handle multiple sites so that we have a benchmark strategy for comparison purposes. We have tested the query processing algorithms for four sites, which are a part of an actual working distributed system. The algorithms are compared with respect to data transmission cost, CPU time, I/O time and false positive results. The algorithms are superior in many cases at optimizing the above criteria. The bit array representation, which is called Bloom Filter Based Spatial Semijoin (BFSJ), is evaluated with respect to different filter factors and found that the optimized algorithms perform significantly better than the Distributed Na¨ıve Spatial Semijoin strategy when synthetic data was used. Also the Partition and Mapping Based Spatial Semijoin (PMSJ) is 1.38 times faster than BFSJ with respect to processing cost while the BFSJ has a tranmission cost gain of 1.12 over PMSJ. Both algorithms are 18 times faster and have six times less transmission cost than Distributed Na¨ıve Spatial Semijoin (NSPJ). Finally, it is also observed that with the increase of hash functions and filter factor the false positive percentage increases

    Efficient Parallel and Distributed Algorithms for GIS Polygon Overlay Processing

    Get PDF
    Polygon clipping is one of the complex operations in computational geometry. It is used in Geographic Information Systems (GIS), Computer Graphics, and VLSI CAD. For two polygons with n and m vertices, the number of intersections can be O(nm). In this dissertation, we present the first output-sensitive CREW PRAM algorithm, which can perform polygon clipping in O(log n) time using O(n + k + k\u27) processors, where n is the number of vertices, k is the number of intersections, and k\u27 is the additional temporary vertices introduced due to the partitioning of polygons. The current best algorithm by Karinthi, Srinivas, and Almasi does not handle self-intersecting polygons, is not output-sensitive and must employ O(n^2) processors to achieve O(log n) time. The second parallel algorithm is an output-sensitive PRAM algorithm based on Greiner-Hormann algorithm with O(log n) time complexity using O(n + k) processors. This is cost-optimal when compared to the time complexity of the best-known sequential plane-sweep based algorithm for polygon clipping. For self-intersecting polygons, the time complexity is O(((n + k) log n log log n)/p) using p In addition to these parallel algorithms, the other main contributions in this dissertation are 1) multi-core and many-core implementation for clipping a pair of polygons and 2) MPI-GIS and Hadoop Topology Suite for distributed polygon overlay using a cluster of nodes. Nvidia GPU and CUDA are used for the many-core implementation. The MPI based system achieves 44X speedup while processing about 600K polygons in two real-world GIS shapefiles 1) USA Detailed Water Bodies and 2) USA Block Group Boundaries) within 20 seconds on a 32-node (8 cores each) IBM iDataPlex cluster interconnected by InfiniBand technology

    Design and performance evaluation of indexing methods for dynamic attributes in mobile database management systems

    Get PDF
    Ankara : Department of Computer Engineering and Information Science and the Institute of Engineering and Science of Bilkent University, 1997.Thesis(Master's) -- Bilkent University, 1997.Includes bibliographical references leaves 99-104.Tayeb, JamelM.S
    corecore