Parallel In-Memory Evaluation of Spatial Joins
The spatial join is a popular operation in spatial database systems and its
evaluation is a well-studied problem. As main memories become bigger and faster
and commodity hardware supports parallel processing, there is a need to revamp
classic join algorithms which have been designed for I/O-bound processing. In
view of this, we study the in-memory and parallel evaluation of spatial joins,
by re-designing a classic partitioning-based algorithm to consider alternative
approaches for space partitioning. Our study shows that, compared to a
straightforward implementation of the algorithm, our tuning can improve
performance significantly. We also show how to select appropriate partitioning
parameters based on data statistics, in order to tune the algorithm for the
given join inputs. Our parallel implementation scales gracefully with the
number of threads, reducing the cost of the join to at most one second even for
join inputs with tens of millions of rectangles.
Comment: Extended version of the SIGSPATIAL'19 paper under the same title
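As a rough illustration of the partitioning-based approach described in this abstract, the following Python sketch joins two rectangle sets over a uniform grid. It is not the paper's tuned algorithm: the cell size `cell`, the function names, and the nested-loop comparison inside each cell (where a plane sweep would normally be used) are assumptions made for brevity. Duplicate results caused by rectangles replicated into several cells are avoided with the standard reference-point test.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Cell = Tuple[int, int]


def partition(rects: List[Rect], cell: float) -> Dict[Cell, List[int]]:
    """Assign each rectangle index to every grid cell it overlaps."""
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for idx, (x0, y0, x1, y1) in enumerate(rects):
        for cx in range(int(x0 // cell), int(x1 // cell) + 1):
            for cy in range(int(y0 // cell), int(y1 // cell) + 1):
                grid[(cx, cy)].append(idx)
    return grid


def intersects(a: Rect, b: Rect) -> bool:
    """Closed-rectangle intersection test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grid_join(r: List[Rect], s: List[Rect], cell: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) such that r[i] intersects s[j]."""
    grid_r, grid_s = partition(r, cell), partition(s, cell)
    result = []
    for key, r_ids in grid_r.items():
        s_ids = grid_s.get(key)
        if not s_ids:
            continue
        # Assumption: a nested loop stands in for a per-cell plane sweep.
        for i in r_ids:
            for j in s_ids:
                a, b = r[i], s[j]
                if not intersects(a, b):
                    continue
                # Reference point: lower-left corner of the intersection.
                # Report the pair only in the cell that contains it.
                px, py = max(a[0], b[0]), max(a[1], b[1])
                if (int(px // cell), int(py // cell)) == key:
                    result.append((i, j))
    return result
```

Each cell is processed independently of the others, which is what makes this family of algorithms straightforward to parallelize, for example by assigning cells to threads.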
Enhancing In-Memory Spatial Indexing with Learned Search
Spatial data is ubiquitous. Massive amounts of data are generated every day from a plethora of sources such as billions of GPS-enabled devices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and social media platforms (e.g., location-tagged posts on Facebook, Twitter, and Instagram). This exponential growth in spatial data has led the research community to build systems and applications for efficient spatial data processing. In this study, we apply a recently developed machine-learned search technique for single-dimensional sorted data to spatial indexing. Specifically, we partition spatial data using six traditional spatial partitioning techniques and employ machine-learned search within each partition to support point, range, distance, and spatial join queries. Adhering to the latest research trends, we tune the partitioning techniques to be instance-optimized. By tuning each partitioning technique for optimal performance, we demonstrate that: (i) grid-based index structures outperform tree-based index structures (from 1.23× to 2.47×), (ii) learning-enhanced variants of commonly used spatial index structures outperform their original counterparts (from 1.44× to 53.34× faster), (iii) machine-learned search within a partition is faster than binary search by 11.79% - 39.51% when filtering on one dimension, (iv) the benefit of machine-learned search diminishes in the presence of other compute-intensive operations (e.g. scan costs in higher selectivity queries, Haversine distance computation, and point-in-polygon tests), and (v) index lookup is the bottleneck for tree-based structures, which could potentially be reduced by linearizing the indexed partitions.
Additional Key Words and Phrases: spatial data, indexing, machine-learning, spatial queries, geospatial
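To make the idea of machine-learned search within a sorted partition concrete, here is a minimal Python sketch. It does not reproduce the technique used in the study: a single least-squares line is used as a stand-in model, and the names `fit_linear` and `learned_lower_bound` are hypothetical. The model predicts a position for the query key, and an ordinary binary search is then confined to a window given by the model's maximum training error. The partition is assumed to be non-empty and sorted.

```python
import bisect
import math
from typing import List, Tuple

Model = Tuple[float, float, int]  # (slope, intercept, error bound in positions)


def fit_linear(keys: List[float]) -> Model:
    """Fit position ~ slope * key + intercept over a sorted, non-empty list."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys))
    slope = 0.0 if var == 0 else cov / var
    intercept = mean_p - slope * mean_k
    # Largest gap between predicted and true position, rounded up with slack.
    max_err = max(abs(p - (slope * k + intercept)) for p, k in enumerate(keys))
    return slope, intercept, math.ceil(max_err) + 1


def learned_lower_bound(keys: List[float], model: Model, q: float) -> int:
    """Index of the first key >= q, searching only near the predicted position."""
    slope, intercept, err = model
    n = len(keys)
    guess = int(slope * q + intercept)
    lo = max(0, min(n, guess - err))
    hi = max(0, min(n, guess + err + 1))
    return bisect.bisect_left(keys, q, lo, hi)
```

The search window shrinks as the model fits the key distribution better, which is where the gain over a full binary search would come from in this sketch.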
SwiftSpatial: Spatial Joins on Modern Hardware
Spatial joins are among the most time-consuming queries in spatial data
management systems. In this paper, we propose SwiftSpatial, a specialized
accelerator architecture tailored for spatial joins. SwiftSpatial contains
multiple high-performance join units with innovative hybrid parallelism,
several efficient memory management units, and an integrated on-chip join
scheduler. We prototype SwiftSpatial on an FPGA and incorporate the R-tree
synchronous traversal algorithm as the control flow. Benchmarked against
various CPU and GPU-based spatial data processing systems, SwiftSpatial
demonstrates a latency reduction of up to 5.36x relative to the best-performing
baseline, while requiring 6.16x less power. The remarkable performance and
energy efficiency of SwiftSpatial lay a solid foundation for its future
integration into spatial data management systems, both in data centers and at
the edge.
APRIL: Approximating Polygons as Raster Interval Lists
The spatial intersection join is an important spatial query operation, due to
its popularity and high complexity. The spatial join pipeline takes as input
two collections of spatial objects (e.g., polygons). In the filter step, pairs
of object MBRs that intersect are identified and passed to the refinement step
for verification of the join predicate on the exact object geometries. The
bottleneck of spatial join evaluation is in the refinement step. We introduce
APRIL, a powerful intermediate step in the pipeline, which is based on raster
interval approximations of object geometries. Our technique applies a sequence
of interval joins on 'intervalized' object approximations to determine whether
the objects intersect or not. Compared to previous work, APRIL approximations
are simpler, occupy much less space, and achieve similar pruning effectiveness
at a much higher speed. Besides intersection joins between polygons, APRIL can
directly be applied and has high effectiveness for polygonal range queries,
within joins, and polygon-linestring joins. By applying a lightweight
compression technique, APRIL approximations may occupy even less space than
object MBRs. Furthermore, APRIL can be customized to apply on partitioned data
and on polygons of varying sizes, rasterized at different granularities. Our
last contribution is a novel algorithm that computes the APRIL approximation of
a polygon without having to rasterize it in full, which is orders of magnitude
faster than the computation of other raster approximations. Experiments on real
data demonstrate the effectiveness and efficiency of APRIL; compared to the
state-of-the-art intermediate filter, APRIL occupies 2x-8x less space, is
3.5x-8.5x more time-efficient, and reduces the end-to-end join cost up to 3
times.
Comment: 12 pages
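The abstract describes deciding intersection through a sequence of interval joins over 'intervalized' raster approximations. A minimal Python sketch of that idea follows, under assumptions not stated in the abstract: each object carries two sorted lists of disjoint half-open cell-id intervals along some space-filling curve, one for all cells the object touches (`a_r`, `a_s`) and one for cells it fully covers (`f_r`, `f_s`). The function names and the three outcome labels are chosen here for illustration only.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) over cell ids


def lists_overlap(xs: List[Interval], ys: List[Interval]) -> bool:
    """Merge-style scan: True if any interval in xs overlaps one in ys."""
    i = j = 0
    while i < len(xs) and j < len(ys):
        x_start, x_end = xs[i]
        y_start, y_end = ys[j]
        if x_start < y_end and y_start < x_end:
            return True
        # Advance whichever interval ends first.
        if x_end <= y_end:
            i += 1
        else:
            j += 1
    return False


def intermediate_filter(a_r: List[Interval], f_r: List[Interval],
                        a_s: List[Interval], f_s: List[Interval]) -> str:
    """Classify a candidate pair as 'disjoint', 'intersect', or 'refine'."""
    if not lists_overlap(a_r, a_s):
        return "disjoint"   # no shared cell, so the objects cannot intersect
    if lists_overlap(a_r, f_s) or lists_overlap(f_r, a_s):
        return "intersect"  # one object touches a cell the other fully covers
    return "refine"         # inconclusive: fall back to exact geometry
```

Only pairs classified as `"refine"` would need the expensive exact-geometry test, which is the step the abstract identifies as the bottleneck.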
The Complexity of Boolean Conjunctive Queries with Intersection Joins
Intersection joins over interval data are relevant in spatial and temporal
data settings. A set of intervals joins if their intersection is non-empty. In
case of point intervals, the intersection join becomes the standard equality
join.
We establish the complexity of Boolean conjunctive queries with intersection
joins by a many-one equivalence to disjunctions of Boolean conjunctive queries
with equality joins. The complexity of any query with intersection joins is
that of the hardest query with equality joins in the disjunction exhibited by
our equivalence. This is captured by a new width measure called the IJ-width.
We also introduce a new syntactic notion of acyclicity called iota-acyclicity
to characterise the class of Boolean queries with intersection joins that admit
linear time computation modulo a poly-logarithmic factor in the data size.
Iota-acyclicity is for intersection joins what alpha-acyclicity is for equality
joins. It strictly sits between gamma-acyclicity and Berge-acyclicity. The
intersection join queries that are not iota-acyclic are at least as hard as the
Boolean triangle query with equality joins, which is widely considered not
computable in linear time.
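The join condition stated at the start of this abstract can be written down directly. The short Python sketch below assumes closed intervals given as `(left, right)` pairs; the function name is hypothetical. A set of intervals has a non-empty intersection exactly when the largest left endpoint does not exceed the smallest right endpoint, and for point intervals this reduces to an equality test.

```python
from typing import Iterable, Tuple


def intervals_join(intervals: Iterable[Tuple[float, float]]) -> bool:
    """True if all closed intervals share at least one common point."""
    items = list(intervals)
    largest_left = max(left for left, _ in items)
    smallest_right = min(right for _, right in items)
    return largest_left <= smallest_right
```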
On the complexity of queries with intersection joins
This thesis studies the complexity of join processing on interval data. It defines a class of queries, called Conjunctive Queries with Intersection Joins (IJQs). An IJQ is a query in which the variables range over both scalars and intervals with real-valued endpoints. The joins are expressed through intersection predicates; an intersection predicate over a multi-set that consists of both scalars and intervals is a true assertion if the elements in the multi-set intersect; otherwise, it is false. The class of IJQs includes queries that are often asked in practice.
This thesis introduces techniques for obtaining reductions from the problem of evaluating IJQs to the problem of evaluating Conjunctive Queries with Equality Joins (CQs). The key idea is the rewriting of an intersection predicate over a set of intervals into an equivalent predicate with equality conditions. This rewriting is achieved by building a segment tree where the nodes hierarchically encode intervals using bit-strings. Given a multi-set of intervals, their intersection is captured by certain equality conditions on the encoding of the nodes. Following that, it turns out that the problem of evaluating an IJQ on an input database containing intervals can be reduced to the problem of evaluating a union of CQs on a database containing scalars and vice versa. Such reductions lead to upper and lower bounds for the data complexity of Boolean IJQs, given upper and lower bounds for the data complexity of Boolean CQs. The upper bounds are obtained using a reduction called forward reduction, which reduces any Boolean IJQ to a disjunction of Boolean CQs. The lower bounds are obtained by a reduction called backward reduction, in which any Boolean CQ from the aforementioned disjunction is reduced to the input Boolean IJQ. Overall, the two findings suggest that a Boolean IJQ is as difficult as the most difficult Boolean CQ in the disjunction produced by the forward reduction. Last but not least, this thesis identifies an interesting subclass of Boolean IJQs that admit quasi-linear time computation in data complexity. They are referred to as -acyclic IJQs.
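The segment-tree encoding mentioned above can be illustrated for the simplest case of two intervals. The Python sketch below is an assumption-laden simplification rather than the thesis's construction: endpoints are taken to be integer leaf positions in a complete binary tree with `2**depth` leaves, nodes are encoded as bit-strings (root is the empty string, `0` for left, `1` for right), and the function names are hypothetical. Each interval is split into its canonical set of tree nodes, and two intervals intersect exactly when a node of one is a prefix of a node of the other, which is a condition expressed purely through equality on bit-string prefixes.

```python
from typing import List, Tuple


def canonical_partition(left: int, right: int, depth: int) -> List[str]:
    """Bit-string codes of the maximal tree nodes that exactly cover [left, right]."""
    nodes: List[str] = []

    def visit(code: str, lo: int, hi: int) -> None:
        if right < lo or hi < left:
            return  # node range is disjoint from the interval
        if left <= lo and hi <= right:
            nodes.append(code)  # node range lies fully inside the interval
            return
        mid = (lo + hi) // 2
        visit(code + "0", lo, mid)
        visit(code + "1", mid + 1, hi)

    visit("", 0, 2 ** depth - 1)
    return nodes


def intersect_by_encoding(x: Tuple[int, int], y: Tuple[int, int], depth: int) -> bool:
    """Decide intersection using only prefix tests on node codes."""
    codes_x = canonical_partition(x[0], x[1], depth)
    codes_y = canonical_partition(y[0], y[1], depth)
    return any(u.startswith(v) or v.startswith(u)
               for u in codes_x for v in codes_y)
```

For example, with `depth = 3` the interval `(2, 5)` is covered by the nodes `01` and `10`, and the point interval `(4, 4)` by the leaf `100`; `10` is a prefix of `100`, so the two intersect.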