Optimal Joins Using Compact Data Structures
Worst-case optimal join algorithms have gained a lot of attention in the database literature. We now have several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice. However, implementing these algorithms often requires an enhanced indexing structure: to achieve optimality, we either need to build completely new indexes or must populate the database with several instantiations of indexes such as B+-trees. Either way, this means spending extra storage space that may be non-negligible.
We show that optimal algorithms can be obtained directly from a representation that regards the relations as point sets in variable-dimensional grids, without the need for extra storage. Our representation is a compact quadtree for the static indexes, and a dynamic quadtree sharing subtrees (which we dub a qdag) for intermediate results. We develop a compositional algorithm to process full join queries under this representation, and show that the running time of this algorithm is worst-case optimal in data complexity. Remarkably, we can extend our framework to evaluate more expressive queries from relational algebra by introducing a lazy version of qdags (lqdags). Once again, we can show that the running time of our algorithms is worst-case optimal.
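To make the point-set view concrete, here is a minimal sketch (not the authors' implementation) of the simplest case: two relations R(A, B) and S(A, B) over the same attributes, each stored as a quadtree over a 2^k x 2^k grid. Their natural join is then the intersection of the two point sets, computed by descending both trees in lockstep and pruning any quadrant that is empty in either one. The paper's compact quadtrees are succinct bit-level encodings, and its qdags handle relations with different attributes by extending dimensions; the pointer-based nested tuples and the helper names `build`, `intersect`, and `points` below are hypothetical simplifications.

```python
from typing import Iterable, List, Tuple, Union

Point = Tuple[int, int]
# A node is None (empty quadrant), True (an occupied 1x1 cell),
# or a 4-tuple of children ordered (dx, dy) = (0,0), (0,1), (1,0), (1,1).
Node = Union[None, bool, tuple]

QUADRANTS = [(dx, dy) for dx in (0, 1) for dy in (0, 1)]


def build(pts: Iterable[Point], x0: int, y0: int, size: int) -> Node:
    """Build a quadtree for the points inside [x0, x0+size) x [y0, y0+size)."""
    inside = [(x, y) for (x, y) in pts
              if x0 <= x < x0 + size and y0 <= y < y0 + size]
    if not inside:
        return None  # empty quadrant: no subtree is stored
    if size == 1:
        return True
    half = size // 2
    return tuple(build(inside, x0 + dx * half, y0 + dy * half, half)
                 for dx, dy in QUADRANTS)


def intersect(a: Node, b: Node) -> Node:
    """Intersect two quadtrees built over the same grid."""
    if a is None or b is None:
        return None  # prune: the quadrant is empty in one relation
    if a is True and b is True:
        return True
    children = tuple(intersect(ca, cb) for ca, cb in zip(a, b))
    return None if all(c is None for c in children) else children


def points(node: Node, x0: int, y0: int, size: int) -> List[Point]:
    """List the occupied cells of a quadtree."""
    if node is None:
        return []
    if node is True:
        return [(x0, y0)]
    half = size // 2
    out: List[Point] = []
    for child, (dx, dy) in zip(node, QUADRANTS):
        out.extend(points(child, x0 + dx * half, y0 + dy * half, half))
    return out


# R(A, B) and S(A, B) as point sets in a 4x4 grid (k = 2).
R = {(0, 0), (1, 2), (3, 3)}
S = {(0, 0), (2, 1), (3, 3)}
joined = intersect(build(R, 0, 0, 4), build(S, 0, 0, 4))
print(sorted(points(joined, 0, 0, 4)))  # [(0, 0), (3, 3)]
```

The pruning step is where the work is saved: a quadrant that is empty in either relation is never visited, regardless of how many points the other relation has there.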
Geo-Adaptive Deep Spatio-Temporal predictive modeling for human mobility
Deep learning approaches for spatio-temporal prediction problems such as
crowd-flow prediction assume the data to be a fixed, regularly shaped tensor and
face challenges in handling irregular, sparse data tensors. This poses
limitations in use-case scenarios such as predicting an individual's visit
counts for a given spatial area at a particular temporal resolution using a
raster/image representation of the geographical region, since the movement
patterns of an individual can be largely restricted and localized to a certain
part of the raster. Additionally, current deep learning approaches to such
problems do not account for the geographical awareness of a region while
modelling the spatio-temporal movement patterns of an individual. To address
these limitations, a new strategy and modeling approach is needed that can
handle sparse, irregular data while incorporating geo-awareness into the model.
In this paper, we use a quadtree as the data structure for representing the
image and introduce a novel geo-aware deep learning layer, GA-ConvLSTM, that
performs the convolution operation through a novel quadtree-based geo-aware
module to capture spatial dependencies while keeping the recurrent mechanism to
account for temporal dependencies. We present this approach in the context of
predicting the spatial behaviors of an individual (e.g., frequent visits to
specific locations) with a deep-learning-based predictive model,
GADST-Predict. Experimental results on two GPS-based trace datasets show that
the proposed method is effective in handling visit frequencies across different
use cases with considerably high accuracy.
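As an illustration of why a quadtree suits sparse rasters, the sketch below performs a generic region-quadtree decomposition of a square visit-count raster, splitting a block into four quadrants only while its cells are not all equal. It is a stand-in under stated assumptions: it does not reproduce the GA-ConvLSTM layer or its geo-aware module, the raster is assumed square with a power-of-two side, and `decompose` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# (row, col, size, value): a size x size block whose cells all hold `value`
Leaf = Tuple[int, int, int, int]


def decompose(grid: List[List[int]], r0: int = 0, c0: int = 0,
              size: Optional[int] = None) -> List[Leaf]:
    """Region-quadtree decomposition of a square, power-of-two raster."""
    if size is None:
        size = len(grid)
    first = grid[r0][c0]
    uniform = all(grid[r][c] == first
                  for r in range(r0, r0 + size)
                  for c in range(c0, c0 + size))
    if uniform:
        return [(r0, c0, size, first)]  # one leaf covers the whole block
    half = size // 2
    leaves: List[Leaf] = []
    for dr in (0, half):
        for dc in (0, half):
            leaves.extend(decompose(grid, r0 + dr, c0 + dc, half))
    return leaves


# Hypothetical 8x8 visit-count raster: one individual visits two cells.
raster = [[0] * 8 for _ in range(8)]
raster[1][1] = 3
raster[6][5] = 1
leaves = decompose(raster)
print(len(leaves))  # 16 leaves instead of 64 cells
```

Here the large empty regions of the raster each become a single leaf, and resolution is spent only around the cells the individual actually visits.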
Manycore processing of repeated range queries over massive moving objects observations
The ability to process significant amounts of continuously updated spatial
data in a timely manner is mandatory for an increasing number of applications.
Parallelism enables such applications to face this data-intensive challenge and
allows the devised systems to feature low latency and high scalability. In this
paper we focus on a specific data-intensive problem: the repeated processing of
huge numbers of range queries over massive sets of moving objects, where the
spatial extents of queries and objects are continuously modified over time. To
tackle this problem and significantly accelerate query processing, we devise a
hybrid CPU/GPU pipeline that compresses the data output and saves query
processing work. The devised system relies on an ad-hoc spatial index leading to
a problem decomposition that results in a set of independent data-parallel
tasks. The index is based on a point-region quadtree space decomposition and
effectively handles a broad range of spatial object distributions, even very
skewed ones. Also, to deal with the architectural peculiarities and limitations
of GPUs, we adopt non-trivial GPU data structures that avoid the need for
locked memory accesses and favour coalesced memory accesses, thus enhancing the
overall memory throughput. To the best of our knowledge, this is the first work
that exploits GPUs to efficiently solve repeated range queries over massive
sets of continuously moving objects characterized by highly skewed spatial
distributions. Compared with state-of-the-art CPU-based implementations, our
method achieves significant speedups on the order of 14x-20x, depending on the
dataset, even when using very cheap GPUs.
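The following is a minimal, sequential sketch of a bucketed point-region (PR) quadtree answering a batch of range queries for one time step; it is not the paper's GPU pipeline. Assumptions: objects are points in [0, extent)^2, a leaf holds at most `capacity` points before splitting (a depth cap stops endless splits on duplicate positions), and the tree is simply rebuilt on each tick as objects move. `PRQuadtree` and `process_tick` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Obj = Tuple[float, float, int]  # (x, y, object id)


@dataclass
class PRQuadtree:
    """Bucketed point-region quadtree over the square [x0, x0+size)^2."""
    x0: float
    y0: float
    size: float
    capacity: int = 4      # max points in a leaf before it splits
    max_depth: int = 16    # guards against endless splits on duplicates
    depth: int = 0
    items: List[Obj] = field(default_factory=list)
    children: Optional[List["PRQuadtree"]] = None

    def insert(self, p: Obj) -> None:
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.items.append(p)
        if len(self.items) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self) -> None:
        # Space, not data, decides the split: always halve the square.
        half = self.size / 2
        self.children = [
            PRQuadtree(self.x0 + dx * half, self.y0 + dy * half, half,
                       self.capacity, self.max_depth, self.depth + 1)
            for dx in (0, 1) for dy in (0, 1)
        ]
        moved, self.items = self.items, []
        for q in moved:
            self._child_for(q).insert(q)

    def _child_for(self, p: Obj) -> "PRQuadtree":
        half = self.size / 2
        dx = 1 if p[0] >= self.x0 + half else 0
        dy = 1 if p[1] >= self.y0 + half else 0
        return self.children[dx * 2 + dy]

    def range_query(self, qx0: float, qy0: float,
                    qx1: float, qy1: float) -> List[int]:
        """Ids of objects inside the closed rectangle [qx0, qx1] x [qy0, qy1]."""
        if (qx1 < self.x0 or qx0 >= self.x0 + self.size or
                qy1 < self.y0 or qy0 >= self.y0 + self.size):
            return []  # prune: no overlap with this node's square
        if self.children is None:
            return [oid for (x, y, oid) in self.items
                    if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        found: List[int] = []
        for child in self.children:
            found.extend(child.range_query(qx0, qy0, qx1, qy1))
        return found


def process_tick(objects: List[Obj],
                 queries: List[Tuple[float, float, float, float]],
                 extent: float = 100.0) -> List[List[int]]:
    """Rebuild the index for the current positions and answer every query."""
    tree = PRQuadtree(0.0, 0.0, extent)
    for obj in objects:
        tree.insert(obj)
    return [sorted(tree.range_query(*q)) for q in queries]


objects = [(10, 10, 0), (12, 11, 1), (80, 80, 2), (50, 50, 3),
           (11, 12, 4), (13, 13, 5)]
queries = [(0, 0, 20, 20), (40, 40, 90, 90)]
print(process_tick(objects, queries))  # [[0, 1, 4, 5], [2, 3]]
```

Because a PR quadtree splits space rather than data, dense clusters get deeper subtrees while empty areas stay shallow, and each leaf paired with the queries that overlap it is an independent unit of work, which hints at how such a decomposition can yield data-parallel tasks.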
A Study of Energy and Locality Effects using Space-filling Curves
The cost of energy is becoming an increasingly important driver for the
operating cost of HPC systems, adding yet another facet to the challenge of
producing efficient code. In this paper, we investigate the energy implications
of trading computation for locality using Hilbert and Morton space-filling
curves with dense matrix-matrix multiplication. The advantage of these curves
is that they exhibit an inherent tiling effect without requiring specific
architecture tuning. By accessing the matrices in the order determined by the
space-filling curves, we can trade computation for locality. The index
computation overhead of the Morton curve is found to be balanced against its
locality and energy efficiency, while the overhead of the Hilbert curve
outweighs its improvements on our test system.
Comment: Proceedings of the 2014 IEEE International Parallel & Distributed
Processing Symposium Workshops (IPDPSW)
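The two index computations being compared can be sketched as follows: a Morton (Z-order) index obtained by interleaving coordinate bits, and a Hilbert index computed with the well-known iterative rotate-and-reflect conversion. The assumption here, not taken from the paper, is that the curve orders the output tiles of C in a blocked matrix multiplication. Pure Python will not reproduce the cache or energy effects studied in the paper; it only shows the traversal order and the extra per-step work the Hilbert index needs. All function names are hypothetical.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Z-order index: interleave the bits of x (even positions) and y (odd)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of (x, y) on the Hilbert curve over an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def tile_order(tiles: int, key: Callable[[int, int], int]) -> List[Tuple[int, int]]:
    """All (tile_row, tile_col) pairs sorted by a space-filling-curve key."""
    return sorted(((i, j) for i in range(tiles) for j in range(tiles)),
                  key=lambda t: key(t[0], t[1]))


def matmul_curve_order(A: Matrix, B: Matrix, tile: int,
                       key: Callable[[int, int], int]) -> Matrix:
    """Blocked C = A * B, visiting the output tiles of C in curve order."""
    n = len(A)
    tiles = n // tile  # assumes tile divides n
    C = [[0.0] * n for _ in range(n)]
    for ti, tj in tile_order(tiles, key):
        for tk in range(tiles):
            for i in range(ti * tile, (ti + 1) * tile):
                for k in range(tk * tile, (tk + 1) * tile):
                    a = A[i][k]
                    for j in range(tj * tile, (tj + 1) * tile):
                        C[i][j] += a * B[k][j]
    return C


n, tile = 8, 2
tiles = n // tile
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i - j) for j in range(n)] for i in range(n)]
C_morton = matmul_curve_order(A, B, tile, morton_index)
C_hilbert = matmul_curve_order(A, B, tile,
                               lambda i, j: hilbert_index(tiles, i, j))
print(tile_order(2, lambda i, j: hilbert_index(2, i, j)))
# [(0, 0), (0, 1), (1, 1), (1, 0)]
```

In this sketch the Morton index needs only shifts and ORs, while the Hilbert index adds a data-dependent rotation at every level; that difference is the kind of index-computation overhead the abstract weighs against locality.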
Geographic Information Systems: The Developer's Perspective
Geographic information systems, which manage data describing the surface of the earth, are becoming increasingly popular. This research details the current state of the art of geographic data processing in terms of the needs of the geographic information system developer. The research focuses chiefly on the geographic data model--the basic building block of the geographic information system. The two most popular models, tessellation and vector, are studied in detail, as well as a number of hybrid data models.
In addition, geographic database management is discussed in terms of geographic data access and query processing. Finally, a pragmatic discussion of geographic information system design is presented, covering topics such as distributed database considerations and artificial intelligence considerations.