APRIL: Approximating Polygons as Raster Interval Lists
The spatial intersection join is an important spatial query operation, due to
its popularity and high complexity. The spatial join pipeline takes as input
two collections of spatial objects (e.g., polygons). In the filter step, pairs
of object MBRs that intersect are identified and passed to the refinement step
for verification of the join predicate on the exact object geometries. The
bottleneck of spatial join evaluation is in the refinement step. We introduce
APRIL, a powerful intermediate step in the pipeline, which is based on raster
interval approximations of object geometries. Our technique applies a sequence
of interval joins on 'intervalized' object approximations to determine whether
the objects intersect or not. Compared to previous work, APRIL approximations
are simpler, occupy much less space, and achieve similar pruning effectiveness
at a much higher speed. Besides intersection joins between polygons, APRIL can
directly be applied and has high effectiveness for polygonal range queries,
within joins, and polygon-linestring joins. By applying a lightweight
compression technique, APRIL approximations may occupy even less space than
object MBRs. Furthermore, APRIL can be customized to work with partitioned data
and on polygons of varying sizes, rasterized at different granularities. Our
last contribution is a novel algorithm that computes the APRIL approximation of
a polygon without having to rasterize it in full, which is orders of magnitude
faster than the computation of other raster approximations. Experiments on real
data demonstrate the effectiveness and efficiency of APRIL; compared to the
state-of-the-art intermediate filter, APRIL occupies 2x-8x less space, is
3.5x-8.5x more time-efficient, and reduces the end-to-end join cost by up to 3
times.
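The merge-style interval join at the core of this filtering idea can be sketched as follows. This is a simplified illustration with hypothetical interval lists, not the paper's full pipeline: each polygon is approximated by a sorted list of half-open intervals [start, end) of raster cell indices along a space-filling curve, and two objects can only intersect if their interval lists overlap somewhere, which a single merge pass over both sorted lists detects.

```python
def intervals_overlap(a, b):
    """Return True iff any interval in sorted list `a` overlaps one in `b`.

    Both lists hold half-open [start, end) intervals sorted by start.
    """
    i = j = 0
    while i < len(a) and j < len(b):
        a_start, a_end = a[i]
        b_start, b_end = b[j]
        if a_start < b_end and b_start < a_end:  # overlap found
            return True
        if a_end <= b_start:                     # advance the list that ends first
            i += 1
        else:
            j += 1
    return False

# Hypothetical intervalized approximations of two polygons:
poly_a = [(2, 5), (9, 12), (20, 24)]
poly_b = [(5, 7), (11, 15)]
print(intervals_overlap(poly_a, poly_b))  # True: [9, 12) overlaps [11, 15)
```

Because both lists stay sorted, the pass is linear in the total number of intervals, which is what makes such an intermediate filter cheap compared to exact geometry tests.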
A persistent homology model of street network connectivity
We propose a novel model of street network connectivity that uses a method from the field of applied topology called persistent homology. The output of this model is a pair of density functions that capture the relative strength and frequency of connected components and cycles in the network. In this context, strength is a function of street type, such as motorway or residential, with more significant street types providing greater connectivity. The pair of density functions output by the model is easily interpreted and provides novel insights into the connectivity properties of different street networks. We demonstrate the usefulness of this model through an analysis of UK and USA city street networks. This analysis identifies tangible similarities and differences in the connectivity of different cities, as well as ways in which the connectivity of individual cities might be improved.
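A toy sketch of the 0-dimensional (connected-component) side of this idea, under our own assumptions rather than the paper's exact construction: streets enter a filtration in order of decreasing significance, and a union-find structure records the strength at which separate components merge. The multiset of merge strengths is the kind of raw material a density function over component persistence would be built from.

```python
def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def component_merges(n_nodes, edges):
    """edges: (strength, u, v) triples; higher strength = more significant street.

    Returns the strengths at which two previously separate components merged,
    processing streets from most to least significant.
    """
    parent = list(range(n_nodes))
    merges = []
    for strength, u, v in sorted(edges, reverse=True):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:          # the street connects two components
            parent[ru] = rv
            merges.append(strength)
    return merges

# Hypothetical tiny network: strength 3 = motorway, 1 = residential street.
edges = [(3, 0, 1), (1, 1, 2), (3, 2, 3), (1, 0, 3)]
print(component_merges(4, edges))  # [3, 3, 1]
```

In the output, merges at high strength indicate that major roads already tie the network together, while merges that only happen at low strength flag areas connected solely through minor streets.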
Hot Spot Analysis over Big Trajectory Data
Hot spot analysis is the problem of identifying statistically significant spatial clusters in an underlying data set. In this paper, we study the problem of hot spot analysis for massive trajectory data of moving objects, which has many real-life applications in different domains, especially in the analysis of vast repositories of historical spatio-temporal traces (cars, vessels, aircraft). In order to identify hot spots, we propose an approach that relies on the Getis-Ord statistic, which has been used successfully in the past for point data. Since trajectory data is more than just a collection of individual points, we formulate the problem of trajectory hot spot analysis using the Getis-Ord statistic. We propose a parallel and scalable algorithm for this problem, called THS, which provides an exact solution and can operate on very large data sets. Moreover, we introduce an approximate algorithm (aTHS) that avoids exhaustive computation and trades off accuracy for efficiency in a controlled manner. In essence, we provide a method that quantifies the maximum error induced by the approximation relative to the achieved computational savings. We develop our algorithms in Apache Spark and demonstrate the scalability and efficiency of our approach using a large, historical, real-life trajectory data set of vessels sailing in the Eastern Mediterranean over a period of three years.
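For readers unfamiliar with the underlying statistic, here is a minimal sketch of the Getis-Ord Gi* score for point data aggregated into cells (the variable names and the toy neighborhood scheme are ours, not from the paper). Cells with strongly positive scores are hot-spot candidates, strongly negative ones cold-spot candidates.

```python
import math

def getis_ord_gi_star(values, weights, i):
    """Gi* statistic for cell i.

    values: attribute value per cell (e.g., trajectory point count).
    weights: weights[i][j] is the spatial weight of cell j for cell i,
             including j == i (the "star" variant counts the cell itself).
    """
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - x_bar ** 2)
    w = weights[i]
    w_sum = sum(w)
    num = sum(wj * xj for wj, xj in zip(w, values)) - x_bar * w_sum
    den = s * math.sqrt((n * sum(wj * wj for wj in w) - w_sum ** 2) / (n - 1))
    return num / den

# Hypothetical 1-D row of cells; each neighborhood = the cell plus its neighbors.
values = [1, 1, 8, 9, 1, 1]
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(6)] for i in range(6)]
print(round(getis_ord_gi_star(values, weights, 3), 2))  # 1.58
```

The score is positive near the high-valued cells and negative at the quiet ends of the row; in practice, scores are compared against the standard normal distribution to assess significance.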
Weiterentwicklung analytischer Datenbanksysteme (Advancing Analytical Database Systems)
This thesis contributes to the state of the art in analytical database systems. First, we identify and explore extensions to better support analytics on event streams. Second, we propose a novel polygon index to enable efficient geospatial data processing in main memory. Third, we contribute a new deep learning approach to cardinality estimation, which is the core problem in cost-based query optimization.
ADCN: An Anisotropic Density-Based Clustering Algorithm for Discovering Spatial Point Patterns with Noise
Density-based clustering algorithms such as DBSCAN have been widely used for spatial knowledge discovery, as they offer several key advantages over other clustering algorithms: they can discover clusters with arbitrary shapes, are robust to noise, and do not require prior knowledge (or estimation) of the number of clusters. The idea of using a scan circle centered at each point with a search radius Eps to find at least MinPts points as a criterion for deriving local density is easily understandable and sufficient for exploring isotropic spatial point patterns. However, many cases cannot be adequately captured this way, particularly those involving linear features or shapes with continuously changing density, such as a spiral. In such cases, DBSCAN tends to either create an increasing number of small clusters or add noise points to large clusters. Therefore, in this paper, we propose a novel anisotropic density-based clustering algorithm (ADCN). To motivate our work, we introduce synthetic and real-world cases that cannot be sufficiently handled by DBSCAN (and OPTICS). We then present our clustering algorithm and test it on a wide range of cases. We demonstrate that our algorithm performs as well as DBSCAN in cases that do not explicitly benefit from an anisotropic perspective and that it outperforms DBSCAN in cases that do. We show that our approach has the same time complexity as DBSCAN and OPTICS, namely O(n log n) when using a spatial index and O(n^2) otherwise. We provide an implementation and test the runtime over multiple cases. Finally, we apply DBSCAN, OPTICS, and ADCN to the task of extracting urban areas of interest (AOI) from geotagged photos in six cities. A visual comparison shows that, compared to DBSCAN and OPTICS, ADCN is inclined to extract AOIs with linear shapes that follow the underlying road networks. ADCN also tends to connect clusters whose spatial distributions follow similar directions.
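The contrast between an isotropic scan circle and an oriented neighborhood can be sketched as follows. This is an illustrative comparison under our own assumptions (a fixed ellipse orientation and axis ratio), not ADCN's actual neighborhood construction:

```python
import math

def circle_neighbors(points, p, eps):
    """DBSCAN-style isotropic neighborhood: all points within radius eps of p."""
    return [q for q in points if math.dist(p, q) <= eps and q != p]

def ellipse_neighbors(points, p, eps, angle, ratio=0.3):
    """Anisotropic neighborhood: ellipse with semi-major axis eps along
    direction `angle` and semi-minor axis eps * ratio."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for q in points:
        if q == p:
            continue
        dx, dy = q[0] - p[0], q[1] - p[1]
        u, v = dx * c + dy * s, -dx * s + dy * c  # rotate into the ellipse frame
        if (u / eps) ** 2 + (v / (eps * ratio)) ** 2 <= 1:
            out.append(q)
    return out

# Points along a line plus one off-axis noise point: the oriented ellipse
# keeps the linear feature while excluding the noise that the circle absorbs.
line = [(i, 0.0) for i in range(5)]
noise = [(2.0, 1.5)]
pts = line + noise
print(len(circle_neighbors(pts, (2.0, 0.0), 2.0)))        # 5 (includes the noise point)
print(len(ellipse_neighbors(pts, (2.0, 0.0), 2.0, 0.0)))  # 4 (line points only)
```

In an anisotropic algorithm the ellipse orientation would be derived from the local point distribution (e.g., the dominant direction of the neighbors) rather than fixed as here.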
Fine-grained complexity and algorithm engineering of geometric similarity measures
Point sets and sequences are fundamental geometric objects that arise in any application involving movement data, geometric shapes, and similar data. A crucial task on these objects is to measure their similarity. This thesis therefore presents results on algorithms, complexity lower bounds, and algorithm engineering for the most important point-set and sequence similarity measures, such as the Fréchet distance, the Fréchet distance under translation, and the Hausdorff distance under translation. Going beyond the mere computation of similarity, the approximate near neighbor problem for the continuous Fréchet distance on time series is also considered, and matching upper and lower bounds are shown.
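As a concrete point of reference for the measures discussed, here is a short sketch of the classic dynamic program for the discrete Fréchet distance between two point sequences (the thesis also treats the continuous variant and versions under translation, which are substantially harder):

```python
import math
from functools import lru_cache

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between point sequences P and Q:
    the smallest max step length over all monotone couplings."""
    @lru_cache(maxsize=None)
    def d(i, j):
        dist = math.dist(P[i], Q[j])
        if i == 0 and j == 0:
            return dist
        if i == 0:
            return max(d(0, j - 1), dist)
        if j == 0:
            return max(d(i - 1, 0), dist)
        # Either P advances, Q advances, or both advance.
        return max(min(d(i - 1, j), d(i - 1, j - 1), d(i, j - 1)), dist)
    return d(len(P) - 1, len(Q) - 1)

# Two parallel polylines at vertical distance 1:
P = [(0, 0), (1, 0), (2, 0)]
Q = [(0, 1), (1, 1), (2, 1)]
print(discrete_frechet(P, Q))  # 1.0
```

The memoized recursion runs in O(|P| * |Q|) time; the continuous Fréchet distance requires the more involved free-space diagram machinery of Alt and Godau.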