    Efficient Large-scale Distance-Based Join Queries in SpatialHadoop

    Efficient processing of Distance-Based Join Queries (DBJQs) in spatial databases is of paramount importance in many application domains. The most representative and best-known DBJQs are the K Closest Pairs Query (KCPQ) and the Δ Distance Join Query (ΔDJQ). These join queries are characterized by a number of desired pairs (K) or a distance threshold (Δ) between the components of the pairs in the final result, over two spatial datasets. Both are expensive operations, since two spatial datasets are combined with additional constraints. Given the increasing volume of spatial data originating from multiple sources and stored in distributed servers, it is not always efficient to perform DBJQs on a centralized server. For this reason, this paper addresses the problem of computing DBJQs on big spatial datasets in SpatialHadoop, an extension of Hadoop that supports efficient processing of spatial queries in a cloud-based setting. We propose novel algorithms, based on plane-sweep, to perform efficient parallel DBJQs on large-scale spatial datasets in SpatialHadoop. We evaluate the performance of the proposed algorithms in several situations with large real-world as well as synthetic datasets. The experiments demonstrate the efficiency and scalability of our proposed methodologies.
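
    To make the two query types concrete, the following is a minimal single-machine sketch of a plane-sweep ΔDJQ and KCPQ over two small point sets. It is not the SpatialHadoop algorithm proposed in the paper: the function names `distance_join` and `k_closest_pairs`, the tuple-based point representation, and the sweep along the x-axis are all assumptions made for illustration.

```python
import heapq
import math


def distance_join(P, Q, delta):
    """Delta distance join: all pairs (p, q) with Euclidean distance <= delta."""
    P, Q = sorted(P), sorted(Q)  # sort both sets along the sweep axis (x)
    result = []
    start = 0  # first Q point that can still be within delta of p on x
    for p in P:
        while start < len(Q) and Q[start][0] < p[0] - delta:
            start += 1
        j = start
        while j < len(Q) and Q[j][0] <= p[0] + delta:
            if math.dist(p, Q[j]) <= delta:
                result.append((p, Q[j]))
            j += 1
    return result


def k_closest_pairs(P, Q, k):
    """K closest pairs: the k pairs (p, q) with the smallest distances."""
    if k <= 0:
        return []
    P, Q = sorted(P), sorted(Q)
    heap = []  # max-heap on distance, stored as (-distance, p, q)
    start = 0

    def bound():
        # Distance of the current k-th best pair, or infinity while
        # fewer than k pairs have been collected.
        return -heap[0][0] if len(heap) == k else math.inf

    for p in P:
        # The bound only shrinks and p moves right, so `start` never moves back.
        while start < len(Q) and Q[start][0] < p[0] - bound():
            start += 1
        for j in range(start, len(Q)):
            q = Q[j]
            if q[0] - p[0] > bound():
                break  # every later q is even farther away along x
            d = math.dist(p, q)
            if len(heap) < k:
                heapq.heappush(heap, (-d, p, q))
            elif d < bound():
                heapq.heapreplace(heap, (-d, p, q))
    return sorted((-neg_d, p, q) for neg_d, p, q in heap)
```

    In both functions the sweep prunes candidate pairs using only the x-coordinate gap, which is a lower bound on the Euclidean distance. A distributed version would additionally have to partition the data and combine partial results, which this sketch does not attempt.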

    Efficient query processing on large spatial databases: A performance study

    Processing of spatial queries has been studied extensively in the literature. In most cases, it is accomplished by indexing spatial data using spatial access methods. Spatial indexes, such as those based on the Quadtree, are important in spatial databases for efficient execution of queries involving spatial constraints and objects. In this paper, we study a recent balanced disk-based index structure for point data, called xBR+-tree, that belongs to the Quadtree family and hierarchically decomposes space in a regular manner. For the most common spatial queries, like Point Location, Window, Distance Range, Nearest Neighbor and Distance-based Join, the R-tree family is a very popular choice of spatial index, due to its excellent query performance. For this reason, we compare the performance of the xBR+-tree with respect to the R*-tree and the R+-tree for tree building and processing the most studied spatial queries. To perform this comparison, we utilize existing algorithms and present new ones. We demonstrate through extensive experimental performance results (I/O efficiency and execution time), based on medium and large real and synthetic datasets, that the xBR+-tree is a big winner in execution time in all cases and a winner in I/O in most cases.
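
    As a rough illustration of what "hierarchically decomposes space in a regular manner" means for the Quadtree family, here is a plain in-memory point-region quadtree with a Window query. It is not the xBR+-tree, which is balanced and disk-based; the class name `QuadTree`, the `capacity` and `max_depth` parameters, and the half-open node bounds are assumptions of this sketch.

```python
class QuadTree:
    """Point-region quadtree over the half-open square [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=16):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity    # points a leaf holds before it splits
        self.depth = depth
        self.max_depth = max_depth  # stops endless splitting on duplicates
        self.points = []
        self.children = None        # None while this node is a leaf

    def insert(self, p):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= p[0] < x1 and y0 <= p[1] < y1):
            return False  # point lies outside this node
        if self.children is not None:
            return any(child.insert(p) for child in self.children)
        self.points.append(p)
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()
        return True

    def _split(self):
        # Regular decomposition: cut the node into four equal quadrants.
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args),
            QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args),
            QuadTree(mx, my, x1, y1, *args),
        ]
        pending, self.points = self.points, []
        for p in pending:
            any(child.insert(p) for child in self.children)

    def window(self, wx0, wy0, wx1, wy1):
        """Return all points inside the closed window [wx0, wx1] x [wy0, wy1]."""
        x0, y0, x1, y1 = self.bounds
        if wx1 < x0 or wx0 >= x1 or wy1 < y0 or wy0 >= y1:
            return []  # window and node do not overlap: prune this subtree
        if self.children is None:
            return [p for p in self.points
                    if wx0 <= p[0] <= wx1 and wy0 <= p[1] <= wy1]
        found = []
        for child in self.children:
            found.extend(child.window(wx0, wy0, wx1, wy1))
        return found
```

    The Window query only descends into quadrants that overlap the query rectangle, which is the basic pruning idea shared by the tree-based indexes compared in the abstract.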

    Constellation Queries over Big Data

    A geometrical pattern is a set of points with all pairwise distances (or, more generally, relative distances) specified. Finding matches to such patterns has applications to spatial data in seismic, astronomical, and transportation contexts. For example, a particularly interesting geometric pattern in astronomy is the Einstein cross, which is an astronomical phenomenon in which a single quasar is observed as four distinct sky objects (due to gravitational lensing) when captured by earth telescopes. Finding such crosses, as well as other geometric patterns, is a challenging problem, as the potential number of sets of elements that compose shapes is exponentially large in the size of the dataset and the pattern. In this paper, we denote geometric patterns as constellation queries and propose algorithms to find them in large data applications. Our methods combine quadtrees, matrix multiplication, and unindexed join processing to discover sets of points that match a geometric pattern within some additive factor on the pairwise distances. Our distributed experiments show that the choice of composition algorithm (matrix multiplication or nested loops) depends on the freedom introduced in the query geometry through the distance additive factor. Three clearly identified blocks of threshold values guide the choice of the best composition algorithm. Finally, solving the problem for relative distances requires a novel continuous-to-discrete transformation. To the best of our knowledge, this paper is the first to investigate constellation queries at scale.
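
    The matching condition described above, pairwise distances equal to those of the pattern within an additive factor, can be sketched with a naive backtracking search. This is not the paper's method, which relies on quadtrees and matrix multiplication; the function name `match_constellation`, the parameter `eps` for the additive factor, and the exhaustive search strategy are assumptions for illustration only.

```python
import math


def match_constellation(points, pattern, eps):
    """Find ordered tuples of points whose pairwise distances match the
    pattern's pairwise distances within an additive tolerance eps."""
    n = len(pattern)
    # Target distance between pattern elements i and j.
    target = [[math.dist(a, b) for b in pattern] for a in pattern]
    matches = []

    def extend(chosen):
        i = len(chosen)  # index of the pattern element to assign next
        if i == n:
            matches.append(tuple(points[c] for c in chosen))
            return
        for idx in range(len(points)):
            if idx in chosen:
                continue
            # Keep the candidate only if it agrees with every point chosen so far.
            if all(abs(math.dist(points[idx], points[c]) - target[i][j]) <= eps
                   for j, c in enumerate(chosen)):
                chosen.append(idx)
                extend(chosen)
                chosen.pop()

    extend([])
    return matches
```

    Each returned tuple assigns one data point to each pattern element, in pattern order. The search space grows exponentially with the pattern size, which is the difficulty the abstract points out.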

    Analysing Temporal Relations – Beyond Windows, Frames and Predicates

    This article proposes an approach that relies on the standard operators of relational algebra (including grouping and aggregation) for processing complex events without requiring window specifications. In this way, the approach can process complex event queries of the kind encountered in applications such as emergency management in metro networks. This article presents Temporal Stream Algebra (TSA), which combines the operators of relational algebra with an analysis of temporal relations at compile time. This analysis determines which relational algebra queries can be evaluated against data streams, i.e., the analysis is able to distinguish valid from invalid stream queries. Furthermore, the analysis derives functions similar to the pass, propagation and keep invariants in Tucker et al.'s "Exploiting Punctuation Semantics in Continuous Data Streams". These functions enable the incremental evaluation of TSA queries, the propagation of punctuations, and garbage collection. The evaluation of TSA queries combines bulk-wise and out-of-order processing, which makes it tolerant to workload bursts as they typically occur in emergency management. The approach has been conceived for efficiently processing complex event queries on top of a relational database system. It has been deployed and tested on MonetDB.

    Adding Logical Operators to Tree Pattern Queries on Graph-Structured Data

    As data are increasingly modeled as graphs for expressing complex relationships, the tree pattern query on graph-structured data becomes an important type of query in real-world applications. Most practical query languages, such as XQuery and SPARQL, support logical expressions using logical-AND/OR/NOT operators to define structural constraints of tree patterns. In this paper, (1) we propose generalized tree pattern queries (GTPQs) over graph-structured data, which fully support propositional logic of structural constraints. (2) We make a thorough study of fundamental problems including satisfiability, containment and minimization, and analyze the computational complexity and the decision procedures of these problems. (3) We propose a compact graph representation of intermediate results and a pruning approach to reduce the size of intermediate results and the number of join operations -- two factors that often impair the efficiency of traditional algorithms for evaluating tree pattern queries. (4) We present an efficient algorithm for evaluating GTPQs using 3-hop as the underlying reachability index. (5) Experiments on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our algorithm, from several times to orders of magnitude faster than state-of-the-art algorithms in terms of evaluation time, even for traditional tree pattern queries with only conjunctive operations.

    Temporal Stream Algebra

    Data stream management systems (DSMS) have so far focused on event queries and hardly consider combined queries to both data from event streams and from a database. However, applications like emergency management require combined data stream and database queries. Further requirements are the simultaneous use of multiple timestamps following different time lines and semantics, expressive temporal relations between multiple timestamps, and flexible negation, grouping and aggregation which can be controlled, i.e. started and stopped, by events and are not limited to fixed-size time windows. Current DSMS hardly address these requirements. This article proposes Temporal Stream Algebra (TSA) so as to meet the aforementioned requirements. Temporal streams are a common abstraction of data streams and database relations; the operators of TSA are generalizations of the usual operators of Relational Algebra. An in-depth analysis of temporal relations guarantees that valid TSA expressions are non-blocking, i.e. can be evaluated incrementally. In this respect, TSA differs significantly from previous algebraic approaches, which use specialized operators to prevent blocking expressions on a "syntactical" level.