
    Multi-route query processing and optimization

    A modern query optimizer typically picks a single query plan for all data based on overall data statistics. However, many have observed that real-life datasets tend to have non-uniform distributions. Selecting a single query plan may result in ineffective query execution for possibly large portions of the actual data. In addition, most stream query processing systems, given the volume of data, cannot precisely model the system state, much less account for uncertainty due to continuous variations. Such systems select a single query plan based upon imprecise statistics. In this paper, we present "Query Mesh" (or QM), a practical alternative to state-of-the-art data stream processing approaches. The main idea of QM is to compute multiple routes (i.e., query plans), each designed for a particular subset of the data with distinct statistical properties. We use the terms "plans" and "routes" interchangeably in our work. A classifier model is induced and used to assign the best route to process incoming tuples based upon their data characteristics. We formulate the QM search space and analyze its complexity. Due to the substantial search space, we propose several cost-based query optimization heuristics designed to effectively find nearly optimal QMs. We propose the Self-Routing Fabric (SRF) infrastructure, which supports query execution with multiple plans without physically constructing their topologies or using a central router like Eddy. We also consider how to support uncertain route specification and execution in QM, which can occur when imprecise statistics lead to more than one optimal route for a subset of data. Our experimental results indicate that QM consistently provides better query execution performance and incurs negligible overhead compared to the alternative state-of-the-art data stream approaches.
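
The routing idea in this abstract, a classifier that assigns each incoming tuple to one of several precomputed routes, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the operator names, the `sector` attribute, the predicates, and the dictionary lookup standing in for the induced classifier are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical filter operators; names and predicates are illustrative only.
OPERATORS = {
    "cheap_price": lambda t: t["price"] < 50,
    "high_volume": lambda t: t["volume"] > 1000,
}

# Hypothetical routes: each one is an operator ordering chosen for a data subset.
ROUTES = {
    "tech": ["high_volume", "cheap_price"],
    "energy": ["cheap_price", "high_volume"],
}
DEFAULT_ROUTE = ["cheap_price", "high_volume"]


def classify(tuple_):
    """Stand-in for the induced classifier: pick a route from tuple content."""
    return ROUTES.get(tuple_["sector"], DEFAULT_ROUTE)


def process(tuple_, stats):
    """Send the tuple along its route; stop at the first operator that rejects it."""
    for op_name in classify(tuple_):
        stats[op_name] += 1  # count operator invocations as a crude cost measure
        if not OPERATORS[op_name](tuple_):
            return False
    return True


if __name__ == "__main__":
    stats = defaultdict(int)
    print(process({"sector": "tech", "price": 10, "volume": 5}, stats))
    print(dict(stats))
```

In this sketch a tuple from the hypothetical `tech` subset is rejected after a single operator call because its route evaluates the more selective operator first, which is the kind of saving a per-subset route is meant to capture.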

    Continuous Query Processing on Spatio-Temporal Data Streams

    This thesis addresses important challenges in the areas of streaming and spatio-temporal databases. It focuses on continuous querying of spatio-temporal environments characterized by (1) a large number of moving and stationary objects and queries; (2) the need for near real-time results; (3) limited memory and CPU resources; and (4) different accuracy requirements. The first part of the thesis studies the performance vs. accuracy tradeoff among different location modelling techniques when processing continuous spatio-temporal range queries on moving objects. Two models of movement, continuous and discrete, are described. This thesis introduces an accuracy comparison model to estimate the quality of the answers returned by each of the models. Experimental evaluations show the effectiveness of each model given certain characteristics of the spatio-temporal environment (e.g., varying speed, location update frequency). The second part of the thesis introduces SCUBA, a Scalable Cluster Based Algorithm for evaluating a large set of continuous queries over spatio-temporal data streams. Unlike the commonly used static grid indices, the key idea of SCUBA is to group moving objects and queries based on common dynamic properties (e.g., speed, destination, and road network location) at run-time into moving clusters. This improves performance, which in turn facilitates scalability. SCUBA exploits shared cluster-based execution consisting of two phases. In phase I, the evaluation of a set of spatio-temporal queries is abstracted as a spatial join between moving clusters for cluster-based filtering of true negatives. Thereafter, in phase II, a fine-grained join process is executed for all pairs identified as potentially joinable by a positive cluster-join match in phase I. If the clusters don’t satisfy the join predicate, the objects and queries that belong to those clusters can be safely discarded, as they are guaranteed not to join individually either. This provides processing cost savings. Another advantage of SCUBA is that it facilitates moving cluster-driven load shedding. A moving cluster (or its subset, called a nucleus) approximates the locations of its members. As a consequence, relatively accurate answers can be produced using solely the abstracted cluster location information in place of precise object-by-object matches, resulting in savings in memory and improvement in processing time. A theoretical analysis of SCUBA is presented with respect to the memory requirements, number of join comparisons and I/O costs. Experimental evaluations on real datasets demonstrate that SCUBA achieves a substantial improvement when executing continuous queries on highly dense moving objects. The experiments are conducted in a real data streaming system (CAPE) developed at WPI on real datasets generated by the Network-Based Moving Objects Generator.
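
The two-phase execution described in this abstract can be sketched as follows. This is a simplified illustration, not SCUBA itself: it assumes circular clusters whose radius encloses every member (including each query's range), circular range queries, and in-memory lists. The `MovingCluster` class and all function names are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MovingCluster:
    cx: float
    cy: float
    radius: float  # assumed to enclose all members, including query ranges
    objects: list = field(default_factory=list)  # (object_id, x, y)
    queries: list = field(default_factory=list)  # (query_id, x, y, range_radius)


def clusters_overlap(a, b):
    """Phase I predicate: two circular clusters can join only if they intersect."""
    return math.hypot(a.cx - b.cx, a.cy - b.cy) <= a.radius + b.radius


def fine_join(query_cluster, object_cluster, results):
    """Phase II: object-by-object check of range queries against objects."""
    for qid, qx, qy, qrange in query_cluster.queries:
        for oid, ox, oy in object_cluster.objects:
            if math.hypot(qx - ox, qy - oy) <= qrange:
                results.add((qid, oid))


def evaluate(clusters):
    """Join every cluster pair (and each cluster with itself) in two phases."""
    results = set()
    for i, a in enumerate(clusters):
        for b in clusters[i:]:
            if not clusters_overlap(a, b):
                continue  # phase I prunes the pair: no member of a can match b
            fine_join(a, b, results)
            if a is not b:
                fine_join(b, a, results)
    return results
```

The pruning is safe only under the stated enclosure assumption: if two cluster circles do not intersect, no query range in one can reach an object in the other, so the fine-grained join for that pair can be skipped.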

    Self-tuning Query Mesh for Adaptive Multi-Route Query Processing

    In real-life applications, different subsets of data may have distinct statistical properties, e.g., various websites may have diverse visitation rates, and different categories of stocks may have dissimilar price fluctuation patterns. For such applications, it can be fruitful to eliminate the commonly made single execution plan assumption and instead execute a query using several plans, each optimally serving a subset of data with particular statistical properties. Furthermore, in dynamic environments, data properties may change continuously, thus calling for adaptivity. The intriguing question is: can we have an execution strategy that (1) is plan-based, to leverage all the benefits of traditional plan-based systems, (2) supports multiple plans, each customized for a different subset of data, and yet (3) is as adaptive as “plan-less” systems like Eddies? While the recently proposed Query Mesh (QM) approach provides a foundation for such an execution paradigm, it does not address the question of adaptivity required for highly dynamic environments. In this work, we fill this gap by proposing a Self-Tuning Query Mesh (ST-QM), an adaptive solution for content-based multi-plan execution engines. ST-QM addresses adaptive query processing by abstracting it as a concept drift problem, a well-known subject in machine learning. This abstraction allows adaptivity candidates (i.e., the cases indicating a change in the environment) to be discarded early in the process if they are insignificant or not “worthwhile” to adapt to, thus minimizing the adaptivity overhead. A unique feature of our approach is that all logical transformations to the execution strategy get translated into a single inexpensive physical operation: the classifier change. Our experimental evaluation using a continuous query engine shows the performance benefits of the ST-QM approach over the alternatives, namely the non-adaptive and the Eddies-based solutions.

    Scuba: Scalable cluster-based algorithm for evaluating continuous spatio-temporal queries on moving objects

    In this paper, we propose SCUBA, a Scalable Cluster Based Algorithm for evaluating a large set of continuous queries over spatio-temporal data streams. The key idea of SCUBA is to group moving objects and queries based on common spatio-temporal properties at runtime into moving clusters to optimize query execution and thus facilitate scalability. SCUBA exploits shared cluster-based execution by first abstracting the evaluation of a set of spatio-temporal queries as a spatial join between moving clusters. This cluster-based filtering prunes true negatives. Then the execution proceeds with a fine-grained within-moving-cluster join process for all pairs of moving clusters identified as potentially joinable by a positive cluster-join match. A moving cluster can serve as an approximation of the location of its members. We show how moving clusters can serve as a means for intelligent load shedding of spatio-temporal data to avoid performance degradation with minimal harm to result quality. Our experiments on real datasets demonstrate that SCUBA can achieve a substantial improvement when executing continuous queries on spatio-temporal data streams.

    Query Mesh: An Efficient Multi-Route Approach to Query Optimization

    In most database systems, traditional and stream systems alike, the optimizer picks a single query plan for all data based on the overall statistics of the data. It has, however, been repeatedly observed that real-life datasets are non-uniform. Selecting a single execution plan may result in a query execution that is ineffective for possibly large portions of the actual data. In this paper, we present a practical alternative to the current state-of-the-art query optimization techniques, termed a multi-route query mesh model (QM for short). The main idea of QM is to compute multiple routes (query plans), each designed for a particular subset of data with distinct statistical properties. Based on the execution routes and the data characteristics, a classifier model is induced. The classifier is used to partition new data efficiently and assign each partition the best route for query processing. We formulate the QM search space and analyze its complexity. To find optimal query meshes, we design the Opt-QM algorithm. Faced with the dilemma of whether to determine distinct data subsets or to compute a set of execution routes first, we design several heuristics that can efficiently find good-quality query meshes. For runtime query processing, we employ a Self-Routing Fabric (SRF) infrastructure which supports shared operator processing and has near-zero routing overhead. Results of our experimental study with real-life and synthetic data indicate that the QM-based approach consistently provides better query execution performance for skewed datasets compared to the state-of-the-art alternatives, namely both the traditional systems that employ a single pre-computed plan execution and the systems that determine different routes on-the-fly.

    ClusterSheddy: Load Shedding Using Moving Clusters over Spatio-temporal Data Streams

    Moving object environments are characterized by large numbers of objects continuously sending location updates. At times, data arrival rates may spike, causing the load on the system to exceed its capacity. This may result in increased output latencies, potentially leading to invalid or obsolete answers. Dropping data randomly, the most frequently used approach in the literature for load shedding, may adversely affect the accuracy of the results. We thus propose a load shedding technique customized for spatio-temporal stream data. In our model, spatio-temporal properties, such as location, time, direction and speed over time, serve as critical factors in the load shedding decision. The main idea is to abstract similarly moving objects into moving clusters which serve as summaries of their members’ movement. Based on resource restrictions, members within clusters may be selectively discarded, while their locations are approximated by their respective moving clusters. Our experimental study illustrates the performance gains achieved by our load-shedding framework and the tradeoff between the amount of data shed and the result accuracy.

    GSLPI: A cost-based query progress indicator

    Progress indicators for SQL queries were first published in 2004 with the simultaneous and independent proposals from Chaudhuri et al. and Luo et al. In this paper, we implement both progress indicators in the same commercial RDBMS to investigate their performance. We summarize common cases in which they are both accurate and cases in which they fail to provide reliable estimates. Although there are differences in their performance, much more striking is the similarity in the errors they make due to a common simplifying assumption of uniform future speed. While the developers of these progress indicators were aware that this assumption could cause errors, they neither explored how large the errors might be nor investigated the feasibility of removing the assumption. To rectify this, we propose a new query progress indicator, similar to these early progress indicators but without the uniform speed assumption. Experiments show that on the TPC-H benchmark, on queries for which the original progress indicators have errors up to 30X the query running time, the new progress indicator is accurate to within 10 percent. We also discuss the sources of the errors that still remain and shed some light on what would need to be done to eliminate them.
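
To see why a uniform future speed assumption can mislead a progress indicator, consider the small sketch below. It is not the GSLPI algorithm or either of the 2004 indicators: the work units, the segment split, and both function names are assumptions chosen only to illustrate the error source the abstract names.

```python
def remaining_uniform(done_work, elapsed_seconds, total_work):
    """Estimate remaining time assuming future work runs at the observed average speed."""
    observed_speed = done_work / elapsed_seconds
    return (total_work - done_work) / observed_speed


def remaining_per_segment(remaining_segments):
    """Estimate remaining time from per-segment speeds (hypothetical cost inputs).

    remaining_segments is a list of (work_units, assumed_units_per_second) pairs.
    """
    return sum(work / speed for work, speed in remaining_segments)


if __name__ == "__main__":
    # Hypothetical query: segment 1 ran 1000 units in 10 s (100 units/s),
    # segment 2 has 1000 units left but runs at only 10 units/s.
    print(remaining_uniform(1000, 10.0, 2000))      # 10.0 seconds predicted
    print(remaining_per_segment([(1000, 10.0)]))    # 100.0 seconds predicted
```

In this made-up case the uniform-speed estimate is off by a factor of ten because the remaining segment is slower than the completed one; an estimator that accounts for differing segment speeds avoids that particular error, provided its per-segment speed inputs are themselves accurate.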