62,550 research outputs found

    SLA-Aware Cloud Query Processing with Reinforcement Learning-based Multi-Objective Re-Optimization

    Get PDF
    Query processing on cloud database systems is challenging because of the dynamic cloud environment. In cloud database systems, besides query execution time, users also consider the monetary cost to be paid to the cloud provider for executing queries. Moreover, a Service Level Agreement (SLA) is signed between users and cloud providers before any service is provided. Thus, from the profit-oriented perspective of cloud providers, query re-optimization is a multi-objective optimization problem that minimizes not only query execution time and monetary cost but also SLA violations. In this paper, we introduce ReOptRL and SLAReOptRL, two novel query re-optimization algorithms based on deep reinforcement learning. Experiments show that both algorithms improve query execution time and monetary cost by 50% over existing algorithms, and that SLAReOptRL has the lowest SLA violation rate of all the algorithms.
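    To make the multi-objective framing concrete, the sketch below shows one way a reinforcement-learning reward could combine execution time, monetary cost, and SLA violations. The weights, field names, and SLA model here are assumptions for illustration, not the actual formulation used by ReOptRL or SLAReOptRL.

```python
# Hypothetical multi-objective reward for RL-based query re-optimization.
# Weights and the SLA penalty model are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class QueryOutcome:
    execution_time_s: float   # observed execution time of the (re)optimized plan
    monetary_cost_usd: float  # cost billed by the cloud provider
    sla_deadline_s: float     # deadline agreed in the SLA


def reward(outcome: QueryOutcome,
           w_time: float = 0.4,
           w_cost: float = 0.4,
           w_sla: float = 0.2) -> float:
    """Higher reward for faster, cheaper executions that stay within the SLA."""
    # Normalize time and cost so that smaller values yield larger rewards.
    time_term = 1.0 / (1.0 + outcome.execution_time_s)
    cost_term = 1.0 / (1.0 + outcome.monetary_cost_usd)
    # Penalize SLA violations in proportion to how far past the deadline we ran.
    overrun = max(0.0, outcome.execution_time_s - outcome.sla_deadline_s)
    sla_term = 1.0 / (1.0 + overrun)
    return w_time * time_term + w_cost * cost_term + w_sla * sla_term


if __name__ == "__main__":
    print(reward(QueryOutcome(execution_time_s=12.0,
                              monetary_cost_usd=0.05,
                              sla_deadline_s=10.0)))
```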

    JanusAQP: Efficient Partition Tree Maintenance for Dynamic Approximate Query Processing

    Full text link
    Approximate query processing (AQP) over dynamic databases, i.e., under insertions and deletions, has applications ranging from high-frequency trading to Internet-of-Things analytics. We present JanusAQP, a new dynamic AQP system that supports SUM, COUNT, AVG, MIN, and MAX queries under insertions and deletions to the dataset. JanusAQP extends static partition tree synopses, which are hierarchical aggregations of datasets, to the dynamic setting. This paper contributes new methods for: (1) efficient initialization of the data synopsis in the presence of incoming data, (2) maintenance of the data synopsis under insertions and deletions, and (3) re-optimization of the partitioning to reduce the approximation error. JanusAQP reduces the error of a state-of-the-art baseline by more than 60% at only 10% of the storage cost. JanusAQP can process more than 100K updates per second on a single node while keeping query latency at the millisecond level.
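    The sketch below illustrates the general idea of a partition tree synopsis maintained under insertions: each node keeps COUNT/SUM/MIN/MAX over its key range, and an insert updates the aggregates along one root-to-leaf path. This is a simplified one-dimensional toy, not JanusAQP's actual data structure or maintenance algorithm.

```python
# Toy 1-D partition-tree synopsis with per-node COUNT/SUM/MIN/MAX aggregates,
# maintained under insertions. Illustrative only; not the JanusAQP design.
import math


class PartitionNode:
    def __init__(self, lo: float, hi: float, depth: int, max_depth: int = 3):
        self.lo, self.hi = lo, hi
        self.count, self.sum = 0, 0.0
        self.min, self.max = math.inf, -math.inf
        self.children = []
        if depth < max_depth:
            mid = (lo + hi) / 2.0
            self.children = [PartitionNode(lo, mid, depth + 1, max_depth),
                             PartitionNode(mid, hi, depth + 1, max_depth)]

    def insert(self, value: float) -> None:
        # Update this node's aggregates, then recurse into the child
        # whose range covers the value.
        self.count += 1
        self.sum += value
        self.min = min(self.min, value)
        self.max = max(self.max, value)
        for child in self.children:
            if child.lo <= value < child.hi:
                child.insert(value)
                break

    def avg(self) -> float:
        return self.sum / self.count if self.count else float("nan")


if __name__ == "__main__":
    root = PartitionNode(0.0, 100.0, depth=0)
    for v in [3.5, 42.0, 87.2, 55.0]:
        root.insert(v)
    print(root.count, root.sum, root.min, root.max, root.avg())
```

    Deletions would be handled symmetrically for COUNT, SUM, and AVG, while MIN and MAX generally require extra bookkeeping or periodic re-partitioning, which is where re-optimization of the partitioning comes in.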

    Dynamic Ensemble Selection with Regional Expertise

    Get PDF
    Many recent works have shown that ensemble methods generalize better than single-classifier approaches by aggregating the decisions of all base learners in machine learning tasks. To address the redundancy and inaccuracy of base learners in ensemble methods, classifier/ensemble selection methods have been proposed that select a single classifier or an ensemble (a subset of all base learners) to classify a query pattern. This final classifier or ensemble is determined either statically before prediction or dynamically for every query pattern during prediction. Static selection approaches choose a classifier or ensemble by evaluating classifiers in terms of accuracy and diversity, whereas dynamic classifier/ensemble selection (DCS, DES) methods incorporate local information to dedicate a classifier/ensemble to each query pattern. Our work focuses on DES and proposes a new DES framework, DES with Regional Expertise (DES-RE). The success of a DES system rests on two factors: the quality of the base learners and the optimality of the ensemble selection. DES-RE addresses these two challenges as follows. 1) Local expertise enhancement: a novel data sampling and weighting strategy that combines the advantages of bagging and boosting is employed to increase the local expertise of the base learners and thereby facilitate the later ensemble selection. 2) Competence region optimization: DES-RE learns a distance metric to form better competence regions (i.e., neighborhoods) that promote strong base learners for a specific query pattern. In addition to performing local expertise enhancement and competence region optimization independently, we propose an expectation-maximization (EM) framework that combines the two procedures. Extensive simulations validate the performance of all the proposed algorithms.
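    For readers unfamiliar with DES, the sketch below shows the generic idea: for each query pattern, base learners are ranked by their accuracy on its k nearest validation samples (the competence region), and only the most competent ones vote. This illustrates plain local-accuracy DES under assumed k and ensemble sizes; it does not implement DES-RE's sampling strategy, metric learning, or EM procedure.

```python
# Generic dynamic ensemble selection by local accuracy (illustrative sketch).
import numpy as np


def des_predict(query, base_learners, X_val, y_val, k=7, n_select=3):
    """Predict the label of `query` with the learners most competent nearby."""
    # Competence region: indices of the k nearest validation samples.
    dists = np.linalg.norm(X_val - query, axis=1)
    region = np.argsort(dists)[:k]

    # Local accuracy of each base learner inside the competence region.
    local_acc = [np.mean(clf.predict(X_val[region]) == y_val[region])
                 for clf in base_learners]

    # Keep the n_select most locally accurate learners and majority-vote.
    selected = np.argsort(local_acc)[::-1][:n_select]
    votes = [base_learners[i].predict(query.reshape(1, -1))[0] for i in selected]
    return max(set(votes), key=votes.count)


if __name__ == "__main__":
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    X_train, y_train, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

    # A small pool of base learners trained on bootstrap samples (bagging-like).
    pool = []
    for _ in range(10):
        idx = rng.integers(0, len(X_train), size=len(X_train))
        pool.append(DecisionTreeClassifier(max_depth=3).fit(X_train[idx], y_train[idx]))

    print(des_predict(np.array([0.5, 0.5]), pool, X_val, y_val))
```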

    Sampling-Based Query Re-Optimization

    Full text link
    Despite decades of work, query optimizers still make mistakes on "difficult" queries because of bad cardinality estimates, often due to the interaction of multiple predicates and correlations in the data. In this paper, we propose a low-cost post-processing step that can take a plan produced by the optimizer, detect when it is likely to have made such a mistake, and take steps to fix it. Specifically, our solution is a sampling-based iterative procedure that requires almost no changes to the original query optimizer or query evaluation mechanism of the system. We show that this indeed imposes low overhead and catches cases where three widely used optimizers (PostgreSQL and two commercial systems) make large errors.
    Comment: This is the extended version of a paper with the same title and authors that appears in the Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2016).
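    The core of such a procedure is the detection step: compare the optimizer's cardinality estimates against cardinalities measured on a small sample, and trigger re-optimization when they disagree too much. A minimal sketch of that check follows; the q-error metric is standard in this area, but the threshold of 4 and the function names are illustrative assumptions, not the paper's.

```python
# Detection step of a sampling-based re-optimization loop (illustrative).
def q_error(estimated: float, observed: float) -> float:
    """Symmetric ratio error commonly used to judge cardinality estimates."""
    estimated, observed = max(estimated, 1.0), max(observed, 1.0)
    return max(estimated, observed) / min(estimated, observed)


def needs_reoptimization(estimates, sampled, threshold=4.0) -> bool:
    """Flag the plan if any sub-plan's sampled cardinality deviates too much."""
    return any(q_error(e, s) > threshold for e, s in zip(estimates, sampled))


if __name__ == "__main__":
    # Optimizer estimated 1,000 rows for a join whose sample suggests ~80,000.
    print(needs_reoptimization([1_000, 50_000], [80_000, 52_000]))  # True
```

    In the full iterative loop, the sampled cardinalities would be fed back to the optimizer as corrected estimates and a new plan requested, repeating until the estimates are validated or an iteration limit is reached.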

    The Odyssey Approach for Optimizing Federated SPARQL Queries

    Full text link
    Answering queries over a federation of SPARQL endpoints requires combining data from more than one data source. Optimizing queries in such scenarios is particularly challenging not only because of (i) the large variety of possible query execution plans that correctly answer the query but also because (ii) there is only limited access to statistics about the schema and instance data of remote sources. To overcome these challenges, most federated query engines rely on heuristics to reduce the space of possible query execution plans or on dynamic programming strategies to produce optimal plans. Nevertheless, these plans may still exhibit a high number of intermediate results or high execution times because of heuristics and inaccurate cost estimations. In this paper, we present Odyssey, an approach that uses statistics enabling more accurate cost estimation for federated queries and therefore allowing Odyssey to produce better query execution plans. Our experimental results show that Odyssey produces query execution plans that are better in terms of data transfer and execution time than those of state-of-the-art optimizers. Our experiments using the FedBench benchmark show execution time gains of at least 25 times on average.
    Comment: 16 pages, 10 figures.
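    The sketch below illustrates, in a deliberately simplified way, why per-source statistics matter for federated planning: with cardinality estimates per triple pattern, an engine can order joins so that small intermediate results are produced first and less data is shipped between endpoints. This greedy ordering and the example statistics are assumptions for illustration, not Odyssey's algorithm.

```python
# Greedy join ordering driven by per-source cardinality statistics (toy sketch).
def greedy_join_order(patterns, estimated_cardinality):
    """Order triple patterns by repeatedly choosing the cheapest next join.

    `patterns` is a list of pattern identifiers; `estimated_cardinality`
    maps each pattern to the expected number of matching triples at its source.
    """
    remaining = set(patterns)
    order = []
    while remaining:
        nxt = min(remaining, key=lambda p: estimated_cardinality[p])
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    stats = {"?film dbo:director ?d": 120_000,
             "?d dbo:birthPlace :Denmark": 900,
             "?film rdfs:label ?name": 4_000_000}
    print(greedy_join_order(list(stats), stats))
```

    A real federated optimizer would additionally re-estimate the size of the partial join result after each step rather than rank patterns in isolation, which is exactly where more accurate statistics pay off.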

    Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources

    Full text link
    The growing deployment of sensors as part of the Internet of Things (IoT) is generating thousands of event streams. Complex Event Processing (CEP) queries offer a useful paradigm for rapid decision-making over such data sources. While often centralized in the Cloud, the availability of capable edge devices in the field motivates cooperative event analytics that span Edge and Cloud computing. Here, we identify a novel problem of query placement on Edge and Cloud resources for dynamically arriving and departing analytic dataflows. We define this as an optimization problem that minimizes the total makespan of all event analytics while meeting the energy and compute constraints of the resources. We propose four adaptive heuristics and three rebalancing strategies for such dynamic dataflows, and validate them using detailed simulations with 100-1,000 edge devices and VMs. The results show that our heuristics offer planning times on the order of seconds, give a valid, high-quality solution in all cases, and reduce the number of query migrations. Furthermore, applying the rebalancing strategies within these heuristics reduces the makespan by around 20-25%.
    Comment: 11 pages, 7 figures.
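    As a rough illustration of this kind of placement problem, the toy sketch below greedily assigns each arriving dataflow to the feasible resource (edge device or Cloud VM) that keeps the makespan smallest, subject to simple compute and energy caps. The resource model, capacities, and greedy rule are assumptions for illustration; this is not one of the paper's four heuristics or three rebalancing strategies.

```python
# Toy greedy placement of CEP dataflows across edge and Cloud resources.
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    cpu_capacity: float       # abstract compute units still available
    energy_budget: float      # abstract energy units still available
    load: float = 0.0         # accumulated processing time on this resource
    placed: list = field(default_factory=list)


def place(dataflow_id: str, cpu_need: float, energy_need: float,
          runtime: float, resources: list[Resource]) -> Resource | None:
    """Greedily pick the feasible resource that keeps the makespan smallest."""
    feasible = [r for r in resources
                if r.cpu_capacity >= cpu_need and r.energy_budget >= energy_need]
    if not feasible:
        return None                              # no valid placement exists
    best = min(feasible, key=lambda r: r.load + runtime)
    best.cpu_capacity -= cpu_need
    best.energy_budget -= energy_need
    best.load += runtime
    best.placed.append(dataflow_id)
    return best


if __name__ == "__main__":
    pool = [Resource("edge-1", cpu_capacity=4, energy_budget=10),
            Resource("cloud-vm-1", cpu_capacity=16, energy_budget=float("inf"))]
    for q in ["q1", "q2", "q3"]:
        chosen = place(q, cpu_need=2, energy_need=3, runtime=5.0, resources=pool)
        print(q, "->", chosen.name if chosen else "rejected")
```

    Rebalancing strategies would periodically revisit such placements as dataflows depart, migrating queries when doing so lowers the overall makespan.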