
    A Vision of a Decisional Model for Re-optimizing Query Execution Plans Based on Machine Learning Techniques

    Many existing cloud database query optimization algorithms target reducing the monetary cost paid to cloud service providers in addition to query response time. These algorithms rely on accurate cost estimation so that the optimal query execution plan (QEP) is selected. The cloud environment is dynamic: hardware configurations, data usage, and workload allocations change continuously. These dynamic changes make accurate query cost estimation difficult to obtain, and the query execution plan must be adjusted automatically to address them. To optimize the QEP with more accurate cost estimates, the query needs to be optimized multiple times during execution, each time using the most up-to-date estimates. However, deciding when to pause execution so that the overhead stays minimal is itself a difficult problem. In this paper, we present our vision of a method that uses machine learning techniques to predict the best timings for optimization during execution.
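
    As an illustration of the kind of decision the paper envisions, the sketch below trains a toy classifier that decides, at a pipeline break, whether pausing for re-optimization is likely to pay off. All names, features, and training rows are hypothetical and do not come from the paper; the sketch only makes the decision problem concrete.

```python
# Illustrative sketch only: the paper describes a vision, not a concrete API.
# Features, thresholds, and training data below are invented for the example.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical observations taken at pipeline breakers during execution:
# [cardinality error ratio, fraction of plan remaining, pause overhead (s)],
# label 1 = re-optimizing at that point paid off.
X = np.array([
    [8.0, 0.7, 0.5],   # large estimation error, plenty of work left
    [1.1, 0.6, 0.5],   # estimates roughly correct
    [5.0, 0.1, 0.5],   # large error, but the plan is almost finished
    [12.0, 0.9, 2.0],  # huge error early in the plan
])
y = np.array([1, 0, 0, 1])

decider = DecisionTreeClassifier(max_depth=3).fit(X, y)

def should_reoptimize(card_error_ratio, plan_remaining, pause_cost_s):
    """Predict whether pausing now for re-optimization is worthwhile."""
    return bool(decider.predict([[card_error_ratio, plan_remaining, pause_cost_s]])[0])

print(should_reoptimize(9.5, 0.8, 0.5))  # likely True: bad estimates, much work left
```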

    ReStore: Reusing Results of MapReduce Jobs

    Analyzing large-scale data has emerged as an important activity for many organizations in the past few years. This large-scale data analysis is facilitated by the MapReduce programming and execution model and its implementations, most notably Hadoop. Users of MapReduce often have analysis tasks that are too complex to express as individual MapReduce jobs. Instead, they use high-level query languages such as Pig, Hive, or Jaql to express their complex tasks. The compilers of these languages translate queries into workflows of MapReduce jobs. Each job in these workflows reads its input from the distributed file system used by the MapReduce system and produces output that is stored in this distributed file system and read as input by the next job in the workflow. The current practice is to delete these intermediate results from the distributed file system at the end of executing the workflow. One way to improve the performance of workflows of MapReduce jobs is to keep these intermediate results and reuse them for future workflows submitted to the system. In this paper, we present ReStore, a system that manages the storage and reuse of such intermediate results. ReStore can reuse the output of whole MapReduce jobs that are part of a workflow, and it can also create additional reuse opportunities by materializing and storing the output of query execution operators that are executed within a MapReduce job. We have implemented ReStore as an extension to the Pig dataflow system on top of Hadoop, and we experimentally demonstrate significant speedups on queries from the PigMix benchmark.
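
    The core reuse idea lends itself to a small sketch: key materialized (sub-)job outputs by a canonical signature of their input and operator chain, and rewrite an incoming job to start from the longest stored prefix. This is an illustrative simplification, not ReStore's actual implementation or matching logic; all paths and operator names are made up.

```python
# Minimal sketch of result reuse for MapReduce workflows (not ReStore's code).
from dataclasses import dataclass

@dataclass(frozen=True)
class JobSignature:
    input_path: str   # DFS input of the (sub-)job
    operators: tuple  # canonicalized operator chain, e.g. ("filter:a>10", "group:b")

repository = {}       # signature -> path of the materialized output in the DFS

def store(sig: JobSignature, output_path: str):
    """Register a materialized intermediate result for later reuse."""
    repository[sig] = output_path

def rewrite(sig: JobSignature):
    """Return (start_input, remaining_operators), reusing the longest stored prefix."""
    for k in range(len(sig.operators), 0, -1):
        prefix = JobSignature(sig.input_path, sig.operators[:k])
        if prefix in repository:
            return repository[prefix], sig.operators[k:]
    return sig.input_path, sig.operators  # nothing reusable: run the full job

# Workflow 1 materializes an intermediate result...
store(JobSignature("/logs/2012", ("filter:status=200", "group:url")), "/reuse/out_17")
# ...so workflow 2 can skip the shared prefix and only run the final operator.
print(rewrite(JobSignature("/logs/2012", ("filter:status=200", "group:url", "topk:10"))))
```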

    Robust Query Optimization Methods With Respect to Estimation Errors: A Survey

    The quality of a query execution plan chosen by a Cost-Based Optimizer (CBO) depends greatly on the estimation accuracy of input parameter values. Many research results have been produced on improving estimation accuracy, but none of them works in every situation. Therefore, "robust query optimization" was introduced, in an effort to minimize the risk of sub-optimality by accepting the fact that estimates can be inaccurate. In this survey, we aim to provide an overview of robust query optimization methods by classifying them into different categories, explaining their essential ideas, listing their advantages and limitations, and comparing them against multiple criteria.
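
    One idea that recurs in such methods can be made concrete with a toy example: instead of costing plans at a single point estimate, evaluate each candidate plan over an interval of plausible selectivities and pick the one with the best worst case. The plan cost functions below are invented for illustration and are not taken from the survey.

```python
# Toy illustration of worst-case (risk-averse) plan selection under uncertain
# selectivity estimates. Cost models are invented for the example.
def cost_index_scan(sel, rows=1_000_000):
    return 5_000 + sel * rows * 4   # cheap when selectivity is low, grows quickly

def cost_full_scan(sel, rows=1_000_000):
    return rows * 1.0               # flat cost, insensitive to the estimate

plans = {"index_scan": cost_index_scan, "full_scan": cost_full_scan}
selectivity_interval = [0.001, 0.01, 0.1, 0.5]  # uncertainty around the point estimate

# A classic CBO trusting the point estimate (0.001) would pick index_scan.
# The robust choice minimizes the worst-case cost over the whole interval.
robust = min(plans, key=lambda p: max(plans[p](s) for s in selectivity_interval))
print(robust)  # full_scan: slower at the estimate, but far less risky
```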

    Stethoscope: A platform for interactive visual analysis of query execution plans

    Searching for the performance bottleneck in an execution trace is an error-prone and time-consuming activity. Existing tools offer some comfort by providing a visual representation of the trace for analysis. In this paper we present Stethoscope, an interactive visual tool to inspect and analyze columnar database query performance, both online and offline. Its unique interactive animated interface capitalizes on the large dataflow graph representation of a query execution plan, augmented with query execution trace information. We demonstrate the features of Stethoscope for both online and offline analysis of long-running queries. It helps in understanding where time goes, how optimizers perform, and how parallel processing on multi-core systems is exploited.
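
    The underlying principle, joining a plan's dataflow graph with execution-trace timings to surface the bottleneck operator, can be sketched in a few lines. The operators and timings below are made up; Stethoscope itself presents this information as an interactive animated graph rather than a sorted listing.

```python
# Sketch of annotating a query plan's dataflow graph with trace timings
# (illustrative only, not Stethoscope's implementation).
plan_edges = [("scan_lineitem", "filter"), ("filter", "join"),
              ("scan_orders", "join"), ("join", "aggregate")]

trace = {  # operator -> elapsed time in ms, as it would arrive from the trace stream
    "scan_lineitem": 1200, "scan_orders": 300,
    "filter": 150, "join": 4800, "aggregate": 90,
}

total = sum(trace.values())
for op, ms in sorted(trace.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{op:15s} {ms:6d} ms  {100 * ms / total:5.1f}%")
# Even this flat listing shows where the time goes (the join dominates);
# the tool animates the same data on the dataflow graph defined by plan_edges.
```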

    SHRuB: searching through heuristics for the better query-execution plan

    An important aspect to be considered for systems aiming at integrating similarity queries into an RDBMS is how to represent and optimize query plans that involve traditional and complex predicates. Toward developing facilities for such integration, we developed a technique to extract a canonical query-plan command tree from a similarity-extended SQL expression. The SHRuB tool, presented in this paper, is able to interactively represent a query parse tree. We developed a catalog model that allows estimating the execution cost and provides hints for optimizing the query plan by adopting a three-stage heuristic. Through a case study and initial experiments, we have demonstrated that the tool is able to find a local-minimum query-execution plan. Moreover, SHRuB can be plugged into existing frameworks that support similarity queries or employed as a courseware aid for database teaching.
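
    To make the flavor of heuristic plan search concrete, the sketch below applies a standard rank-ordering rule to interleave traditional and similarity predicates by cost and selectivity. It is not SHRuB's three-stage heuristic or catalog model; all predicate names and numbers are invented.

```python
# Toy greedy predicate ordering in the spirit of heuristic plan search
# (not SHRuB's actual heuristic). Rank = (selectivity - 1) / cost_per_tuple,
# the classic rule that favors cheap, selective predicates first.
predicates = [
    # (name, selectivity, cost per tuple) -- numbers invented for the example
    ("price < 100",          0.20, 0.01),  # cheap traditional predicate
    ("category = 'shoes'",   0.05, 0.01),
    ("img NEAR query_image", 0.10, 5.00),  # expensive similarity predicate
]

def rank(pred):
    _, selectivity, cost = pred
    return (selectivity - 1) / cost        # more negative = apply earlier

ordered = sorted(predicates, key=rank)
for name, sel, cost in ordered:
    print(f"{name:22s} sel={sel:.2f} cost/tuple={cost:.2f}")
# The selective, cheap predicates run first, pushing the costly similarity
# predicate to the end of the plan, where it sees the fewest tuples.
```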