    Implementing the PostgreSQL query optimizer within the OPT++ framework

    As a promising object-oriented reuse technology, frameworks have attracted considerable attention. However, much less work has been done on framework-based development than on framework development itself. This thesis describes a framework-based implementation of an optimizer for a relational database, built on top of a query optimization framework, OPT++. To borrow expertise from a mature optimizer, we studied the PostgreSQL optimizer as a model: we summarize the operators and algorithms it supports, extract the transformation rules it applies, and analyze its search strategy. Our application supports the typical operators of relational algebra, applies the main transformation rules extracted from the PostgreSQL optimizer, and implements a PostgreSQL-like search strategy, incorporating constrained dynamic programming and a genetic algorithm to optimize joins. Additionally, we modified the framework to accommodate sub-queries and explicit joins. We describe our implementation by following the framework's recipes, present problems and implementation considerations in detail, and discuss more general issues in framework-based development.
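
    As an illustration of the search strategy described above, here is a minimal, hypothetical sketch of bottom-up dynamic-programming join enumeration of the kind PostgreSQL uses for small join sets (it switches to its genetic algorithm, GEQO, beyond the geqo_threshold setting). The cost and cardinality callables are stand-ins, not code from the thesis:

        from itertools import combinations

        def dp_join_order(relations, scan_cost, join_cost):
            """relations: relation names; scan_cost/join_cost: assumed cost callables."""
            best = {}  # frozenset of relations -> (cost, plan tree)
            for r in relations:
                best[frozenset([r])] = (scan_cost(r), r)
            for size in range(2, len(relations) + 1):
                for subset in combinations(relations, size):
                    s = frozenset(subset)
                    # Try every split into two non-empty halves and keep the
                    # cheapest way to join the two best sub-plans.
                    for k in range(1, size):
                        for left in combinations(subset, k):
                            l, rgt = frozenset(left), s - frozenset(left)
                            cost = best[l][0] + best[rgt][0] + join_cost(l, rgt)
                            if s not in best or cost < best[s][0]:
                                best[s] = (cost, (best[l][1], best[rgt][1]))
            return best[frozenset(relations)]  # (total cost, nested plan tuple)

    Exhaustive enumeration like this grows exponentially with the number of relations, which is why a constrained or genetic search becomes necessary for large join sets.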

    Analyzing Query Optimizer Performance in the Presence and Absence of Cardinality Estimates

    Most query optimizers rely on cardinality estimates to determine optimal execution plans. While traditional databases such as PostgreSQL, Oracle, and Db2 utilize many types of synopses -- including histograms, samples, and sketches -- recent main-memory databases like DuckDB and Heavy.AI often operate with minimal or no estimates, yet their performance does not necessarily suffer. To the best of our knowledge, no analytical comparison has been conducted between optimizers with and without cardinality estimates to understand their performance characteristics in different settings, such as indexed, non-indexed, and multi-threaded. In this paper, we present a comparative analysis between optimizers that use cardinality estimates and those that do not. We use the Join Order Benchmark (JOB) for our evaluation and true cardinalities as the baseline. Our investigation reveals that cardinality estimates have a marginal impact in non-indexed settings. When indexes are available, however, inaccurate estimates may lead to sub-optimal physical operators -- even with an optimal join order. Furthermore, the impact of cardinality estimates is less significant in highly parallel main-memory databases.
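
    To make the indexed-setting finding concrete, the following toy sketch shows how a cardinality estimate can flip the physical-operator choice even with a fixed join order. The parameter names echo PostgreSQL's seq_page_cost/random_page_cost settings, but the cost model itself is invented for illustration:

        def choose_access_path(est_rows, total_rows,
                               seq_page_cost=1.0, random_page_cost=4.0,
                               rows_per_page=100):
            # Toy cost model: a sequential scan reads every page once; an
            # index scan pays roughly one random page fetch per matching row.
            seq_scan = seq_page_cost * (total_rows / rows_per_page)
            index_scan = random_page_cost * est_rows
            return "index scan" if index_scan < seq_scan else "sequential scan"

        # An underestimate (500 vs. a true 50,000 matching rows) picks the
        # index scan, which is sub-optimal at the true cardinality.
        print(choose_access_path(est_rows=500, total_rows=1_000_000))     # index scan
        print(choose_access_path(est_rows=50_000, total_rows=1_000_000))  # sequential scan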

    Adaptive Execution of Compiled Queries

    Compiling queries to machine code is arguably the most efficient way to execute queries. One often overlooked problem with compilation, however, is the time it takes to generate machine code. Even with fast compilation frameworks like LLVM, generating machine code for complex queries routinely takes hundreds of milliseconds. Such compilation times can be a major disadvantage for workloads that execute many complex but quick queries. To solve this problem, we propose an adaptive execution framework, which dynamically and transparently switches from interpretation to compilation. We also propose a fast bytecode interpreter for LLVM, which can execute queries without costly translation to machine code and thereby dramatically reduces query latency. Adaptive execution is dynamic and fine-grained, and can execute different code paths of the same query using different execution modes. Our extensive evaluation shows that this approach achieves optimal performance in a wide variety of settings---low latency for small data sets and maximum throughput for large data sizes.
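
    A rough sketch of the control logic behind such an adaptive switch: interpretation starts immediately, compilation proceeds in the background, and the remaining work moves to the compiled path once it is ready. All interfaces here are assumptions, not the paper's actual API:

        import threading

        def run_pipeline(rows, interpret_step, compile_pipeline):
            """rows: list of input tuples; the two callables are assumed hooks."""
            compiled = {}
            ready = threading.Event()

            def background_compile():
                compiled["fn"] = compile_pipeline()  # expensive, e.g. LLVM codegen
                ready.set()

            threading.Thread(target=background_compile, daemon=True).start()

            for i, row in enumerate(rows):
                if ready.is_set():
                    compiled["fn"](rows[i:])  # fine-grained mid-query switch
                    return
                interpret_step(row)  # slow but available immediately

    Short queries finish entirely in the interpreter and never pay the compilation latency; long ones amortize it and get compiled-code throughput for the bulk of their input.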

    Benchmarking Bottom-Up and Top-Down Strategies to SPARQL-to-SQL Query Translation

    Many researchers have proposed using conventional relational databases to store and query large Semantic Web datasets. The most complex component of this approach is SPARQL-to-SQL query translation. Existing algorithms perform this translation using either a bottom-up or a top-down strategy and result in semantically equivalent but syntactically different relational queries. Do relational query optimizers always produce identical query execution plans for semantically equivalent bottom-up and top-down queries? Which of the two strategies yields faster SQL queries? To address these questions, this work studies bottom-up and top-down translations of SPARQL queries with nested optional graph patterns. This work presents: (1) a basic graph pattern translation algorithm that yields flat SQL queries, (2) a bottom-up nested optional graph pattern translation algorithm, (3) a top-down nested optional graph pattern translation algorithm, and (4) a performance study featuring SPARQL queries with nested optional graph patterns over RDF databases created in Oracle, DB2, and PostgreSQL.
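
    For flavor, a minimal sketch of basic graph pattern translation into a flat SQL query over a single triples(s, p, o) table: each triple pattern becomes a self-join, and shared variables become equi-join predicates. The schema and the shape of the generated SQL are illustrative assumptions, not the paper's exact algorithm:

        def bgp_to_sql(patterns):
            """patterns: list of (s, p, o) triples; '?'-prefixed terms are variables."""
            froms, where, seen = [], [], {}  # seen: variable -> first binding column
            for i, triple in enumerate(patterns):
                froms.append(f"triples t{i}")
                for col, term in zip(("s", "p", "o"), triple):
                    ref = f"t{i}.{col}"
                    if term.startswith("?"):
                        if term in seen:
                            where.append(f"{ref} = {seen[term]}")  # shared variable -> join
                        else:
                            seen[term] = ref
                    else:
                        where.append(f"{ref} = '{term}'")  # constant -> filter
            select = ", ".join(f"{ref} AS {v[1:]}" for v, ref in seen.items())
            return (f"SELECT {select} FROM {', '.join(froms)} "
                    f"WHERE {' AND '.join(where)}")

        # Two joined patterns: ?person knows ?friend; ?friend has a name.
        print(bgp_to_sql([("?person", "knows", "?friend"),
                          ("?friend", "name", "?n")]))

    Nested OPTIONAL patterns are where the bottom-up and top-down strategies diverge: they introduce left outer joins whose nesting can be assembled from the leaves upward or from the root downward, producing syntactically different but equivalent SQL.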

    Should EU Land Use and Land Cover Data be managed with a NoSQL Document Store?

    Land cover (LC) is a scientific landscape classification based on the physical properties of earth materials. This information is usually retrieved through remote sensing techniques (e.g. forest cover, urban areas, clay content, among others). In contrast, land use (LU) is defined from an anthropocentric point of view: it describes how a specific area is used (e.g. whether a territory supports an intensive or extensive use, or is unused). Both geospatial layers are essential inputs in many socio-economic and environmental studies. The INSPIRE directive provides technical data specifications for the harmonization and sharing of voluminous LU/LC datasets across all countries of the EU. The INSPIRE initiative proposes Object-Oriented Modelling as a data modelling methodology. However, the most widely used Geographic Information Systems (GIS) are built upon relational databases. This may jeopardize LU/LC data usability, since GIS practitioners will eventually face the object-relational impedance mismatch. In this paper, the authors introduce the SIOSE database (Spanish Land Cover and Land Use Information System), which was the first implementation of an object-oriented land cover and land use data model in line with the recommendation of the INSPIRE directive, separating both themes. SIOSE data can be downloaded as relational database files, where the information describing each single LU/LC object is divided among several related tables, so database queries can be complex and time-consuming. The authors show these technical complexities through a computational experience comparing SQL and NoSQL databases for querying spatial data downloaded from SIOSE. Finally, the authors conclude that NoSQL geodatabases deserve to be further explored because they could scale for LU/LC data, both horizontally and vertically, better than relational geodatabases, improving usability and making the most of the EU harmonization efforts.
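
    The impedance mismatch the authors describe can be pictured as follows: reassembling one LU/LC object from the relational files takes multi-table joins, while a document store returns it as a single nested document. All table, collection, and field names below are hypothetical, not the actual SIOSE schema:

        # Relational form: one LU/LC object reassembled from related tables.
        RELATIONAL_QUERY = """
        SELECT p.id, c.cover_class, c.percentage, a.attribute, a.value
        FROM polygons p
        JOIN cover_components c ON c.polygon_id = p.id
        LEFT JOIN component_attributes a ON a.component_id = c.id
        WHERE p.id = %s;
        """

        # Document form: the same object stored as one nested document and
        # retrieved in a single lookup (e.g. with pymongo).
        def fetch_polygon(db, polygon_id):
            return db.polygons.find_one({"_id": polygon_id})
            # -> {"_id": ..., "components": [{"cover_class": "forest",
            #       "percentage": 60, "attributes": [...]}, ...]}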

    Optimizing complex queries with multiple relational instances

    Ph.D. (Doctor of Philosophy)

    Smooth Scan: Robust Query Execution with a Statistics-oblivious Access Operator

    Query optimizers depend heavily on statistics representing column distributions to create efficient query plans. In many cases, though, statistics are outdated or non-existent, and the process of refreshing statistics is very expensive, especially for ad-hoc workloads on ever-bigger data. This results in suboptimal plans that severely hurt performance. The main problem is that any decision, once made by the optimizer, is fixed throughout the execution of a query. In particular, each logical operator translates into a fixed choice of a physical operator at run-time. In this paper we advocate for continuous adaptation and morphing of physical operators throughout their lifetime, by adjusting their behavior in accordance with the statistical properties of the data. We demonstrate the benefits of the new paradigm by designing and implementing an adaptive access path operator called Smooth Scan, which morphs continuously within the space of traditional index access and full table scan. Smooth Scan behaves similarly to an index scan for low selectivity; if selectivity increases, however, Smooth Scan progressively morphs its behavior toward a sequential scan. As a result, a system with Smooth Scan requires no access path decisions up front, nor does it need accurate statistics to provide good performance. We implement Smooth Scan in PostgreSQL and, using both synthetic benchmarks and TPC-H, we show that it achieves robust performance while at the same time being statistics-oblivious.
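
    A schematic sketch of the morphing policy: behave like an index scan while observed selectivity is low, amortize random I/O by filtering entire pages, and flatten into a sequential scan once enough pages have been touched. The threshold and the index/heap interfaces are assumptions for illustration, not the operator's actual design:

        def smooth_scan(index, heap, predicate, flatten_threshold=0.2):
            touched = set()
            for page_no, _slot in index.rids(predicate):  # assumed RID iterator
                if page_no in touched:
                    continue
                touched.add(page_no)
                # Entire-page mode: amortize the random I/O by filtering the
                # whole page, so no page ever needs to be revisited.
                for row in heap.read_page(page_no):
                    if predicate(row):
                        yield row
                # Flattening trigger: past this fraction of touched pages,
                # random probes cost more than reading the rest in order.
                if len(touched) / heap.num_pages() > flatten_threshold:
                    for other in range(heap.num_pages()):
                        if other not in touched:
                            for row in heap.read_page(other):  # sequential tail
                                if predicate(row):
                                    yield row
                    return

    Because every decision depends only on what the scan has already observed, the operator needs no up-front statistics, matching the statistics-oblivious behavior described above.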