28 research outputs found
Dynamic programming strikes back
Two highly efficient algorithms are known for optimally ordering joins while avoiding cross products: DPccp, which is based on dynamic programming, and Top-Down Partition Search, which is based on memoization. Both have two severe limitations: they handle only (1) simple (binary) join predicates and (2) inner joins. However, real queries may contain complex join predicates involving more than two relations, as well as outer joins and other non-inner joins. Taking the most efficient known join-ordering algorithm, DPccp, as a starting point, we first develop a new algorithm, DPhyp, which is capable of handling complex join predicates efficiently. We do so by modeling the query graph as a (variant of a) hypergraph and then reasoning about its connected subgraphs. Then, we present a technique that exploits this capability to efficiently handle the widest class of non-inner joins dealt with so far. Our experimental results show that this reformulation of non-inner joins as complex predicates can improve optimization time by orders of magnitude compared to known algorithms dealing with complex join predicates and non-inner joins. Once again, this gives dynamic programming a distinct advantage over current memoization techniques.
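The following Python sketch illustrates the dynamic-programming idea behind cross-product-free join ordering. It is deliberately naive. It enumerates every subset of relations and every split of that subset, and it keeps only splits whose two halves are each connected in the query graph. For a connected set, two such halves are always linked by at least one join edge.

DPccp instead enumerates exactly these connected-subgraph/complement pairs directly, without testing all subsets, and DPhyp generalizes that approach to hypergraphs. Neither algorithm is reproduced here. The following are all illustrative assumptions:

* the cost function (C_out, the sum of intermediate result sizes);
* the independence-based cardinality formula;
* the requirement that the query graph be connected;
* the names `optimal_join_order` and `is_connected`.

```python
from itertools import combinations
from math import prod


def is_connected(rels, edges):
    """True if the relations in `rels` induce a connected subgraph of the query graph."""
    rels = set(rels)
    start = next(iter(rels))
    seen, stack = {start}, [start]
    while stack:
        r = stack.pop()
        for a, b in edges:
            # Follow a join edge in either direction if it stays inside `rels`.
            nxt = b if a == r else a if b == r else None
            if nxt in rels and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == rels


def optimal_join_order(cards, edges):
    """cards: {relation: cardinality}; edges: {(r1, r2): selectivity}.

    Returns (cost, plan) for the cheapest cross-product-free bushy plan under C_out.
    Assumes the query graph is connected (otherwise no such plan exists).
    """
    rels = sorted(cards)

    def card(s):
        # Independence assumption: base sizes times the selectivities of edges inside s.
        inner = [sel for (a, b), sel in edges.items() if a in s and b in s]
        return prod(cards[r] for r in s) * prod(inner)

    best = {frozenset([r]): (0.0, r) for r in rels}  # connected set -> (cost, plan)
    for size in range(2, len(rels) + 1):
        for combo in combinations(rels, size):
            s = frozenset(combo)
            if not is_connected(s, edges):
                continue  # joining a disconnected set would require a cross product
            out = card(s)
            for k in range(1, size):
                for left in combinations(combo, k):
                    s1 = frozenset(left)
                    s2 = s - s1
                    # `best` holds only connected sets, so both halves must be connected;
                    # since s is connected, some join edge then links s1 and s2.
                    if s1 not in best or s2 not in best:
                        continue
                    cost = best[s1][0] + best[s2][0] + out
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, (best[s1][1], best[s2][1]))
    return best[frozenset(rels)]


cards = {"R": 10, "S": 1000, "T": 100}
edges = {("R", "S"): 0.01, ("S", "T"): 0.0001}
print(optimal_join_order(cards, edges))
```

Consider the chain query R-S-T, where the S-T join is far more selective. The sketch joins S and T first and never considers the R x T cross product.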
TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark
The TPC-D benchmark was developed almost 20 years ago, and even though its current existence as TPC-H could be considered superseded by TPC-DS, one can still learn from it. We focus on the technical level, summarizing the challenges posed by the TPC-H workload as we now understand them, which w
Cypher: An Evolving Query Language for Property Graphs
The Cypher property graph query language is an evolving language, originally designed and implemented as part of the Neo4j graph database, and it is currently used by several commercial database products and researchers. We describe Cypher 9, which is the first version of the language governed by the openCypher Implementers Group. We first introduce the language by example and describe its uses in industry. We then provide a formal semantic definition of the core read-query features of Cypher, including its variant of the property graph data model and its "ASCII Art" graph pattern matching mechanism for expressing subgraphs of interest to an application. We compare the features of Cypher to other property graph query languages and describe extensions, at an advanced stage of development, which will form part of Cypher 10, turning the language into a compositional language that supports graph projections and multiple named graphs.
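Consider the query `MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name`. As a rough illustration of what its pattern `(a:Person)-[:KNOWS]->(b:Person)` denotes, the Python sketch below evaluates that single-relationship pattern over a tiny in-memory property graph. It returns one binding row per match.

The graph encoding and the helper `match_edge` are hypothetical. This is not Cypher's formal semantics, which also covers variable-length paths, optional matches, and uniqueness of relationships within a pattern.

```python
# Toy property graph: nodes carry a label set and properties; relationships are typed and directed.
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Ann"}},
    2: {"labels": {"Person"}, "props": {"name": "Bob"}},
    3: {"labels": {"City"}, "props": {"name": "Oslo"}},
}
relationships = [
    (1, "KNOWS", 2),  # (source node id, relationship type, target node id)
    (2, "KNOWS", 1),
    (1, "LIVES_IN", 3),
]


def match_edge(nodes, relationships, src_label, rel_type, dst_label):
    """Return one binding {"a": ..., "b": ...} per match of (a:src)-[:type]->(b:dst)."""
    return [
        {"a": src, "b": dst}
        for src, typ, dst in relationships
        if typ == rel_type
        and src_label in nodes[src]["labels"]
        and dst_label in nodes[dst]["labels"]
    ]


# Roughly: MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name
rows = [
    (nodes[m["a"]]["props"]["name"], nodes[m["b"]]["props"]["name"])
    for m in match_edge(nodes, relationships, "Person", "KNOWS", "Person")
]
print(rows)
```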
How Good Are Query Optimizers, Really?
Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisi
On the Efficiency of the Minimizing Approach to Query Optimization
A standard problem in DBMS usage is a lack of efficiency and the high cost of accessing stored data. An acceptable level of system performance can be achieved by query optimization techniques that determine the most efficient way to execute a given query by modifying it and considering possible query execution plans. The goal of this paper is to prove the efficiency of query minimization algorithms that minimize the query restriction by eliminating redundant conditions. The paper presents minimization algorithms based on mathematical transformations that detect and remove redundant conditions from the query restriction to simplify it. These include algorithms based on "condition absorption", prime implicants, and techniques for minimizing sets of linear inequalities. The paper also includes a theoretical justification of the efficiency of the minimization approach to query optimization based on restriction simplification. We also examine experimental results from implementing these optimization techniques and their influence on query processing speed. Finally, we present an overview of the impact of query minimization on the whole optimization process.
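"Condition absorption" refers to the absorption law, A OR (A AND B) = A. For a restriction written in disjunctive normal form, this means that any conjunction that strictly contains another conjunction is redundant and can be dropped. The Python sketch below applies this rule.

This is an assumption-laden sketch, not the paper's algorithm. Atomic conditions are treated as opaque strings, so the sketch finds only syntactic absorption. It would not notice, for example, that `age > 40` implies `age > 30`; that kind of reasoning belongs to the linear-inequality techniques the abstract mentions. The function name `absorb` is hypothetical.

```python
def absorb(dnf):
    """Drop disjuncts absorbed by a strictly smaller one: A OR (A AND B) == A.

    dnf: iterable of conjunctions, each an iterable of atomic condition strings.
    Atoms are compared syntactically; implications between atoms are not detected.
    """
    terms = {frozenset(t) for t in dnf}  # a set also removes duplicate disjuncts
    kept = [t for t in terms if not any(other < t for other in terms)]
    # Sort for a deterministic result: shorter conjunctions first, then by their atoms.
    return sorted(kept, key=lambda t: (len(t), sorted(t)))


restriction = [
    {"age > 30"},
    {"age > 30", "city = 'Paris'"},     # absorbed by {"age > 30"}
    {"salary < 5000", "dept = 'R&D'"},
    {"dept = 'R&D'", "salary < 5000"},  # duplicate disjunct
]
for term in absorb(restriction):
    print(" AND ".join(sorted(term)))
```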
Analyzing Query Optimizer Performance in the Presence and Absence of Cardinality Estimates
Most query optimizers rely on cardinality estimates to determine optimal execution plans. Traditional databases such as PostgreSQL, Oracle, and Db2 utilize many types of synopses (including histograms, samples, and sketches). Recent main-memory databases like DuckDB and Heavy.AI, by contrast, often operate with minimal or no estimates, yet their performance does not necessarily suffer. To the best of our knowledge, no analytical comparison has been conducted between optimizers with and without cardinality estimates to understand their performance characteristics in different settings, such as indexed, non-indexed, and multi-threaded. In this paper, we present a comparative analysis between optimizers that use cardinality estimates and those that do not. We use the Join Order Benchmark (JOB) for our evaluation and true cardinalities as the baseline. Our investigation reveals that cardinality estimates have marginal impact in non-indexed settings. Meanwhile, when indexes are available, inaccurate estimates may lead to sub-optimal physical operators, even with an optimal join order. Furthermore, the impact of cardinality estimates is less significant in highly parallel main-memory databases.
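A toy cost model in Python can illustrate the point about physical operators. Even with the join order fixed, the optimizer still has to choose between an index nested-loop join and a hash join. That choice depends on the estimated size of the outer input. All cost constants and function names below are hypothetical; they are not taken from the paper or from any particular system.

```python
# Hypothetical per-tuple cost constants; real systems use calibrated, engine-specific values.
INDEX_PROBE_COST = 4.0  # assumed cost of one index lookup into the inner relation
HASH_BUILD_COST = 1.0   # assumed cost of inserting one inner tuple into the hash table
HASH_PROBE_COST = 1.0   # assumed cost of probing the hash table with one outer tuple


def index_nested_loop_cost(outer_rows, inner_rows):
    # One index probe per outer tuple; the inner relation is never scanned in full.
    return outer_rows * INDEX_PROBE_COST


def hash_join_cost(outer_rows, inner_rows):
    # Build a hash table on the inner relation, then probe it once per outer tuple.
    return inner_rows * HASH_BUILD_COST + outer_rows * HASH_PROBE_COST


OPERATORS = {
    "index_nested_loop": index_nested_loop_cost,
    "hash_join": hash_join_cost,
}


def choose_operator(outer_rows, inner_rows):
    """Return the operator with the lowest modeled cost for the given input sizes."""
    return min(OPERATORS, key=lambda op: OPERATORS[op](outer_rows, inner_rows))


inner_rows = 100_000
estimated_outer, true_outer = 100, 50_000

chosen = choose_operator(estimated_outer, inner_rows)  # picked using the estimate
ideal = choose_operator(true_outer, inner_rows)        # what true cardinalities would pick
actual_cost = OPERATORS[chosen](true_outer, inner_rows)
ideal_cost = OPERATORS[ideal](true_outer, inner_rows)
print(chosen, ideal, actual_cost, ideal_cost)
```

With an underestimate of 100 outer rows, the model picks the index nested-loop join. At the true 50,000 rows, that join costs 200,000 units versus 150,000 for the hash join.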
Dynamic Programming: The Next Step
Since 2013, dynamic programming (DP)-based plan generators have been capable of correctly reordering not only inner joins but also outer joins. Now, we consider the next big step: reordering not only joins, but joins and grouping together. Since only reorderings of grouping with inner joins were previously known, we first develop equivalences that allow grouping to be reordered with outer joins. Then, we show how to extend a state-of-the-art DP-based plan generator to fully explore these new plan alternatives.
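The abstract describes reordering grouping with inner joins as already known. The classic transformation for that case is eager aggregation: a `SUM` grouped above a join can be computed by pre-aggregating one input on its join key below the join. The Python sketch below checks this on toy data. The relations, column layout, and function names are hypothetical.

The outer-join equivalences that are this paper's contribution are not reproduced here. They need extra care because an outer join can produce NULL-padded rows for unmatched tuples.

```python
from collections import defaultdict

# Hypothetical toy relations.
customers = [(1, "EU"), (2, "EU"), (3, "US")]        # (customer_id, region)
orders = [(1, 10), (1, 5), (2, 7), (3, 1), (4, 99)]  # (customer_id, amount)


def group_after_join(customers, orders):
    """SUM(amount) GROUP BY region, evaluated above an inner join on customer_id."""
    totals = defaultdict(int)
    for cid, region in customers:
        for ocid, amount in orders:
            if cid == ocid:  # inner join: order 4 has no customer and is dropped
                totals[region] += amount
    return dict(totals)


def group_before_join(customers, orders):
    """Eager aggregation: GROUP BY customer_id below the join, then re-aggregate."""
    partial = defaultdict(int)
    for cid, amount in orders:
        partial[cid] += amount  # one pre-aggregated row per customer_id
    totals = defaultdict(int)
    for cid, region in customers:
        if cid in partial:      # inner join with the smaller pre-aggregated input
            totals[region] += partial[cid]
    return dict(totals)


print(group_after_join(customers, orders))
print(group_before_join(customers, orders))
```

Pre-aggregating shrinks the join input to one row per `customer_id`. This is the kind of plan alternative a DP-based plan generator would weigh against grouping after the join.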