Database query optimisation based on measures of regret
- Publication date
- Publisher
Abstract
The query optimiser in a database management system (DBMS) is responsible for
�nding a good order in which to execute the operators in a given query. However, in
practice the query optimiser does not usually guarantee to �nd the best plan. This is
often due to the non-availability of precise statistical data or inaccurate assumptions
made by the optimiser. In this thesis we propose a robust approach to logical query
optimisation that takes into account the unreliability in database statistics during
the optimisation process. In particular, we study the ordering problem for selection
operators and for join operators, where selectivities are modelled as intervals rather
than exact values. As a measure of optimality, we use a concept from decision theory
called minmax regret optimisation (MRO).
When using interval selectivities, the decision problem for selection operator ordering
turns out to be NP-hard. After investigating properties of the problem and
identifying special cases which can be solved in polynomial time, we develop a novel
heuristic for solving the general selection ordering problem in polynomial time. Experimental
evaluation of the heuristic using synthetic data, the Star Schema Benchmark
and real-world data sets shows that it outperforms other heuristics (which take
an optimistic, pessimistic or midpoint approach) and also produces plans whose regret
is on average very close to optimal.
The general join ordering problem is known to be NP-hard, even for exact selectivities.
So, for interval selectivities, we restrict our investigation to sets of join
operators which form a chain and to plans that correspond to left-deep join trees.
We investigate properties of the problem and use these, along with ideas from the
selection ordering heuristic and other algorithms in the literature, to develop a
polynomial-time heuristic tailored for the join ordering problem. Experimental evaluation
of the heuristic shows that, once again, it performs better than the optimistic,
pessimistic and midpoint heuristics. In addition, the results show that the heuristic
produces plans whose regret is on average even closer to the optimal than for
selection ordering