Database query optimisation based on measures of regret
The query optimiser in a database management system (DBMS) is responsible for
finding a good order in which to execute the operators in a given query. However, in
practice the query optimiser does not usually guarantee to find the best plan. This is
often due to the non-availability of precise statistical data or inaccurate assumptions
made by the optimiser. In this thesis we propose a robust approach to logical query
optimisation that takes into account the unreliability in database statistics during
the optimisation process. In particular, we study the ordering problem for selection
operators and for join operators, where selectivities are modelled as intervals rather
than exact values. As a measure of optimality, we use a concept from decision theory
called minmax regret optimisation (MRO).
When using interval selectivities, the decision problem for selection operator ordering
turns out to be NP-hard. After investigating properties of the problem and
identifying special cases which can be solved in polynomial time, we develop a novel
heuristic for solving the general selection ordering problem in polynomial time. Experimental
evaluation of the heuristic using synthetic data, the Star Schema Benchmark
and real-world data sets shows that it outperforms other heuristics (which take
an optimistic, pessimistic or midpoint approach) and also produces plans whose regret
is on average very close to optimal.
The general join ordering problem is known to be NP-hard, even for exact selectivities.
So, for interval selectivities, we restrict our investigation to sets of join
operators which form a chain and to plans that correspond to left-deep join trees.
We investigate properties of the problem and use these, along with ideas from the
selection ordering heuristic and other algorithms in the literature, to develop a
polynomial-time heuristic tailored for the join ordering problem. Experimental evaluation
of the heuristic shows that, once again, it performs better than the optimistic,
pessimistic and midpoint heuristics. In addition, the results show that the heuristic
produces plans whose regret is on average even closer to the optimal than for
selection ordering.
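The selection-ordering idea in the abstract above can be illustrated with a small brute-force sketch. It rests on assumptions the abstract does not state: each operator has a per-tuple cost, predicates are independent, and the cost of an order is the sum of each operator's cost weighted by the fraction of tuples that survive the earlier operators. Under that assumed cost model, regret is multilinear in the selectivities, so the worst case lies at a combination of interval endpoints and only those need to be enumerated. The function names are hypothetical, and this exhaustive search is exponential; it is not the thesis's polynomial-time heuristic.

```python
from itertools import permutations, product


def plan_cost(order, costs, sel):
    """Per-tuple cost of running the selections in `order` (assumed cost model)."""
    total, surviving = 0.0, 1.0
    for i in order:
        total += surviving * costs[i]   # operator i only sees surviving tuples
        surviving *= sel[i]
    return total


def max_regret(order, costs, intervals):
    """Worst-case regret of `order` over all endpoint scenarios of the intervals."""
    n = len(costs)
    worst = 0.0
    for scenario in product(*intervals):
        best = min(plan_cost(p, costs, scenario) for p in permutations(range(n)))
        worst = max(worst, plan_cost(order, costs, scenario) - best)
    return worst


def minmax_regret_order(costs, intervals):
    """Exhaustive search for the order with the smallest worst-case regret."""
    n = len(costs)
    return min(permutations(range(n)),
               key=lambda p: max_regret(p, costs, intervals))


def midpoint_order(costs, intervals):
    """Baseline: fix each selectivity at its midpoint, then sort by rank (s - 1) / c."""
    mid = [(lo + hi) / 2 for lo, hi in intervals]
    return tuple(sorted(range(len(costs)), key=lambda i: (mid[i] - 1) / costs[i]))


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the thesis.
    costs = [1.0, 2.0, 1.5]
    intervals = [(0.1, 0.9), (0.4, 0.5), (0.3, 0.7)]
    for name, order in [("minmax regret", minmax_regret_order(costs, intervals)),
                        ("midpoint", midpoint_order(costs, intervals))]:
        print(name, order, round(max_regret(order, costs, intervals), 4))
```

By construction, the minmax regret order never has a larger worst-case regret than the midpoint order; the optimistic and pessimistic baselines could be obtained the same way by fixing each selectivity at its lower or upper endpoint.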
Ordering selection operators under partial ignorance
Optimising queries in real-world situations under imperfect conditions remains an open problem. We consider finding the optimal order in which to execute a given set of selection operators under partial ignorance of their selectivities. The selectivities are modelled as intervals rather than exact values and we apply a concept from decision theory, the minimisation of the maximum regret, as a measure of optimality. The associated decision problem turns out to be NP-hard, which renders a brute-force approach to solving it impractical. Nevertheless, by investigating properties of the problem and identifying special cases which can be solved in polynomial time, we gain insight that we use to develop a novel heuristic for solving the general problem. We also evaluate minmax regret query optimisation experimentally, showing that it outperforms a strategy currently employed by optimisers, which uses mean values for uncertain parameters.
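One possible way to write down the optimality measure described in this abstract is given here. The notation and the cost model are assumptions for illustration and do not come from the abstract: operator $i$ has per-tuple cost $c_i$ and an unknown selectivity $s_i$ in the interval $[\underline{s}_i, \overline{s}_i]$, predicates are treated as independent, and $\pi$ denotes an execution order.

```latex
% Assumed notation (not from the abstract): c_i is the per-tuple cost of
% operator i, s_i its selectivity, and \pi an execution order of n operators.
\begin{align*}
C(\pi, s) &= \sum_{k=1}^{n} c_{\pi(k)} \prod_{j=1}^{k-1} s_{\pi(j)}
  && \text{(cost of order $\pi$ in scenario $s$)} \\
R(\pi, s) &= C(\pi, s) - \min_{\pi'} C(\pi', s)
  && \text{(regret of $\pi$ in scenario $s$)} \\
\pi^{*} &\in \arg\min_{\pi} \; \max_{s \in \prod_{i} [\underline{s}_i, \overline{s}_i]} R(\pi, s)
  && \text{(minmax regret order)}
\end{align*}
```

A mean-value strategy would instead minimise $C(\pi, \bar{s})$ for a single scenario $\bar{s}$ of interval midpoints, which ignores how badly the chosen order can perform elsewhere in the intervals.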
Differential Privacy, Property Testing, and Perturbations
Controlling the dissemination of information about ourselves has become a minefield in
the modern age. We release data about ourselves every day and don’t always fully understand
what information is contained in this data. Seemingly innocuous pieces of data can
often be combined to reveal more sensitive information
about ourselves than we intended. Differential privacy has developed as a technique
to prevent this type of privacy leakage. It borrows ideas from information theory to inject
enough uncertainty into the data so that sensitive information is provably absent from
the privatised data. Current research in differential privacy walks the fine line between
removing sensitive information while allowing non-sensitive information to be released.
At its heart, this thesis is about the study of information. Many of the results can be
formulated as asking a subset of the questions: does the data you have contain enough
information to learn what you would like to learn? and how can I affect the data to ensure
you can’t discern sensitive information? We will often approach the former question from
both directions: information theoretic lower bounds on recovery and algorithmic upper
bounds.
We begin with an information theoretic lower bound for graphon estimation. This explores
the fundamental limits of how much information about the underlying population is
contained in a finite sample of data. We then move on to exploring the connection between
information theoretic results and privacy in the context of linear inverse problems. We find
that there is a discrepancy between how the inverse problems community and the privacy
community view good recovery of information. Next, we explore black-box testing for
privacy. We argue that the amount of information required to verify the privacy guarantee
of an algorithm, without access to the internals of the algorithm, is lower bounded by the
amount of information required to break the privacy guarantee. Finally, we explore a setting
where imposing privacy is a help rather than a hindrance: online linear optimisation.
We argue that private algorithms have the right kind of stability guarantee to ensure low
regret for online linear optimisation.
PhD, Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/143940/1/amcm_1.pd
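The idea of injecting calibrated uncertainty, as described in the abstract above, can be illustrated with the standard Laplace mechanism for a counting query. This is a textbook construction chosen for illustration; the abstract does not say which mechanisms the thesis uses, and the names `laplace_noise` and `private_count` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1 (sensitivity 1),
    # so the noise scale is sensitivity / epsilon = 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 52, 67, 38]   # illustrative data only
    print(private_count(ages, lambda a: a >= 40, epsilon=0.5))
```

Smaller values of `epsilon` give stronger privacy and noisier answers, which is the trade-off between hiding sensitive information and releasing useful information that the abstract refers to.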
Ranking algorithms for implicit feedback
This report presents novel algorithms that use eye movements as implicit relevance feedback to improve search performance. The algorithms are evaluated on the "Transport Rank Five" dataset, which was previously collected in Task 8.3. We demonstrate that a simple linear combination or tensor product of eye movement and image features can improve retrieval accuracy.
Addressing practical challenges of Bayesian optimisation
This thesis addresses several challenges in applying Bayesian optimisation to real-world problems. Its contributions are new Bayesian optimisation algorithms for three practical problems: finding stable solutions, optimising cascaded processes and privacy-aware optimisation.