Strategy iteration algorithms for games and Markov decision processes
In this thesis, we consider the problem of solving two-player infinite games,
such as parity games, mean-payoff games, and discounted games, and the problem of
solving Markov decision processes. We study a specific type of algorithm for solving
these problems that we call strategy iteration algorithms. Strategy improvement
algorithms are an example of a type of algorithm that falls under this classification.
We also study Lemke's algorithm and the Cottle-Dantzig algorithm, which
are classical pivoting algorithms for solving the linear complementarity problem.
The reduction of Jurdzinski and Savani from discounted games to LCPs allows these
algorithms to be applied to infinite games [JS08]. We show that, when they are
applied to games, these algorithms can be viewed as strategy iteration algorithms.
We also resolve the question of their running time on these games by providing a
family of examples upon which these algorithms take exponential time.
Greedy strategy improvement is a natural variation of strategy improvement,
and Friedmann has recently shown an exponential lower bound for this algorithm
when it is applied to infinite games [Fri09]. However, this lower bound does not
apply to Markov decision processes. We extend Friedmann's work in order to prove
an exponential lower bound for greedy strategy improvement in the MDP setting.
We also study variations on strategy improvement for infinite games. We
show that there are structures in these games that current strategy improvement
algorithms do not take advantage of. We also show that lower bounds given by
Friedmann [Fri09], and those that are based on his work [FHZ10], work because they
exploit this ignorance. We use our insight to design strategy improvement algorithms
that avoid poor performance caused by the structures that these examples use.
EThOS - Electronic Theses Online Service, GB (United Kingdom)
Deflation for semismooth equations
Variational inequalities can in general support distinct solutions. In this
paper we study an algorithm for computing distinct solutions of a variational
inequality, without varying the initial guess supplied to the solver. The
central idea is the combination of a semismooth Newton method with a deflation
operator that eliminates known solutions from consideration. Given one root of
a semismooth residual, deflation constructs a new problem for which a
semismooth Newton method will not converge to the known root, even from the
same initial guess. This enables the discovery of other roots. We prove the
effectiveness of the deflation technique under the same assumptions that
guarantee locally superlinear convergence of a semismooth Newton method. We
demonstrate its utility on various finite- and infinite-dimensional examples
drawn from constrained optimization, game theory, economics and solid
mechanics.
Comment: 24 pages, 3 figures
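The idea summarized in this abstract can be sketched on a scalar complementarity problem. The example below is an illustration under stated assumptions, not the paper's implementation: the residual min(x, (x - 1)(x - 2)), the shifted deflation factor 1/(x - r)^2 + 1, the choice of generalized derivative at the kink, and all function names are hypothetical choices made for this sketch.

```python
def residual(x):
    """Semismooth residual of the scalar problem x >= 0, f(x) >= 0, x*f(x) = 0."""
    return min(x, (x - 1.0) * (x - 2.0))


def residual_slope(x):
    """One element of the generalized derivative of the residual at x."""
    f = (x - 1.0) * (x - 2.0)
    return 1.0 if x <= f else 2.0 * x - 3.0


def deflated_newton(x, known_roots, tol=1e-10, max_steps=50):
    """Newton iteration on M(x) * residual(x), where M deflates known roots.

    Assumed deflation factor: M(x) = product over r of (1 / (x - r)**2 + 1).
    The step simplifies to -F / (F' + (M'/M) * F).
    """
    for _ in range(max_steps):
        F = residual(x)
        if abs(F) < tol:
            return x
        # Logarithmic derivative M'(x) / M(x), summed over the known roots.
        log_slope = sum(
            -2.0 / ((x - r) ** 3 * (1.0 / (x - r) ** 2 + 1.0))
            for r in known_roots
        )
        denom = residual_slope(x) + log_slope * F
        if denom == 0.0:
            return None
        x -= F / denom
    return None


def find_distinct_roots(x0, max_roots=3):
    """Reuse the same initial guess x0, deflating each root once it is found."""
    roots = []
    while len(roots) < max_roots:
        root = deflated_newton(x0, roots)
        if root is None:
            break
        roots.append(root)
    return roots


roots = find_distinct_roots(3.0)
```

Starting from the same initial guess each time, the plain iteration finds the solution near 2, and the deflated iterations then find the solutions near 1 and 0 instead of returning to a known one.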