An Exponential Lower Bound for the Latest Deterministic Strategy Iteration Algorithms
This paper presents a new exponential lower bound for the two most popular
deterministic variants of the strategy improvement algorithms for solving
parity, mean payoff, discounted payoff and simple stochastic games. The first
variant improves every node in each step by locally maximizing the current
valuation, whereas the second variant computes the globally optimal improvement
in each step. We outline families of games on which both variants require
exponentially many strategy iterations.
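The switching scheme the abstract contrasts can be sketched in a simplified one-player (Max-only) mean-payoff setting, where a positional strategy picks one successor per vertex and its valuation is the mean weight of the cycle it eventually reaches. This is an illustrative stand-in, not the two-player algorithms analyzed in the paper: all names and the example graph are assumptions, and the local-switch rule here ranks successors by cycle value alone, omitting the bias/potential term a full implementation would need.

```python
def cycle_value(strategy, weights):
    """Mean weight of the cycle eventually reached from each vertex.
    For mean-payoff objectives the finite prefix does not matter, so
    every vertex on a path inherits the mean of the cycle it reaches."""
    n = len(strategy)
    vals = [None] * n
    for start in range(n):
        seen, path = {}, []
        v = start
        while v not in seen:          # follow the strategy until a vertex repeats
            seen[v] = len(path)
            path.append(v)
            v = strategy[v]
        cyc = path[seen[v]:]          # the cycle begins at the repeated vertex
        mean = sum(weights[(u, strategy[u])] for u in cyc) / len(cyc)
        for u in path:
            vals[u] = mean
    return vals

def strategy_improvement(edges, weights, n):
    """edges: dict vertex -> list of successors; weights: dict (u, v) -> weight."""
    strategy = [edges[v][0] for v in range(n)]   # arbitrary initial strategy
    while True:
        vals = cycle_value(strategy, weights)
        improved = False
        for v in range(n):
            # all-switches rule: move to the successor with the best valuation
            best = max(edges[v], key=lambda w: vals[w])
            if vals[best] > vals[strategy[v]]:
                strategy[v], improved = best, True
        if not improved:
            return vals, strategy
```

On a toy graph where vertex 0 can reach either a mean-1 or a mean-5 self-loop, a single switch suffices; the paper's lower-bound families are built precisely so that such switches happen exponentially often.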
Multigrid methods for two-player zero-sum stochastic games
We present a fast numerical algorithm for large scale zero-sum stochastic
games with perfect information, which combines policy iteration and algebraic
multigrid methods. This algorithm can be applied either to a true finite state
space zero-sum two player game or to the discretization of an Isaacs equation.
We present numerical tests on discretizations of Isaacs equations or
variational inequalities. We also present a full multi-level policy iteration,
similar to FMG, which substantially improves the computation time for
solving some variational inequalities.
Value Iteration for Long-run Average Reward in Markov Decision Processes
Markov decision processes (MDPs) are standard models for probabilistic
systems with non-deterministic behaviours. Long-run average rewards provide a
mathematically elegant formalism for expressing long term performance. Value
iteration (VI) is one of the simplest and most efficient algorithmic approaches
to MDPs with other objectives, such as reachability. Unfortunately,
a naive extension of VI does not work for MDPs with long-run average rewards,
as there is no known stopping criterion. In this work our contributions are
threefold. (1) We refute a conjecture related to stopping criteria for MDPs
with long-run average rewards. (2) We present two practical algorithms for MDPs
with long-run average rewards based on VI. First, we show that a combination of
applying VI locally for each maximal end-component (MEC) and VI for
reachability objectives can provide approximation guarantees. Second, extending
the above approach with a simulation-guided on-demand variant of VI, we present
an anytime algorithm that is able to deal with very large models. (3) Finally,
we present experimental results showing that our methods significantly
outperform the standard approaches on several benchmarks.
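The reachability building block mentioned in contribution (2) can be sketched as plain value iteration on a toy MDP. The transition encoding and the fixed iteration cap below are illustrative assumptions; the paper's point is precisely that such a fixed cap gives no error guarantee by itself, which is why it combines per-MEC analysis with reachability VI to obtain approximation bounds.

```python
def value_iteration(transitions, target, n, iters=1000):
    """Maximal reachability probability in a small MDP.
    transitions: dict state -> list of actions, each a list of (prob, next_state).
    target: set of absorbing goal states."""
    v = [1.0 if s in target else 0.0 for s in range(n)]
    for _ in range(iters):
        new = list(v)
        for s in range(n):
            if s in target:
                continue
            # Bellman update: best action by expected value of successors
            new[s] = max(sum(p * v[t] for p, t in action)
                         for action in transitions[s])
        v = new
    return v
```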
Using Strategy Improvement to Stay Alive
We design a novel algorithm for solving Mean-Payoff Games (MPGs). Besides
solving an MPG in the usual sense, our algorithm computes more information
about the game, information that is important with respect to applications. The
weights of the edges of an MPG can be thought of as a gained/consumed energy --
depending on the sign. For each vertex, our algorithm computes the minimum
amount of initial energy that is sufficient for player Max to ensure that in a
play starting from the vertex, the energy level never goes below zero. Our
algorithm is not the first to compute the minimum sufficient initial
energies, but according to our experimental study it is the fastest
algorithm that computes them, because it utilizes the strategy
improvement technique, which is very efficient in practice.
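For intuition, the quantity computed above, the minimum sufficient initial energy per vertex, satisfies a simple fixed-point equation. Below is a hedged sketch of the classical Kleene-style fixed-point computation on a one-player (Max-only) energy graph, not the strategy-improvement algorithm of the paper; the names, the ceiling bound, and the example are illustrative assumptions.

```python
def min_initial_energy(edges, weights, n, ceiling):
    """f[v] = least initial credit such that some path from v never lets the
    running energy drop below zero. Values above `ceiling` (e.g. the sum of
    absolute negative weights) are cut off to infinity: Max cannot win there."""
    INF = float('inf')
    f = [0] * n
    changed = True
    while changed:
        changed = False
        for v in range(n):
            # Max controls every vertex here, so he picks the cheapest edge:
            # taking (v, u) with weight w requires max(f[u] - w, 0) energy at v
            best = min(
                (max(f[u] - weights[(v, u)], 0) if f[u] != INF else INF)
                for u in edges[v]
            )
            if best > ceiling:
                best = INF
            if best != f[v]:
                f[v], changed = best, True
    return f
```

For example, a vertex whose only edge loses 3 energy before reaching a sustainable (weight +1) self-loop needs initial credit 3.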
The level set method for the two-sided eigenproblem
We consider the max-plus analogue of the eigenproblem for matrix pencils
Ax=lambda Bx. We show that the spectrum of (A,B) (i.e., the set of possible
values of lambda), which is a finite union of intervals, can be computed
in a pseudo-polynomial number of operations, namely by a (pseudo-polynomial) number of
calls to an oracle that computes the value of a mean payoff game. The proof
relies on the introduction of a spectral function, which we interpret in terms
of the least Chebyshev distance between Ax and lambda Bx. The spectrum is
obtained as the zero level set of this function.
Tropical polyhedra are equivalent to mean payoff games
We show that several decision problems originating from max-plus or tropical
convexity are equivalent to zero-sum two player game problems. In particular,
we set up an equivalence between the external representation of tropical convex
sets and zero-sum stochastic games, in which tropical polyhedra correspond to
deterministic games with finite action spaces. Then, we show that the winning
initial positions can be determined from the associated tropical polyhedron. We
obtain as a corollary a game theoretical proof of the fact that the tropical
rank of a matrix, defined as the maximal size of a submatrix for which the
optimal assignment problem has a unique solution, coincides with the maximal
number of rows (or columns) of the matrix which are linearly independent in the
tropical sense. Our proofs rely on techniques from non-linear Perron-Frobenius
theory.
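The external representation referred to above describes a tropical polyhedron by max-plus linear inequalities of the form A (x) x >= B (x) x, where (x) denotes the max-plus matrix product. A minimal sketch of that product and the resulting membership test follows (names are illustrative; the game-solving side of the equivalence is not shown):

```python
NEG_INF = float('-inf')  # the max-plus "zero" element

def maxplus_matvec(M, x):
    """Max-plus product: (M (x) x)_i = max_j (M[i][j] + x[j])."""
    return [max(m + xi for m, xi in zip(row, x)) for row in M]

def in_tropical_polyhedron(A, B, x):
    """Does x satisfy every tropical inequality (A (x) x)_i >= (B (x) x)_i?"""
    lhs, rhs = maxplus_matvec(A, x), maxplus_matvec(B, x)
    return all(l >= r for l, r in zip(lhs, rhs))
```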
Faster Algorithm for Mean-Payoff Games
We study some existing techniques for solving mean-payoff games (MPGs),
improve them, and design a randomized algorithm for solving MPGs with
currently the best expected complexity.
Tropical Fourier-Motzkin elimination, with an application to real-time verification
We introduce a generalization of tropical polyhedra able to express both
strict and non-strict inequalities. Such inequalities are handled by means of a
semiring of germs (encoding infinitesimal perturbations). We develop a tropical
analogue of Fourier-Motzkin elimination from which we derive geometrical
properties of these polyhedra. In particular, we show that they coincide with
the tropically convex union of (not necessarily closed) cells that are convex
both classically and tropically. We also prove that the redundant inequalities
produced when performing successive elimination steps can be dynamically
deleted by reduction to mean payoff game problems. As a complement, we provide
a coarser (polynomial time) deletion procedure which is enough to arrive at a
simply exponential bound for the total execution time. These algorithms are
illustrated by an application to real-time systems (reachability analysis of
timed automata).
Optimal market making under partial information and numerical methods for impulse control games with applications
The topics treated in this thesis are inherently two-fold. The first part considers the problem of a market maker who wants to optimally set bid/ask quotes over a finite time horizon, to maximize her expected utility. The intensities of the orders she receives depend not only on the spreads she quotes, but also on unobservable factors modelled by a hidden Markov chain. This stochastic control problem under partial information is solved by means of stochastic filtering, control and piecewise-deterministic Markov processes theory. The value function is characterized as the unique continuous viscosity solution of its dynamic programming equation. Afterwards, the analogous full information problem is solved and results are compared numerically through a concrete example. The optimal full information spreads are shown to be biased when the exact market regime is unknown, as the market maker needs to adjust for additional regime uncertainty in terms of P&L sensitivity and observable order flow volatility.
The second part deals with numerically solving nonzero-sum stochastic differential games with impulse controls. These offer a realistic and far-reaching modelling framework for applications within finance, energy markets and other areas, but the difficulty in solving such problems has hindered their proliferation. Semi-analytical approaches make strong assumptions pertaining to very particular cases. To the author's best knowledge, there are no numerical methods available in the literature. A policy-iteration-type solver is proposed to solve an underlying system of quasi-variational inequalities, and it is validated numerically with reassuring results. In particular, it is observed that the algorithm does not enjoy global convergence, and a heuristic methodology is proposed to construct initial guesses.
Eventually, the focus is put on games with a symmetric structure, and a substantially improved version of the former algorithm is put forward. A rigorous convergence analysis is undertaken with natural assumptions on the players' strategies, which admit graph-theoretic interpretations in the context of weakly chained diagonally dominant matrices. A provably convergent single-player impulse control solver, often outperforming classical policy iteration, is also provided. The main algorithm is used to compute with high precision equilibrium payoffs and Nash equilibria of otherwise too challenging problems, and even some for which results go beyond the scope of all the currently available theory.
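As a loose, single-player analogue of the policy-iteration-type solvers described above, the sketch below runs Howard-style policy iteration on a tiny discrete optimal-stopping problem (stopping being the simplest kind of impulse). Everything here, the fixed-point evaluation loop, the data and the names, is an illustrative assumption, far simpler than the quasi-variational systems treated in the thesis.

```python
def policy_iteration(P, reward_stop, gamma):
    """Discrete optimal stopping: in each state either stop and collect
    reward_stop[s], or continue and earn the discounted expected value.
    P is a row-stochastic continuation matrix, 0 < gamma < 1."""
    n = len(P)
    policy = ["stop"] * n                     # initial guess: stop everywhere
    while True:
        # policy evaluation by fixed-point iteration (fine for tiny n)
        v = list(reward_stop)
        for _ in range(10_000):
            v = [reward_stop[s] if policy[s] == "stop"
                 else gamma * sum(P[s][t] * v[t] for t in range(n))
                 for s in range(n)]
        # policy improvement: stop wherever stopping beats continuing
        new = ["stop" if reward_stop[s] >= gamma * sum(P[s][t] * v[t] for t in range(n))
               else "continue" for s in range(n)]
        if new == policy:
            return v, policy
        policy = new
```

The convergence concerns raised in the thesis already show up in miniature: the improvement step is only guaranteed to converge under structural assumptions on the problem data.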