1,982 research outputs found

    An Exponential Lower Bound for the Latest Deterministic Strategy Iteration Algorithms

    This paper presents a new exponential lower bound for the two most popular deterministic variants of the strategy improvement algorithms for solving parity, mean payoff, discounted payoff and simple stochastic games. The first variant improves every node in each step, maximizing the current valuation locally, whereas the second variant computes the globally optimal improvement in each step. We outline families of games on which both variants require exponentially many strategy iterations.
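
    As a point of reference, the loop below is a minimal, hypothetical sketch of strategy improvement for a one-player discounted-payoff game, using the first improvement rule mentioned above (switch every node to its locally best successor). The graph, rewards and discount factor are invented toy data, and this is not the paper's construction; the lower-bound families in the paper are designed precisely to make loops of this shape take exponentially many iterations.

        GAMMA = 0.9
        # successors[v]: edges (target, reward) available to player Max at vertex v (toy data)
        successors = {
            0: [(1, 0.0), (2, 1.0)],
            1: [(0, 2.0), (2, 0.0)],
            2: [(2, 0.5)],
        }

        def evaluate(strategy, iters=2000):
            """Valuation of a positional strategy: V(v) = reward + GAMMA * V(target)."""
            value = {v: 0.0 for v in successors}
            for _ in range(iters):
                value = {v: strategy[v][1] + GAMMA * value[strategy[v][0]]
                         for v in successors}
            return value

        def improve_locally(value):
            """First variant from the abstract: switch every node to its locally best edge."""
            return {v: max(successors[v], key=lambda e: e[1] + GAMMA * value[e[0]])
                    for v in successors}

        strategy = {v: successors[v][0] for v in successors}   # arbitrary initial strategy
        while True:
            value = evaluate(strategy)
            improved = improve_locally(value)
            if improved == strategy:                           # no improving switch left: optimal
                break
            strategy = improved

        print(strategy, value)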

    Multigrid methods for two-player zero-sum stochastic games

    We present a fast numerical algorithm for large-scale zero-sum stochastic games with perfect information, which combines policy iteration and algebraic multigrid methods. This algorithm can be applied either to a true finite-state-space zero-sum two-player game or to the discretization of an Isaacs equation. We present numerical tests on discretizations of Isaacs equations or variational inequalities. We also present a full multi-level policy iteration, similar to FMG (full multigrid), which substantially improves the computation time for solving some variational inequalities. Comment: 31 pages.
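
    For orientation, here is a minimal sketch of the outer policy-iteration loop that such a method builds on, written for a one-player discounted problem with made-up data. The paper's contribution is to solve the inner policy-evaluation linear system with algebraic multigrid and to handle two players; the dense direct solve below is only a stand-in for that inner solver.

        import numpy as np

        GAMMA = 0.95
        n = 4
        # For each of two actions: a row-stochastic transition matrix and a reward vector (toy data).
        P = [np.full((n, n), 1.0 / n), np.eye(n)]
        r = [np.linspace(0.0, 1.0, n), np.zeros(n)]

        policy = np.zeros(n, dtype=int)
        for _ in range(50):
            # Policy evaluation: solve (I - GAMMA * P_pi) v = r_pi.
            P_pi = np.array([P[policy[i]][i] for i in range(n)])
            r_pi = np.array([r[policy[i]][i] for i in range(n)])
            v = np.linalg.solve(np.eye(n) - GAMMA * P_pi, r_pi)   # AMG would replace this solve
            # Policy improvement: greedy action with respect to the current value.
            q = np.array([r[a] + GAMMA * P[a] @ v for a in range(2)])
            new_policy = q.argmax(axis=0)
            if np.array_equal(new_policy, policy):
                break
            policy = new_policy

        print(policy, v)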

    Value Iteration for Long-run Average Reward in Markov Decision Processes

    Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long-term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other objectives, such as reachability. Unfortunately, a naive extension of VI does not work for MDPs with long-run average rewards, as there is no known stopping criterion. In this work our contributions are threefold. (1) We refute a conjecture related to stopping criteria for MDPs with long-run average rewards. (2) We present two practical algorithms for MDPs with long-run average rewards based on VI. First, we show that a combination of applying VI locally for each maximal end-component (MEC) and VI for reachability objectives can provide approximation guarantees. Second, extending the above approach with a simulation-guided on-demand variant of VI, we present an anytime algorithm that is able to deal with very large models. (3) Finally, we present experimental results showing that our methods significantly outperform the standard approaches on several benchmarks.
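
    As a rough illustration of one building block named above, the snippet below runs value iteration for a reachability objective on a tiny, made-up MDP. The paper combines per-MEC value iteration with this kind of reachability VI and supplies the approximation guarantees and stopping rules that the naive fixed iteration budget used here lacks.

        # transitions[state][action] -> list of (probability, successor); toy data
        transitions = {
            "s0": {"a": [(0.5, "goal"), (0.5, "s1")], "b": [(1.0, "s1")]},
            "s1": {"a": [(0.7, "s0"), (0.3, "sink")]},
            "goal": {}, "sink": {},
        }
        TARGET = {"goal"}

        value = {s: (1.0 if s in TARGET else 0.0) for s in transitions}
        for _ in range(1000):   # fixed iteration budget; a real tool needs a stopping criterion
            value = {
                s: 1.0 if s in TARGET else
                   max((sum(p * value[t] for p, t in succ)
                        for succ in transitions[s].values()), default=0.0)
                for s in transitions
            }
        print(value)   # maximal probability of reaching the target from each state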

    Using Strategy Improvement to Stay Alive

    We design a novel algorithm for solving Mean-Payoff Games (MPGs). Besides solving an MPG in the usual sense, our algorithm computes more information about the game, information that is important with respect to applications. The weights of the edges of an MPG can be thought of as energy gained or consumed, depending on the sign. For each vertex, our algorithm computes the minimum amount of initial energy that is sufficient for player Max to ensure that in a play starting from the vertex, the energy level never goes below zero. Our algorithm is not the first to compute the minimum sufficient initial energies, but according to our experimental study it is the fastest. The reason is that it utilizes the strategy improvement technique, which is very efficient in practice.
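
    For context, the minimum sufficient initial energies admit a standard fixed-point characterisation, and the sketch below computes it by plain Kleene iteration on a made-up game graph. This is deliberately not the paper's strategy-improvement algorithm, only a compact statement of the quantity being computed.

        # edges[v] -> list of (weight, successor); owner[v] is "max" or "min" (toy data)
        edges = {
            "a": [(-2, "b"), (1, "c")],
            "b": [(3, "a")],
            "c": [(-1, "a")],
        }
        owner = {"a": "max", "b": "min", "c": "min"}
        CAP = sum(abs(w) for es in edges.values() for w, _ in es)   # above this: not winnable

        credit = {v: 0 for v in edges}
        changed = True
        while changed:
            changed = False
            for v, es in edges.items():
                # Energy needed when moving along each edge: max(0, credit(target) - weight).
                need = [max(0, credit[u] - w) for w, u in es]
                new = min(need) if owner[v] == "max" else max(need)
                new = min(new, CAP + 1)
                if new != credit[v]:
                    credit[v] = new
                    changed = True

        print(credit)   # minimum initial energy per vertex (CAP + 1 marks a losing vertex)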

    The level set method for the two-sided eigenproblem

    We consider the max-plus analogue of the eigenproblem for matrix pencils Ax = lambda Bx. We show that the spectrum of (A, B) (i.e., the set of possible values of lambda), which is a finite union of intervals, can be computed in a pseudo-polynomial number of operations, by a (pseudo-polynomial) number of calls to an oracle that computes the value of a mean payoff game. The proof relies on the introduction of a spectral function, which we interpret in terms of the least Chebyshev distance between Ax and lambda Bx. The spectrum is obtained as the zero level set of this function. Comment: 34 pages, 4 figures. Changes with respect to the previous version: we explain the relation to mean-payoff games and discrete event systems, and show that the reconstruction of the spectrum is pseudo-polynomial.
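
    To make the objects concrete, the snippet below evaluates the max-plus products A⊗x and lambda⊗B⊗x on made-up data and measures their Chebyshev (sup-norm) distance, the quantity the abstract relates to the spectral function; it illustrates the definitions only, not the paper's level-set algorithm.

        import numpy as np

        def maxplus_matvec(M, x):
            return np.max(M + x, axis=1)       # (M ⊗ x)_i = max_j (M_ij + x_j)

        A = np.array([[0.0, 3.0], [2.0, -1.0]])   # toy data
        B = np.array([[1.0, 0.0], [0.0, 1.0]])
        x = np.array([0.0, 2.0])
        lam = 1.5                                  # trial eigenvalue

        lhs = maxplus_matvec(A, x)                 # A ⊗ x
        rhs = lam + maxplus_matvec(B, x)           # lambda ⊗ (B ⊗ x): max-plus scaling is +lambda
        chebyshev = np.max(np.abs(lhs - rhs))      # sup-norm distance for this particular x
        print(lhs, rhs, chebyshev)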

    Tropical polyhedra are equivalent to mean payoff games

    We show that several decision problems originating from max-plus or tropical convexity are equivalent to zero-sum two-player game problems. In particular, we set up an equivalence between the external representation of tropical convex sets and zero-sum stochastic games, in which tropical polyhedra correspond to deterministic games with finite action spaces. Then, we show that the winning initial positions can be determined from the associated tropical polyhedron. We obtain as a corollary a game-theoretic proof of the fact that the tropical rank of a matrix, defined as the maximal size of a submatrix for which the optimal assignment problem has a unique solution, coincides with the maximal number of rows (or columns) of the matrix which are linearly independent in the tropical sense. Our proofs rely on techniques from non-linear Perron-Frobenius theory. Comment: 28 pages, 5 figures; v2: updated references, added background materials and illustrations; v3: minor improvements, references updated.
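
    To fix notation, the snippet below writes out the kind of external representation referred to above, a max-plus (tropical) polyhedron given by inequalities max_j (A_ij + x_j) >= max_j (B_ij + x_j), together with a membership check on made-up data. Deciding whether such a system has a solution is the problem the paper relates to mean payoff games; the reduction itself is not reproduced here.

        import numpy as np

        NEG_INF = -np.inf   # the tropical "zero" element

        A = np.array([[0.0, NEG_INF], [NEG_INF, 1.0]])   # toy inequality system
        B = np.array([[NEG_INF, -2.0], [0.0, NEG_INF]])

        def satisfies(x):
            lhs = np.max(A + x, axis=1)   # tropical product A ⊗ x
            rhs = np.max(B + x, axis=1)   # tropical product B ⊗ x
            return bool(np.all(lhs >= rhs))

        print(satisfies(np.array([0.0, 0.0])))   # membership of a candidate point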

    Faster Algorithm for Mean-Payoff Games

    We study some existing techniques for solving mean-payoff games (MPGs), improve them, and design a randomized algorithm for solving MPGs with the currently best expected complexity.

    Tropical Fourier-Motzkin elimination, with an application to real-time verification

    We introduce a generalization of tropical polyhedra able to express both strict and non-strict inequalities. Such inequalities are handled by means of a semiring of germs (encoding infinitesimal perturbations). We develop a tropical analogue of Fourier-Motzkin elimination, from which we derive geometric properties of these polyhedra. In particular, we show that they coincide with the tropically convex union of (not necessarily closed) cells that are convex both classically and tropically. We also prove that the redundant inequalities produced when performing successive elimination steps can be dynamically deleted by reduction to mean payoff game problems. As a complement, we provide a coarser (polynomial-time) deletion procedure which is enough to arrive at a simply exponential bound for the total execution time. These algorithms are illustrated by an application to real-time systems (reachability analysis of timed automata). Comment: 29 pages, 8 figures.
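
    For comparison, here is a short sketch of one step of classical Fourier-Motzkin elimination, the construction whose tropical analogue the paper develops. The constraint system is made up, and the tropical handling of strict inequalities via germs and the redundancy deletion via mean payoff games are not shown.

        def eliminate(constraints, k):
            """constraints: list of (coeffs, bound) meaning sum_j coeffs[j]*x_j <= bound.
            Returns an equivalent system in which variable x_k no longer appears."""
            pos = [(c, d) for c, d in constraints if c[k] > 0]
            neg = [(c, d) for c, d in constraints if c[k] < 0]
            out = [(c, d) for c, d in constraints if c[k] == 0]
            for cp, dp in pos:
                for cn, dn in neg:
                    # Scale so the x_k coefficients cancel, then add the two constraints.
                    coeffs = [cp[j] * -cn[k] + cn[j] * cp[k] for j in range(len(cp))]
                    out.append((coeffs, dp * -cn[k] + dn * cp[k]))
            return out

        # x0 - x1 <= 3,  -x0 + x2 <= -1,  x1 <= 5  (toy difference constraints)
        system = [([1, -1, 0], 3), ([-1, 0, 1], -1), ([0, 1, 0], 5)]
        print(eliminate(system, 0))   # the same system with x0 projected away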

    Optimal market making under partial information and numerical methods for impulse control games with applications

    The topics treated in this thesis are inherently two-fold. The first part considers the problem of a market maker who wants to optimally set bid/ask quotes over a finite time horizon, to maximize her expected utility. The intensities of the orders she receives depend not only on the spreads she quotes, but also on unobservable factors modelled by a hidden Markov chain. This stochastic control problem under partial information is solved by means of stochastic filtering, control and piecewise-deterministic Markov processes theory. The value function is characterized as the unique continuous viscosity solution of its dynamic programming equation. Afterwards, the analogous full information problem is solved and the results are compared numerically through a concrete example. The optimal full information spreads are shown to be biased when the exact market regime is unknown, as the market maker needs to adjust for additional regime uncertainty in terms of P&L sensitivity and observable order flow volatility. The second part deals with numerically solving nonzero-sum stochastic differential games with impulse controls. These offer a realistic and far-reaching modelling framework for applications within finance, energy markets and other areas, but the difficulty in solving such problems has hindered their proliferation. Semi-analytical approaches make strong assumptions pertaining to very particular cases. To the author's best knowledge, there are no numerical methods available in the literature. A policy-iteration-type solver is proposed to solve an underlying system of quasi-variational inequalities, and it is validated numerically with reassuring results. In particular, it is observed that the algorithm does not enjoy global convergence, and a heuristic methodology is proposed to construct initial guesses. Finally, the focus is put on games with a symmetric structure and a substantially improved version of the former algorithm is put forward. A rigorous convergence analysis is undertaken with natural assumptions on the players' strategies, which admit graph-theoretic interpretations in the context of weakly chained diagonally dominant matrices. A provably convergent single-player impulse control solver, often outperforming classical policy iteration, is also provided. The main algorithm is used to compute with high precision equilibrium payoffs and Nash equilibria of otherwise too challenging problems, and even some for which the results go beyond the scope of all currently available theory.
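
    As a single-player illustration of the kind of policy-iteration solver described above, the sketch below runs a Howard-type iteration on an obstacle-type variational inequality min(Av - f, v - psi) = 0 with a made-up tridiagonal M-matrix; the thesis itself treats coupled quasi-variational inequalities for two interacting players, which this toy does not attempt.

        import numpy as np

        n = 5
        A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # weakly diagonally dominant M-matrix
        f = np.ones(n)
        psi = np.linspace(-1.0, 1.0, n)                         # obstacle

        policy = np.zeros(n, dtype=bool)        # True where the obstacle branch is active
        for _ in range(50):
            # Assemble and solve the linear system selected by the current policy.
            M = np.where(policy[:, None], np.eye(n), A)
            b = np.where(policy, psi, f)
            v = np.linalg.solve(M, b)
            # Policy improvement: row by row, pick whichever branch of the min is smaller.
            new_policy = (v - psi) < (A @ v - f)
            if np.array_equal(new_policy, policy):
                break
            policy = new_policy

        print(v)   # approximate solution of the variational inequality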