
    Policy iteration for perfect information stochastic mean payoff games with bounded first return times is strongly polynomial

    Recent results of Ye and of Hansen, Miltersen, and Zwick show that policy iteration for one- or two-player (perfect information) zero-sum stochastic games, restricted to instances with a fixed discount rate, is strongly polynomial. We show that policy iteration for mean-payoff zero-sum stochastic games is also strongly polynomial when restricted to instances with bounded first mean return time to a given state. The proof is based on methods of nonlinear Perron-Frobenius theory, allowing us to reduce the mean-payoff problem to a discounted problem with a state-dependent discount rate. Our analysis also shows that policy iteration remains strongly polynomial for discounted problems in which the discount rate can be state dependent (and even negative) at certain states, provided that the spectral radii of the nonnegative matrices associated with all strategies are bounded from above by a fixed constant strictly less than 1. (Comment: 17 pages.)
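
    To make the state-dependent discount setting concrete, here is a minimal one-player policy-iteration sketch in Python. The data layout (dicts of dense numpy transition matrices), the function name, and the explicit spectral-radius check are illustrative assumptions, not taken from the paper; the two-player case would replace the single argmax by alternating min/max improvements.

        # Minimal sketch: policy iteration with state-dependent discount factors.
        # P[a] is an (n x n) numpy transition matrix, r[a] a length-n reward
        # vector, gamma a length-n numpy vector of per-state discount factors.
        # The abstract's condition is that the spectral radius of
        # diag(gamma) @ P stays below a fixed constant < 1 for every strategy.
        import numpy as np

        def policy_iteration(P, r, gamma, max_iter=100):
            actions = list(P)
            n = len(gamma)
            policy = np.zeros(n, dtype=int)          # arbitrary initial strategy
            for _ in range(max_iter):
                P_pi = np.array([P[actions[policy[s]]][s] for s in range(n)])
                r_pi = np.array([r[actions[policy[s]]][s] for s in range(n)])
                G = gamma[:, None] * P_pi            # diag(gamma) @ P_pi
                assert max(abs(np.linalg.eigvals(G))) < 1  # spectral radius < 1
                v = np.linalg.solve(np.eye(n) - G, r_pi)   # policy evaluation
                # Policy improvement: greedy one-step lookahead.
                q = np.array([[r[a][s] + gamma[s] * (P[a][s] @ v)
                               for a in actions] for s in range(n)])
                new_policy = q.argmax(axis=1)
                if np.array_equal(new_policy, policy):
                    return v, policy
                policy = new_policy
            return v, policy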

    Policy iteration algorithm for zero-sum stochastic games with mean payoff

    We give a policy iteration algorithm to solve zero-sum stochastic games with finite state and action spaces and perfect information, when the value is defined in terms of the mean payoff per turn. This algorithm does not require any irreducibility assumption on the Markov chains determined by the strategies of the players. It is based on a discrete nonlinear analogue of the notion of reduction of a super-harmonic function.
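
    For contrast with the irreducibility-free algorithm above, here is the classical Howard policy iteration for one-player mean-payoff problems, which does require a unichain assumption so that the gain rho is well defined and the evaluation system is nonsingular. This is a baseline sketch only; names and data layout are assumed, and the paper's two-player, possibly degenerate-chain setting is precisely what it does not cover.

        # Howard policy iteration for a one-player mean-payoff MDP under a
        # unichain assumption -- the baseline whose irreducibility requirement
        # the algorithm in the abstract removes.
        import numpy as np

        def howard_mean_payoff(P, r, max_iter=100):
            actions = list(P)
            n = P[actions[0]].shape[0]
            policy = np.zeros(n, dtype=int)
            for _ in range(max_iter):
                P_pi = np.array([P[actions[policy[s]]][s] for s in range(n)])
                r_pi = np.array([r[actions[policy[s]]][s] for s in range(n)])
                # Evaluate: rho + h[s] = r_pi[s] + (P_pi h)[s], pinning h[0] = 0,
                # as one linear system in the unknowns (rho, h[1], ..., h[n-1]).
                A = np.hstack([np.ones((n, 1)), (np.eye(n) - P_pi)[:, 1:]])
                sol = np.linalg.solve(A, r_pi)
                rho, h = sol[0], np.concatenate([[0.0], sol[1:]])
                # Improve greedily on the bias h (in practice, break ties toward
                # the current policy to guarantee termination).
                q = np.array([[r[a][s] + P[a][s] @ h for a in actions]
                              for s in range(n)])
                new_policy = q.argmax(axis=1)
                if np.array_equal(new_policy, policy):
                    return rho, h, policy
                policy = new_policy
            return rho, h, policy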

    Multigrid methods for two-player zero-sum stochastic games

    We present a fast numerical algorithm for large-scale zero-sum stochastic games with perfect information, which combines policy iteration and algebraic multigrid methods. This algorithm can be applied either to a true finite-state-space zero-sum two-player game or to the discretization of an Isaacs equation. We present numerical tests on discretizations of Isaacs equations or variational inequalities. We also present a full multi-level policy iteration, similar to FMG, which substantially improves the computation time for solving some variational inequalities. (Comment: 31 pages.)
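
    A sketch of the outer/inner structure the abstract describes: policy iteration in which each policy-evaluation linear system is handed to an algebraic multigrid solver. Here the pyamg library stands in for the paper's AMG component, the game is one-player, and all names are illustrative; the paper's setting is two-player and includes the FMG-like full multi-level variant.

        # Policy iteration with an algebraic multigrid inner solver (pyamg as a
        # stand-in).  P[a]: sparse CSR transition matrix, r[a]: length-n reward
        # vector, gamma: scalar discount factor; one-player for brevity.
        import numpy as np
        import scipy.sparse as sp
        import pyamg

        def amg_policy_iteration(P, r, gamma, tol=1e-8):
            actions = list(P)
            n = next(iter(P.values())).shape[0]
            policy = np.zeros(n, dtype=int)
            while True:
                P_pi = sp.vstack([P[actions[policy[s]]].getrow(s)
                                  for s in range(n)]).tocsr()
                r_pi = np.array([r[actions[policy[s]]][s] for s in range(n)])
                A = (sp.eye(n) - gamma * P_pi).tocsr()
                ml = pyamg.ruge_stuben_solver(A)  # build the multigrid hierarchy
                v = ml.solve(r_pi, tol=tol)       # inner AMG solve of A v = r_pi
                q = np.column_stack([r[a] + gamma * (P[a] @ v) for a in actions])
                new_policy = q.argmax(axis=1)
                if np.array_equal(new_policy, policy):
                    return v, policy
                policy = new_policy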

    Solving generic nonarchimedean semidefinite programs using stochastic game algorithms

    A general issue in computational optimization is to develop combinatorial algorithms for semidefinite programming. We address this issue when the base field is nonarchimedean. We provide a solution for a class of semidefinite feasibility problems given by generic matrices. Our approach is based on tropical geometry. It relies on tropical spectrahedra, which are defined as the images under the valuation of nonarchimedean spectrahedra. We establish a correspondence between generic tropical spectrahedra and zero-sum stochastic games with perfect information. The latter have been well studied in algorithmic game theory. This allows us to solve nonarchimedean semidefinite feasibility problems using algorithms for stochastic games. These algorithms are of a combinatorial nature and work for large instances. (Comment: v1: 25 pages, 4 figures; v2: 27 pages, 4 figures, minor revisions + benchmarks added; v3: 30 pages, 6 figures, generalization to non-Metzler sign patterns + some results replaced by references to the companion work arXiv:1610.0674)
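
    To unpack "images under the valuation": over a nonarchimedean field such as real Puiseux series, the valuation of a series can be taken to be its leading exponent (conventions vary across the literature), and it interacts with the field operations as sketched below. Generically, sums map to maxima, which is the max-plus (tropical) structure linking spectrahedra to mean-payoff games.

        % Valuation identities for Puiseux series (max-plus convention assumed):
        \[
          \mathrm{val}(xy) = \mathrm{val}(x) + \mathrm{val}(y),
          \qquad
          \mathrm{val}(x + y) \le \max\bigl(\mathrm{val}(x), \mathrm{val}(y)\bigr),
        \]
        % with equality in the second relation whenever the leading terms do
        % not cancel -- the generic situation the abstract exploits.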

    Stochastic Shortest Path with Energy Constraints in POMDPs

    We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level remain positive in all steps until the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, building on existing POMDP solvers and using RTDP as its main method. Our second contribution concerns policy representation. For larger POMDP instances, the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that extracts the important decisions of the policy, allowing us to compute succinct, human-readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels. (Comment: Technical report accompanying a paper published in the proceedings of AAMAS 201)
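
    An illustrative fragment (not the paper's solver) of the state augmentation that makes the hard energy constraint explicit: each state is paired with its current energy level, and any transition that would drive the energy to zero or below is pruned as a dead end. The type names and the energy cap are assumptions.

        # Energy-augmented successor function for a (PO)MDP transition model.
        # succ maps a state name to (next_state, probability, energy_delta)
        # triples; 'cap' bounds how much energy can be stored.
        from typing import Dict, List, Tuple

        State = Tuple[str, int]   # (underlying state, current energy level)

        def augmented_successors(succ: Dict[str, List[Tuple[str, float, int]]],
                                 state: State,
                                 cap: int) -> List[Tuple[State, float]]:
            s, e = state
            out = []
            for s2, p, delta in succ[s]:
                e2 = min(e + delta, cap)  # energy may rise or fall with a move
                if e2 > 0:                # hard constraint: stay strictly positive
                    out.append(((s2, e2), p))
            return out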