Policy iteration for perfect information stochastic mean payoff games with bounded first return times is strongly polynomial
Recent results of Ye and of Hansen, Miltersen, and Zwick show that policy
iteration for one- or two-player (perfect information) zero-sum stochastic
games, restricted to instances with a fixed discount rate, is strongly
polynomial. We show that policy iteration for mean-payoff zero-sum stochastic
games is also strongly polynomial when restricted to instances with bounded
first mean return time to a given state. The proof is based on methods of
nonlinear Perron-Frobenius theory, allowing us to reduce the mean-payoff
problem to a discounted problem with a state-dependent discount rate. Our
analysis also shows that policy iteration remains strongly polynomial for
discounted problems in which the discount rate can be state-dependent (and even
negative) at certain states, provided that the spectral radii of the
nonnegative matrices associated with all strategies are bounded from above by a
fixed constant strictly less than 1.
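To make the setting concrete, here is a minimal sketch of policy iteration in the simplest case covered by these results, a one-player discounted MDP with a fixed discount rate. This is our own illustration, not the paper's algorithm; the arrays `P` and `r` and the rate `gamma` are hypothetical inputs.

```python
import numpy as np

def policy_iteration(P, r, gamma):
    """P: (S, A, S) transition tensor, r: (S, A) rewards, 0 < gamma < 1."""
    S, A, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[np.arange(S), policy]                 # (S, S)
        r_pi = r[np.arange(S), policy]                 # (S,)
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead.
        q = r + gamma * (P @ v)                        # (S, A)
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```

A state-dependent (or locally negative) discount, as in the analysis above, would replace the scalar `gamma` by a per-state vector; the spectral-radius bound quoted in the abstract is what keeps the evaluation step well posed.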
Policy iteration algorithm for zero-sum stochastic games with mean payoff
We give a policy iteration algorithm to solve zero-sum stochastic games with finite state and action spaces and perfect information, when the value is defined in terms of the mean payoff per turn. This algorithm does not require any irreducibility assumption on the Markov chains determined by the strategies of the players. It is based on a discrete nonlinear analogue of the notion of reduction of a super-harmonic function.
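As a point of reference, the sketch below evaluates the mean payoff of one fixed strategy in the easy case where the induced Markov chain is irreducible; the point of the paper is precisely that its policy iteration needs no such assumption. All names are illustrative.

```python
import numpy as np

def mean_payoff(P_pi, r_pi):
    """Mean payoff per turn of an irreducible chain P_pi with rewards r_pi."""
    S = P_pi.shape[0]
    # Stationary distribution mu: solve mu (P - I) = 0 subject to sum(mu) = 1.
    A = np.vstack([P_pi.T - np.eye(S), np.ones((1, S))])
    b = np.zeros(S + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(mu @ r_pi)
```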
Multigrid methods for two-player zero-sum stochastic games
We present a fast numerical algorithm for large-scale zero-sum stochastic
games with perfect information, which combines policy iteration and algebraic
multigrid methods. This algorithm can be applied either to a genuinely
finite-state zero-sum two-player game or to the discretization of an Isaacs
equation.
We present numerical tests on discretizations of Isaacs equations or
variational inequalities. We also present a full multi-level policy iteration,
similar to FMG (full multigrid), which allows us to substantially improve the
computation time for solving some variational inequalities.
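As an illustration of how the two ingredients fit together (our sketch, not the authors' code): the costly step of each policy-iteration round is solving the sparse linear system (I - gamma * P_pi) v = r_pi, and that solve can be delegated to an algebraic multigrid solver. Here the third-party pyamg package stands in for the paper's AMG component.

```python
import scipy.sparse as sp
import pyamg

def evaluate_policy_amg(P_pi, r_pi, gamma, v0=None, tol=1e-10):
    """Approximately solve (I - gamma * P_pi) v = r_pi with AMG cycles."""
    A = (sp.eye(P_pi.shape[0], format='csr') - gamma * P_pi).tocsr()
    ml = pyamg.ruge_stuben_solver(A)        # classical (Ruge-Stuben) AMG
    return ml.solve(r_pi, x0=v0, tol=tol)   # warm-started from the previous value
```

Warm-starting each evaluation from the previous policy's value function is one natural way to exploit the nested structure; an FMG-like variant additionally solves on coarser discretizations first and prolongates the result.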
Solving generic nonarchimedean semidefinite programs using stochastic game algorithms
A general issue in computational optimization is to develop combinatorial
algorithms for semidefinite programming. We address this issue when the base
field is nonarchimedean. We provide a solution for a class of semidefinite
feasibility problems given by generic matrices. Our approach is based on
tropical geometry. It relies on tropical spectrahedra, which are defined as the
images by the valuation of nonarchimedean spectrahedra. We establish a
correspondence between generic tropical spectrahedra and zero-sum stochastic
games with perfect information. The latter have been well studied in
algorithmic game theory. This allows us to solve nonarchimedean semidefinite
feasibility problems using algorithms for stochastic games. These algorithms
are of a combinatorial nature and work for large instances.
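For readers unfamiliar with the tropical side, here is a toy illustration (ours, not the paper's formalism) of the underlying max-plus arithmetic: taking valuations of a nonarchimedean field turns generic sums into maxima and products into sums, so a 2x2 positivity condition such as Q_11 Q_22 >= Q_12^2 becomes the max-plus inequality q_11 + q_22 >= 2 q_12.

```python
NEG_INF = float('-inf')  # the tropical zero

def trop_add(a, b):
    """Tropical addition: the valuation of a generic sum is the max."""
    return max(a, b)

def trop_mul(a, b):
    """Tropical multiplication: the valuation of a product is the sum."""
    return NEG_INF if NEG_INF in (a, b) else a + b

def tropical_minor_2x2(q11, q22, q12):
    """Max-plus shadow of the 2x2 minor condition Q11 * Q22 >= Q12**2."""
    return trop_mul(q11, q22) >= trop_mul(q12, q12)
```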
Stochastic Shortest Path with Energy Constraints in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and positive integer costs associated with every
transition. The traditional optimization objective (stochastic shortest path)
is to minimize the expected total cost until the target set is reached. We
extend the traditional framework of POMDPs to model energy consumption, which
represents a hard constraint. The energy level may increase or decrease with
transitions, and the hard constraint requires that it remain positive
at every step until the target is reached. First, we present a novel
algorithm for solving POMDPs with energy levels, building on existing POMDP
solvers and using RTDP as its main method. Our second contribution is related
to policy representation. For larger POMDP instances, the policies computed by
existing solvers are too large to be understandable. We present an automated
procedure, based on machine learning techniques, that extracts the important
decisions of the policy, allowing us to compute succinct, human-readable
policies. Finally, we show experimentally that our algorithm performs
well and computes succinct policies on a number of POMDP instances from the
literature that were naturally enhanced with energy levels.
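As a rough illustration of the energy extension (our sketch: the paper works with POMDPs on top of existing solvers, while this toy treats a fully observable model), the energy level can be folded into the state, and any state whose energy is exhausted before the target is treated as a dead end of infinite cost. All names below are hypothetical.

```python
import random

def rtdp_energy(start, target, actions, outcomes, cost, trials=200, horizon=100):
    """RTDP on an energy-augmented state space.

    A state is a pair (s, e); outcomes(s, a) -> list of (prob, s_next, delta_e);
    cost(s, a) > 0. Reaching e <= 0 before `target` violates the hard constraint.
    """
    INF = float('inf')
    V = {}  # value table; the implicit zero heuristic is admissible

    def value(state):
        s, e = state
        if s == target:
            return 0.0
        if e <= 0:
            return INF               # energy exhausted: dead end
        return V.get(state, 0.0)

    def q(state, a):
        s, e = state
        return cost(s, a) + sum(p * value((s2, e + de))
                                for p, s2, de in outcomes(s, a))

    for _ in range(trials):          # greedy trials from the start state
        state = start
        for _ in range(horizon):
            s, e = state
            if s == target or e <= 0:
                break
            best = min(actions(s), key=lambda a: q(state, a))
            V[state] = q(state, best)          # Bellman backup along the trial
            outs = outcomes(s, best)
            p_, s2, de = random.choices(outs, weights=[o[0] for o in outs])[0]
            state = (s2, e + de)
    return V
```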