Policy iteration for perfect information stochastic mean payoff games with bounded first return times is strongly polynomial
Recent results of Ye and of Hansen, Miltersen and Zwick show that policy
iteration for one or two player (perfect information) zero-sum stochastic
games, restricted to instances with a fixed discount rate, is strongly
polynomial. We show that policy iteration for mean-payoff zero-sum stochastic
games is also strongly polynomial when restricted to instances with bounded
first mean return time to a given state. The proof is based on methods of
nonlinear Perron-Frobenius theory, allowing us to reduce the mean-payoff
problem to a discounted problem with state dependent discount rate. Our
analysis also shows that policy iteration remains strongly polynomial for
discounted problems in which the discount rate can be state dependent (and even
negative) at certain states, provided that the spectral radii of the
nonnegative matrices associated to all strategies are bounded from above by a
fixed constant strictly less than 1.
Comment: 17 pages
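To make the discounted setting with a state-dependent discount rate concrete, here is a minimal sketch of Howard-style policy iteration for the one-player case only. It is not the paper's two-player algorithm or its complexity analysis. The toy model, the names `solve_linear`, `evaluate`, and `policy_iteration`, and the data layout (`actions[s]` as a list of `(reward, transition_probs)` pairs, `gamma[s]` as the discount applied at state `s`) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: actions[s] = list of (reward, transition_probs).
Action = Tuple[float, List[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate(actions: List[List[Action]], gamma: List[float],
             policy: List[int]) -> List[float]:
    """Policy evaluation: solve v = r + diag(gamma) P v for the fixed policy."""
    n = len(actions)
    a, b = [], []
    for s in range(n):
        reward, probs = actions[s][policy[s]]
        a.append([(1.0 if s == t else 0.0) - gamma[s] * probs[t]
                  for t in range(n)])
        b.append(reward)
    return solve_linear(a, b)


def policy_iteration(actions: List[List[Action]], gamma: List[float]):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    n = len(actions)
    policy = [0] * n
    while True:
        v = evaluate(actions, gamma, policy)
        changed = False
        for s in range(n):
            def q(i: int) -> float:
                reward, probs = actions[s][i]
                return reward + gamma[s] * sum(p * v[t]
                                               for t, p in enumerate(probs))
            best = max(range(len(actions[s])), key=q)
            # Switch only on a strict improvement, to avoid cycling on ties.
            if q(best) > q(policy[s]) + 1e-12:
                policy[s] = best
                changed = True
        if not changed:
            return policy, v


# Toy instance (an assumption): state 0 may stay for reward 1 or move to
# state 1 for reward 0; state 1 stays put and collects reward 2.
actions = [
    [(1.0, [1.0, 0.0]), (0.0, [0.0, 1.0])],
    [(2.0, [0.0, 1.0])],
]
gamma = [0.5, 0.9]  # discount rate depends on the current state
policy, values = policy_iteration(actions, gamma)
print(policy, values)
```

In this toy instance every discount rate lies in (0, 1), so the linear system in `evaluate` is always solvable. The abstract's more general condition, a uniform bound below 1 on the spectral radii of the matrices associated to all strategies, is not checked by this sketch.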
The Stochastic Shortest Path Problem: A polyhedral combinatorics perspective
In this paper, we give a new framework for the stochastic shortest path
problem in finite state and action spaces. Our framework generalizes both the
frameworks proposed by Bertsekas and Tsitsiklis and by Bertsekas and Yu. We
prove that the problem is well-defined and (weakly) polynomial when (i) there
is a way to reach the target state from any initial state and (ii) there is no
transition cycle of negative costs (a generalization of negative cost cycles).
These assumptions generalize the standard assumptions for the deterministic
shortest path problem and our framework encapsulates the latter problem (in
contrast with prior works). In this new setting, we can show that (a) one can
restrict to deterministic and stationary policies, (b) the problem is still
(weakly) polynomial through linear programming, (c) Value Iteration and Policy
Iteration converge, and (d) we can extend Dijkstra's algorithm
- …
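As an illustration of claim (c), the following is a minimal sketch of value iteration on a small stochastic shortest path instance. It does not implement the paper's framework. The function name `value_iteration`, the data layout (`actions[s]` as a list of `(cost, transition_probs)` pairs), and the toy instance are assumptions. The instance is chosen so that the target is reachable from every state and all costs are nonnegative, which is a special case of the two assumptions stated in the abstract.

```python
from typing import List, Tuple

# Hypothetical layout: actions[s] = list of (cost, transition_probs).
Action = Tuple[float, List[float]]


def value_iteration(actions: List[List[Action]], target: int,
                    tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """Repeat the Bellman update v(s) = min_a [cost + sum_t p(t) v(t)]."""
    n = len(actions)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = []
        for s in range(n):
            if s == target:
                new_v.append(0.0)  # the target is absorbing and cost-free
                continue
            new_v.append(min(
                cost + sum(p * v[t] for t, p in enumerate(probs))
                for cost, probs in actions[s]
            ))
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    raise RuntimeError("value iteration did not converge")


# Toy instance (an assumption): states 0 and 1, target state 2.
# State 0: pay 1 to move to state 1, or pay 5 to jump to the target.
# State 1: pay 1; reach the target with probability 0.5, else stay.
actions = [
    [(1.0, [0.0, 1.0, 0.0]), (5.0, [0.0, 0.0, 1.0])],
    [(1.0, [0.0, 0.5, 0.5])],
    [],
]
values = value_iteration(actions, target=2)
print(values)
```

Solving the fixed-point equations by hand gives an expected cost of 2 from state 1 (since v1 = 1 + 0.5 v1) and min(1 + 2, 5) = 3 from state 0, which the iterates approach geometrically.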