Policy iteration for perfect information stochastic mean payoff games with bounded first return times is strongly polynomial
Recent results of Ye and of Hansen, Miltersen and Zwick show that policy
iteration for one- or two-player (perfect information) zero-sum stochastic
games, restricted to instances with a fixed discount rate, is strongly
polynomial. We show that policy iteration for mean-payoff zero-sum stochastic
games is also strongly polynomial when restricted to instances with bounded
first mean return time to a given state. The proof is based on methods of
nonlinear Perron-Frobenius theory, allowing us to reduce the mean-payoff
problem to a discounted problem with state dependent discount rate. Our
analysis also shows that policy iteration remains strongly polynomial for
discounted problems in which the discount rate can be state dependent (and even
negative) at certain states, provided that the spectral radii of the
nonnegative matrices associated to all strategies are bounded from above by a
fixed constant strictly less than 1.
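As a toy illustration of the policy-iteration scheme analyzed above, the sketch below runs Howard's classical algorithm on the simplest one-player case: a discounted MDP with a fixed discount rate. The transition and reward data are invented for illustration; the paper's actual setting (two players, state-dependent discount rates, mean payoff) is substantially more general.

```python
import numpy as np

# Hypothetical 3-state, 2-action discounted MDP (all data invented).
# P[a][s, s'] = transition probability; r[a][s] = immediate reward.
gamma = 0.9
P = [np.array([[0.5, 0.5, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.2, 0.8]]),
     np.array([[0.9, 0.1, 0.0],
               [0.0, 0.5, 0.5],
               [0.3, 0.0, 0.7]])]
r = [np.array([1.0, 0.0, 2.0]),
     np.array([0.5, 1.5, 0.0])]

def policy_iteration(P, r, gamma):
    n = len(r[0])
    policy = np.zeros(n, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n)])
        r_pi = np.array([r[policy[s]][s] for s in range(n)])
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Policy improvement: switch each state to a greedy action.
        q = np.array([r[a] + gamma * P[a] @ v for a in range(len(P))])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v  # no improving switch: policy is optimal
        policy = new_policy

policy, v = policy_iteration(P, r, gamma)
```

The strong-polynomiality question is about how many improvement rounds the loop above can take; the abstract's spectral-radius condition plays the role that the fixed discount rate `gamma` plays in this simple sketch.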
The Complexity of Infinite-Horizon General-Sum Stochastic Games
We study the complexity of computing stationary Nash equilibrium (NE) in n-player infinite-horizon general-sum stochastic games. We focus on the problem of computing NE in such stochastic games when each player is restricted to choosing a stationary policy and rewards are discounted. First, we prove that computing such NE is in PPAD (in addition to clearly being PPAD-hard). Second, we consider turn-based specializations of such games where at each state there is at most a single player that can take actions, and show that these (seemingly simpler) games remain PPAD-hard. Third, we show that under further structural assumptions on the rewards, computing NE in such turn-based games is possible in polynomial time. Towards achieving these results we establish structural facts about stochastic games of broader utility, including monotonicity of utilities under single-state single-action changes and reductions to settings where each player controls a single state.
Smoothed analysis of deterministic discounted and mean-payoff games
We devise a policy-iteration algorithm for deterministic two-player
discounted and mean-payoff games that runs in polynomial time with high
probability on any input where each payoff is chosen independently from a
sufficiently random distribution.
This includes the case where an arbitrary set of payoffs has been perturbed
by a Gaussian, showing for the first time that deterministic two-player games
can be solved efficiently, in the sense of smoothed analysis.
More generally, we devise a condition number for deterministic discounted and
mean-payoff games, and show that our algorithm runs in time polynomial in this
condition number.
Our result confirms a previous conjecture of Boros et al., which was claimed
as a theorem and later retracted. It stands in contrast with a recent
counter-example by Christ and Yannakakis, showing that Howard's
policy-iteration algorithm does not run in smoothed polynomial time on
stochastic single-player mean-payoff games.
Our approach is inspired by the analysis of random optimal assignment
instances by Frieze and Sorkin, and the analysis of bias-induced policies for
mean-payoff games by Akian, Gaubert and Hochart.
Termination Criteria for Solving Concurrent Safety and Reachability Games
We consider concurrent games played on graphs. At every round of a game, each
player simultaneously and independently selects a move; the moves jointly
determine the transition to a successor state. Two basic objectives are the
safety objective to stay forever in a given set of states, and its dual, the
reachability objective to reach a given set of states. We present in this paper
a strategy improvement algorithm for computing the value of a concurrent safety
game, that is, the maximal probability with which player 1 can enforce the
safety objective. The algorithm yields a sequence of player-1 strategies which
ensure probabilities of winning that converge monotonically to the value of the
safety game.
Our result is significant because the strategy improvement algorithm
provides, for the first time, a way to approximate the value of a concurrent
safety game from below. Since a value iteration algorithm, or a strategy
improvement algorithm for reachability games, can be used to approximate the
same value from above, the combination of both algorithms yields a method for
computing a converging sequence of upper and lower bounds for the values of
concurrent reachability and safety games. Previous methods could approximate
the values of these games only from one direction, and as no rates of
convergence are known, they did not provide a practical way to solve these
games.
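The value-iteration side of the picture can be illustrated on the simplest one-player analogue: maximizing the probability of reaching a target state in an MDP. Starting the iteration at 0 everywhere yields a sequence that increases monotonically to the reachability value, i.e. an approximation from below; all data below is invented for illustration, and the concurrent two-player case the paper treats is considerably harder.

```python
import numpy as np

# Hypothetical one-player reachability problem (invented data):
# maximize the probability of reaching absorbing target state 2.
P = [np.array([[0.6, 0.3, 0.1],
               [0.2, 0.7, 0.1],
               [0.0, 0.0, 1.0]]),   # target state 2 is absorbing
     np.array([[0.1, 0.7, 0.2],
               [0.5, 0.2, 0.3],
               [0.0, 0.0, 1.0]])]
target = 2

def reach_values(P, target, iters=2000):
    n = P[0].shape[0]
    v = np.zeros(n)      # start below the value: iterates increase monotonically
    v[target] = 1.0
    for _ in range(iters):
        # Bellman update: best action maximizes expected continuation value.
        v = np.maximum.reduce([Pa @ v for Pa in P])
        v[target] = 1.0  # target stays won
    return v

v = reach_values(P, target)
```

Initializing at 1 instead of 0 would give the dual iteration, decreasing towards the safety value; the paper's contribution is that for concurrent games a strategy-improvement algorithm supplies the missing monotone lower bounds for safety, so the two directions can be combined into certified two-sided bounds.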