Qualitative Analysis of Concurrent Mean-payoff Games
We consider concurrent games played by two players on a finite-state graph,
where in every round the players simultaneously choose a move, and the current
state along with the joint moves determine the successor state. We study a
fundamental objective, namely the mean-payoff objective, where a reward is
associated to each transition, and the goal of player 1 is to maximize the
long-run average of the rewards, and the objective of player 2 is strictly the
opposite. The path constraint for player 1 could be qualitative, i.e., the
mean-payoff is the maximal reward, or arbitrarily close to it; or quantitative,
i.e., a given threshold between the minimal and maximal reward. We consider the
computation of the almost-sure (resp. positive) winning sets, where player 1
can ensure that the path constraint is satisfied with probability 1 (resp.
positive probability). Our main results for qualitative path constraints are as
follows: (1) we establish qualitative determinacy results that show that for
every state either player 1 has a strategy to ensure almost-sure (resp.
positive) winning against all player-2 strategies, or player 2 has a spoiling
strategy to falsify almost-sure (resp. positive) winning against all player-1
strategies; (2) we present optimal strategy complexity results that precisely
characterize the classes of strategies required for almost-sure and positive
winning for both players; and (3) we present quadratic time algorithms to
compute the almost-sure and the positive winning sets, matching the best known
bound of algorithms for much simpler problems (such as reachability
objectives). For quantitative constraints we show that a polynomial time
solution for the almost-sure or the positive winning set would imply a solution
to a long-standing open problem (the value problem for turn-based deterministic
mean-payoff games) that is not known to be solvable in polynomial time.
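
To make the mean-payoff objective concrete, here is a minimal sketch, not taken from the paper, of the long-run average along one ultimately periodic play (a finite prefix followed by a cycle repeated forever). The function names and the sample rewards are assumptions for illustration. On such a play the finite-prefix averages converge to the average reward of the cycle, so the qualitative constraint "the mean-payoff is the maximal reward" holds exactly when every reward on the cycle is maximal.

```python
from fractions import Fraction

def prefix_average(prefix, cycle, n):
    """Average of the first n rewards of the play prefix . cycle^omega."""
    total = 0
    for i in range(n):
        if i < len(prefix):
            total += prefix[i]
        else:
            total += cycle[(i - len(prefix)) % len(cycle)]
    return Fraction(total, n)

def mean_payoff(cycle):
    """Long-run average of an ultimately periodic play: the prefix washes out."""
    return Fraction(sum(cycle), len(cycle))

# Hypothetical integer rewards: a short prefix, then a cycle repeated forever.
prefix, cycle = [5, 0], [1, 0, 1]
limit = mean_payoff(cycle)
gap = abs(prefix_average(prefix, cycle, 3002) - limit)
```

In a game the play is random and depends on both strategies, so this only illustrates the payoff of a single path, not the winning sets studied in the abstract.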
Exact Algorithms for Solving Stochastic Games
Shapley's discounted stochastic games, Everett's recursive games and
Gillette's undiscounted stochastic games are classical models of game theory
describing two-player zero-sum games of potentially infinite duration. We
describe algorithms for exactly solving these games.
Algorithms for Game Metrics
Simulation and bisimulation metrics for stochastic systems provide a
quantitative generalization of the classical simulation and bisimulation
relations. These metrics capture the similarity of states with respect to
quantitative specifications written in the quantitative μ-calculus and
related probabilistic logics. We first show that the metrics provide a bound
for the difference in long-run average and discounted average behavior across
states, indicating that the metrics can be used both in system verification,
and in performance evaluation. For turn-based games and MDPs, we provide a
polynomial-time algorithm for the computation of the one-step metric distance
between states. The algorithm is based on linear programming; it improves on
the previous known exponential-time algorithm based on a reduction to the
theory of reals. We then present PSPACE algorithms for both the decision
problem and the problem of approximating the metric distance between two
states, matching the best known algorithms for Markov chains. For the
bisimulation kernel of the metric our algorithm works in time O(n^4) for both
turn-based games and MDPs, improving the previously best known
O(n^9 · log(n)) time algorithm for MDPs. For a concurrent game G, we show that
computing the exact distance between states is at least as hard as computing
the value of concurrent reachability games and the square-root-sum problem in
computational geometry. We show that checking whether the metric distance is
bounded by a rational r, can be done via a reduction to the theory of real
closed fields, involving a formula with three quantifier alternations, yielding
O(|G|^O(|G|^5)) time complexity, improving the previously known reduction,
which yielded O(|G|^O(|G|^7)) time complexity. These algorithms can be iterated
to approximate the metrics using binary search. Comment: 27 pages. Full version of the paper accepted at FSTTCS 200
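
The last step of the abstract, iterating a decision procedure inside a binary search, can be sketched as follows. This is an illustrative sketch only: `is_bounded_by` is a hypothetical oracle standing in for the decision algorithm ("is the distance at most r?"), and the sketch assumes the distance lies in the interval [0, 1].

```python
from fractions import Fraction

def approximate_distance(is_bounded_by, eps):
    """Bracket an unknown distance d in [0, 1] to within eps.

    is_bounded_by(r) is a hypothetical decision oracle that returns True
    exactly when d <= r. Returns (lo, hi) with lo <= d <= hi, hi - lo <= eps.
    """
    lo, hi = Fraction(0), Fraction(1)
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if is_bounded_by(mid):
            hi = mid   # d <= mid, so shrink from above
        else:
            lo = mid   # d > mid, so shrink from below
    return lo, hi

# Stand-in oracle for a distance that is secretly 3/10.
secret = Fraction(3, 10)
lo, hi = approximate_distance(lambda r: secret <= r, Fraction(1, 1000))
```

Each iteration halves the interval, so reaching precision eps takes about log2(1/eps) oracle calls; the overall cost is dominated by the cost of each decision query.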
Strategy improvement for concurrent reachability and turn based stochastic safety games
We consider concurrent games played on graphs. At every round of a game, each player simultaneously and independently selects a move; the moves jointly determine the transition to a successor state. Two basic objectives are the safety objective to stay forever in a given set of states, and its dual, the reachability objective to reach a given set of states. First, we present a simple proof of the fact that in concurrent reachability games, for all ε>0, memoryless ε-optimal strategies exist. A memoryless strategy is independent of the history of plays, and an ε-optimal strategy achieves the objective with probability within ε of the value of the game. In contrast to previous proofs of this fact, our proof is more elementary and more combinatorial. Second, we present a strategy-improvement (a.k.a. policy-iteration) algorithm for concurrent games with reachability objectives. Finally, we present a strategy-improvement algorithm for turn-based stochastic games (where each player selects moves in turns) with safety objectives. Our algorithms yield sequences of player-1 strategies which ensure probabilities of winning that converge monotonically (from below) to the value of the game. © 2012 Elsevier Inc
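
The notion of a memoryless ε-optimal strategy can be illustrated on a well-known toy example that is not taken from this paper: a one-state "hide-or-run" game. In each round player 1 hides or runs while player 2 waits or throws; (hide, wait) repeats the round, (run, wait) and (hide, throw) let player 1 reach the target, and (run, throw) loses. The sketch below, with hypothetical function names, computes the probability of reaching the target when player 1 runs with probability `eps` and player 2 throws with probability `q` in every round.

```python
from fractions import Fraction

def win_probability(eps, q):
    """Reachability probability under stationary strategies (eps, q)."""
    if eps <= 0:
        raise ValueError("eps must be positive, otherwise the play may never end")
    stay = (1 - eps) * (1 - q)            # (hide, wait): round repeats
    win = eps * (1 - q) + (1 - eps) * q   # (run, wait) or (hide, throw)
    return win / (1 - stay)               # condition on the round ending

def worst_case(eps, grid=100):
    """Minimum over a grid of stationary player-2 strategies (an approximation
    of the infimum; in this toy game the minimum is attained at q = 1)."""
    return min(win_probability(eps, Fraction(k, grid)) for k in range(grid + 1))

guarantee = worst_case(Fraction(1, 10))
```

In this toy game running with probability ε guarantees winning with probability 1 - ε, so the value is 1, yet no single strategy achieves it exactly, which is why ε-optimality is the right notion for concurrent reachability.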
IST Austria Technical Report
We consider concurrent games played by two players on a finite-state graph, where in every round the players simultaneously choose a move, and the current state along with the joint moves determine the successor state. We study the most fundamental objective for concurrent games, namely the mean-payoff or limit-average objective, where a reward is associated to every transition, and the goal of player 1 is to maximize the long-run average of the rewards, and the objective of player 2 is strictly the opposite (i.e., the games are zero-sum). The path constraint for player 1 could be qualitative, i.e., the mean-payoff is the maximal reward, or arbitrarily close to it; or quantitative, i.e., a given threshold between the minimal and maximal reward. We consider the computation of the almost-sure (resp. positive) winning sets, where player 1 can ensure that the path constraint is satisfied with probability 1 (resp. positive probability). Almost-sure winning with qualitative constraint exactly corresponds to the question whether there exists a strategy to ensure that the payoff is the maximal reward of the game. Our main results for qualitative path constraints are as follows: (1) we establish qualitative determinacy results that show that for every state either player 1 has a strategy to ensure almost-sure (resp. positive) winning against all player-2 strategies or player 2 has a spoiling strategy to falsify almost-sure (resp. positive) winning against all player-1 strategies; (2) we present optimal strategy complexity results that precisely characterize the classes of strategies required for almost-sure and positive winning for both players; and (3) we present quadratic time algorithms to compute the almost-sure and the positive winning sets, matching the best known bound of the algorithms for much simpler problems (such as reachability objectives).
For quantitative constraints we show that a polynomial time solution for the almost-sure or the positive winning set would imply a solution to a long-standing open problem (of solving the value problem of mean-payoff games) that is not known to be in polynomial time.
The Value 1 Problem Under Finite-memory Strategies for Concurrent Mean-payoff Games
We consider concurrent mean-payoff games, a very well-studied class of
two-player (player 1 vs player 2) zero-sum games on finite-state graphs where
every transition is assigned a reward between 0 and 1, and the payoff function
is the long-run average of the rewards. The value is the maximal expected
payoff that player 1 can guarantee against all strategies of player 2. We
consider the computation of the set of states with value 1 under finite-memory
strategies for player 1, and our main results for the problem are as follows:
(1) we present a polynomial-time algorithm; (2) we show that whenever there is
a finite-memory strategy, there is a stationary strategy that does not need
memory at all; and (3) we present an optimal bound (which is double
exponential) on the patience of stationary strategies (where patience of a
distribution is the inverse of the smallest positive probability and represents
a complexity measure of a stationary strategy).
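
The patience measure defined in the abstract is simple to compute. The sketch below follows the stated definition for a single distribution; extending it to a whole stationary strategy by taking the maximum over states is an assumption of this sketch, as are the function names and the sample strategy.

```python
from fractions import Fraction

def patience(distribution):
    """Inverse of the smallest positive probability in a distribution."""
    positive = [p for p in distribution if p > 0]
    return 1 / min(positive)

def strategy_patience(strategy):
    """Assumed extension: the largest patience over all states, where the
    strategy maps each state to a distribution over moves."""
    return max(patience(dist) for dist in strategy.values())

# Hypothetical stationary strategy on two states.
strategy = {
    "s0": [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4), Fraction(0)],
    "s1": [Fraction(999, 1000), Fraction(1, 1000)],
}
result = strategy_patience(strategy)
```

A double-exponential bound on patience means the smallest positive probability may need on the order of 2^(2^n) bits of "rarity", so such a strategy cannot in general be written down with polynomially many bits in fixed-point notation.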