Model-free reinforcement learning for stochastic parity games
This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena - a stochastic game graph with unknown but fixed probability distributions - to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1½-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on experimental evaluations of both reductions.
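The model-free setting sketched above can be illustrated with a minimal tabular minimax-Q learner on a toy reachability arena. Everything below (the arena, the simulator, the action sets, the hyperparameters) is illustrative and not from the paper; for brevity the value backup takes a pure-strategy maximin over the Q-table, whereas full minimax-Q solves a small linear program at each state to allow mixed strategies.

```python
import random
from collections import defaultdict

# Toy two-player zero-sum stochastic game: the learner only samples
# transitions from a simulator; the probabilities are fixed but unknown
# to it (the model-free setting). Hypothetical 3-state reachability
# game: Max tries to reach the absorbing state "goal".
STATES = ["s0", "s1", "goal"]
ACTIONS = ["a", "b"]          # same action set for both players, for brevity

def step(state, a_max, a_min):
    """Simulator: returns (next_state, reward)."""
    if state == "goal":
        return "goal", 0.0
    p = 0.9 if a_max == "a" else 0.4      # Max's action sets the base odds
    if a_min == "b":
        p -= 0.2                          # Min's action can lower them
    nxt = ("goal" if state == "s1" else "s1") if random.random() < p else "s0"
    return nxt, (1.0 if nxt == "goal" else 0.0)

def minimax_q(episodes=5000, alpha=0.1, gamma=0.95, eps=0.2):
    Q = defaultdict(float)                # Q[(state, max_action, min_action)]
    def value(s):
        # Pure-strategy maximin over the Q-table; full minimax-Q would
        # solve an LP here to optimize over mixed strategies.
        return max(min(Q[(s, a, o)] for o in ACTIONS) for a in ACTIONS)
    for _ in range(episodes):
        s = "s0"
        for _ in range(20):
            # Epsilon-greedy choice for Max, uniformly random opponent.
            a = random.choice(ACTIONS) if random.random() < eps else \
                max(ACTIONS, key=lambda x: min(Q[(s, x, o)] for o in ACTIONS))
            o = random.choice(ACTIONS)
            s2, r = step(s, a, o)
            Q[(s, a, o)] += alpha * (r + gamma * value(s2) - Q[(s, a, o)])
            s = s2
            if s == "goal":
                break
    return {s: value(s) for s in STATES}
```

The learned values approximate the (discounted) probability of reaching the goal under mutual best responses; in the paper's reduction, such reachability values approximate the parity value as ε tends to 0.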
An Exponential Lower Bound for the Latest Deterministic Strategy Iteration Algorithms
This paper presents a new exponential lower bound for the two most popular
deterministic variants of the strategy improvement algorithms for solving
parity, mean payoff, discounted payoff and simple stochastic games. The first
variant improves every node in each step maximizing the current valuation
locally, whereas the second variant computes the globally optimal improvement
in each step. We outline families of games on which both variants require
exponentially many strategy iterations
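The two switch rules can be contrasted on a toy instance. The sketch below runs strategy improvement on a hypothetical one-player discounted payoff game (a special case of the games covered by the lower bound); the "global" rule here applies only the single most profitable switch, a simplified stand-in for the globally optimal rule analysed in the paper.

```python
# Strategy improvement on a one-player discounted payoff game,
# contrasting two switch rules:
#   "all"    -- switch every node to its locally best edge at once;
#   "global" -- apply only the single most profitable switch
#               (a simplified stand-in for the globally optimal rule).
GAMMA = 0.9
# Hypothetical arena: node -> list of (successor, edge_weight).
EDGES = {
    0: [(1, 0.0), (2, 4.0)],
    1: [(0, 2.0), (2, 0.0)],
    2: [(2, 1.0)],            # self-loop sink
}

def evaluate(sigma, iters=500):
    """Value of a positional strategy: V(v) = w(v, sigma(v)) + GAMMA * V(sigma(v))."""
    V = {v: 0.0 for v in EDGES}
    for _ in range(iters):   # geometric convergence since GAMMA < 1
        V = {v: dict(EDGES[v])[sigma[v]] + GAMMA * V[sigma[v]] for v in EDGES}
    return V

def improve(rule="all"):
    sigma = {v: EDGES[v][0][0] for v in EDGES}   # arbitrary initial strategy
    steps = 0
    while True:
        V = evaluate(sigma)
        # Profitable switches: edges strictly better than the current choice.
        gains = {}
        for v, succs in EDGES.items():
            best, w = max(succs, key=lambda e: e[1] + GAMMA * V[e[0]])
            cur = dict(succs)[sigma[v]] + GAMMA * V[sigma[v]]
            if w + GAMMA * V[best] > cur + 1e-9:
                gains[v] = (best, w + GAMMA * V[best] - cur)
        if not gains:
            return sigma, V, steps               # no switch left: optimal
        steps += 1
        if rule == "all":                        # switch everything at once
            for v, (tgt, _) in gains.items():
                sigma[v] = tgt
        else:                                    # single most profitable switch
            v = max(gains, key=lambda u: gains[u][1])
            sigma[v] = gains[v][0]
```

On this tiny arena both rules terminate quickly with the same optimal values; the families constructed in the paper are designed so that the number of improvement steps explodes exponentially.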
Approximating the Value of Energy-Parity Objectives in Simple Stochastic Games
We consider simple stochastic games G with energy-parity objectives, a combination of quantitative rewards with a qualitative parity condition. The Maximizer tries to avoid running out of energy while simultaneously satisfying a parity condition.
We present an algorithm to approximate the value of a given configuration in 2-NEXPTIME. Moreover, ε-optimal strategies for either player require at most O(2-EXP(|G|)·log(1/ε)) memory modes.
On the Complexity of Branching Games with Regular Conditions
Infinite duration games with regular conditions are one of the crucial tools in the areas of verification and synthesis. In this paper we consider a branching variant of such games - the game contains branching vertices that split the play into two independent sub-games. Thus, a play has the form of an infinite tree. The winner of the play is determined by a winning condition specified as a set of infinite trees. Games of this kind were used by Mio to provide a game semantics for the probabilistic mu-calculus. He used winning conditions defined in terms of parity games on trees. In this work we consider a more general class of winning conditions, namely those definable by finite automata on infinite trees. Our games can be seen as a branching-time variant of the stochastic games on graphs.
We address the question of determinacy of a branching game and the problem of computing the optimal game value for each of the players. We consider both the stochastic and non-stochastic variants of the games. The questions under consideration are parametrised by the family of strategies we allow: either mixed, behavioural, or pure.
We prove that in general, branching games are not determined under mixed strategies. This holds even for topologically simple winning conditions (differences of two open sets) and non-stochastic arenas. Nevertheless, we show that the games become determined under mixed strategies if we restrict the winning conditions to open sets of trees. We prove that the problem of comparing the game value to a rational threshold is undecidable for branching games with regular conditions in all non-trivial stochastic cases. In the non-stochastic cases we provide exact bounds on the complexity of the problem. The only case left open is the 0-player stochastic case, i.e., the problem of computing the measure of a given regular language of infinite trees.
Obligation Blackwell Games and p-Automata
We recently introduced p-automata, automata that read discrete-time Markov
chains. We used turn-based stochastic parity games to define acceptance of
Markov chains by a subclass of p-automata. Definition of acceptance required a
cumbersome and complicated reduction to a series of turn-based stochastic
parity games. The reduction could not support acceptance by general p-automata,
which was left undefined as there was no notion of games that supported it.
Here we generalize two-player games by adding a structural acceptance
condition called obligations. Obligations are orthogonal to the linear winning
conditions that define winning. Obligations are a declaration that player 0 can
achieve a certain value from a configuration. If the obligation is met, the
value of that configuration for player 0 is 1.
One cannot define value in obligation games by the standard mechanism of
considering the measure of winning paths in a Markov chain and taking the
supremum over strategies of the infimum over counter-strategies, mainly
because obligations need definition even for Markov chains, and because the
nature of obligations has the flavor of an infinite nesting of supremum and
infimum operators. We define value via a
reduction to turn-based games similar to Martin's proof of determinacy of
Blackwell games with Borel objectives. Based on this definition, we show that
games are determined. We show that for Markov chains with Borel objectives and
obligations, and for finite turn-based stochastic parity games with
obligations, there exists an alternative and simpler characterization of the
value function.
Based on this simpler definition we give an exponential time algorithm to
analyze finite turn-based stochastic parity games with obligations. Finally, we
show that obligation games provide the necessary framework for reasoning about
p-automata and that they generalize the previous definition.
Synthesising Strategy Improvement and Recursive Algorithms for Solving 2.5 Player Parity Games
2.5 player parity games combine the challenges posed by 2.5 player
reachability games and the qualitative analysis of parity games. These two
types of problems are best approached with different types of algorithms:
strategy improvement algorithms for 2.5 player reachability games and recursive
algorithms for the qualitative analysis of parity games. We present a method
that - in contrast to existing techniques - tackles both aspects with the best
suited approach and works exclusively on the 2.5 player game itself. The
resulting technique is powerful enough to handle games with several million
states.
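The recursive algorithm referred to here is, in the qualitative (non-stochastic) case, the classical Zielonka procedure. A compact sketch, assuming a total edge relation and an illustrative arena encoding node -> (owner, priority, successors):

```python
# Recursive (Zielonka-style) algorithm for qualitative analysis of
# ordinary 2-player parity games. Player i wins a play iff the highest
# priority seen infinitely often has parity i.

def attractor(game, player, target):
    """Nodes from which `player` can force the play into `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v, (owner, _, succs) in game.items():
            if v in attr:
                continue
            succs_in = [w for w in succs if w in attr]
            if (owner == player and succs_in) or \
               (owner != player and len(succs_in) == len(succs)):
                attr.add(v)
                changed = True
    return attr

def zielonka(game):
    """Returns (win0, win1): the winning regions of players 0 and 1."""
    if not game:
        return set(), set()
    d = max(p for (_, p, _) in game.values())
    i = d % 2                       # player favoured by the top priority
    top = {v for v, (_, p, _) in game.items() if p == d}
    A = attractor(game, i, top)
    sub = {v: (o, p, [w for w in s if w not in A])
           for v, (o, p, s) in game.items() if v not in A}
    w0, w1 = zielonka(sub)
    opp = w1 if i == 0 else w0      # opponent's winning region in G \ A
    if not opp:                     # player i wins everywhere
        win_i = set(game)
        return (win_i, set()) if i == 0 else (set(), win_i)
    B = attractor(game, 1 - i, opp)
    sub2 = {v: (o, p, [w for w in s if w not in B])
            for v, (o, p, s) in game.items() if v not in B}
    w0b, w1b = zielonka(sub2)
    if i == 0:
        return w0b, w1b | B
    return w0b | B, w1b

# Illustrative arena: two forced cycles, top priorities 2 (even) and 3 (odd).
example_game = {
    "a": (0, 2, ["b"]),
    "b": (1, 1, ["a"]),
    "c": (1, 3, ["d"]),
    "d": (0, 2, ["c"]),
}
```

On this example the forced even cycle {a, b} goes to player 0 and the odd cycle {c, d} to player 1. The method in the abstract combines such recursive qualitative reasoning with strategy improvement for the quantitative (2.5 player) aspects.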
Decision Problems for Nash Equilibria in Stochastic Games
We analyse the computational complexity of finding Nash equilibria in
stochastic multiplayer games with ω-regular objectives. While the
existence of an equilibrium whose payoff falls into a certain interval may be
undecidable, we single out several decidable restrictions of the problem.
First, restricting the search space to stationary, or pure stationary,
equilibria results in problems that are typically contained in PSPACE and NP,
respectively. Second, we show that the existence of an equilibrium with a
binary payoff (i.e. an equilibrium where each player either wins or loses with
probability 1) is decidable. We also establish that the existence of a Nash
equilibrium with a certain binary payoff entails the existence of an
equilibrium with the same payoff in pure, finite-state strategies.
The Complexity of All-switches Strategy Improvement
Strategy improvement is a widely-used and well-studied class of algorithms
for solving graph-based infinite games. These algorithms are parameterized by a
switching rule, and one of the most natural rules is "all switches" which
switches as many edges as possible in each iteration. Continuing a recent line
of work, we study all-switches strategy improvement from the perspective of
computational complexity. We consider two natural decision problems, both of
which take as input a game G, a starting strategy σ, and an edge e. The
problems are: 1.) The edge switch problem, namely, is the edge e ever
switched by all-switches strategy improvement when it is started from σ on
game G? 2.) The optimal strategy problem, namely, is the edge e used in the
final strategy that is found by strategy improvement when it is started from σ
on game G? We show PSPACE-completeness of the edge switch
problem and optimal strategy problem for the following settings: Parity games
with the discrete strategy improvement algorithm of V\"oge and Jurdzi\'nski;
mean-payoff games with the gain-bias algorithm [14,37]; and discounted-payoff
games and simple stochastic games with their standard strategy improvement
algorithms. We also show PSPACE-completeness of an analogous problem
to edge switch for the bottom-antipodal algorithm for finding the sink of an
Acyclic Unique Sink Orientation on a cube.
Tree games with regular objectives
We study tree games developed recently by Matteo Mio as a game interpretation
of the probabilistic μ-calculus. With expressive power comes complexity.
Mio showed that tree games are able to encode Blackwell games and,
consequently, are not determined under deterministic strategies.
We show that non-stochastic tree games with objectives recognisable by
so-called game automata are determined under deterministic, finite memory
strategies. Moreover, we give an elementary algorithmic procedure which, for an
arbitrary regular language L and a finite non-stochastic tree game with a
winning objective L decides if the game is determined under deterministic
strategies. (In Proceedings GandALF 2014, arXiv:1408.556)