Model-free reinforcement learning for stochastic parity games
This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena - a stochastic game graph with unknown but fixed probability distributions - to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1½-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluation of both reductions.
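To make the model-free angle concrete, here is a minimal tabular sketch of minimax Q-learning on a turn-based stochastic reachability game. It is a sketch under assumptions, not the paper's implementation: the simulator interface (reset, step, actions, owner) is hypothetical, and in a turn-based arena the minimax backup reduces to a max at player Max's states and a min at player Min's.

import random
from collections import defaultdict

def minimax_q(env, episodes=10_000, alpha=0.1, eps=0.1):
    """Tabular Q-learning for a turn-based stochastic reachability game.
    The value of a state is the probability of reaching the target set,
    maximized by player Max and minimized by player Min."""
    Q = defaultdict(float)  # Q[(state, action)] ~ reachability probability

    def best(s):  # greedy action for the player who owns state s
        pick = max if env.owner(s) == 'max' else min
        return pick(env.actions(s), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = random.choice(env.actions(s)) if random.random() < eps else best(s)
            s2, reached, done = env.step(s, a)  # sampled; no transition model needed
            # target states score 1, other terminal states 0; otherwise bootstrap
            target = 1.0 if reached else (0.0 if done else Q[(s2, best(s2))])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

The ε-parameterized reachability reduction from the abstract would live inside such a simulator, reshaping the arena; the learner itself never touches transition probabilities.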
Obligation Blackwell Games and p-Automata
We recently introduced p-automata, automata that read discrete-time Markov
chains. We used turn-based stochastic parity games to define acceptance of
Markov chains by a subclass of p-automata. Defining acceptance required a
cumbersome and complicated reduction to a series of turn-based stochastic
parity games. The reduction could not support acceptance by general p-automata,
which was left undefined as there was no notion of games that supported it.
Here we generalize two-player games by adding a structural acceptance
condition called obligations. Obligations are orthogonal to the linear
conditions that define winning. Obligations are a declaration that player 0 can
achieve a certain value from a configuration. If the obligation is met, the
value of that configuration for player 0 is 1.
One cannot define the value of an obligation game by the standard mechanism of
considering the measure of winning paths on a Markov chain and taking the
supremum over the strategies of one player of the infimum over those of the
other, mainly because obligations need a definition even for Markov chains, and
because the nature of obligations has the flavor of an infinite nesting of
supremum and infimum operators. We define value via a
reduction to turn-based games similar to Martin's proof of determinacy of
Blackwell games with Borel objectives. Based on this definition, we show that
games are determined. We show that, for Markov chains with Borel objectives and
obligations and for finite turn-based stochastic parity games with obligations,
there exists an alternative, simpler characterization of the value function.
Based on this simpler characterization, we give an exponential-time algorithm to
analyze finite turn-based stochastic parity games with obligations. Finally, we
show that obligation games provide the necessary framework for reasoning about
p-automata and that they generalize the previous definition.
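For reference, the standard mechanism that the abstract says fails for obligation games defines the value of a configuration v as a sup-inf over the players' strategies of the probability of the winning set; in the usual notation (our rendering, with Σ_0 and Σ_1 the strategy sets of players 0 and 1 and Win the Borel set of winning paths):

\[
  \mathrm{val}(v) \;=\; \sup_{\sigma \in \Sigma_0} \, \inf_{\tau \in \Sigma_1} \, \Pr\nolimits_v^{\sigma,\tau}(\mathit{Win})
\]

Obligations break this scheme because, as the abstract notes, nested obligations behave like an unbounded alternation of such sup and inf operators rather than a single one.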
Synthesising Strategy Improvement and Recursive Algorithms for Solving 2.5 Player Parity Games
2.5 player parity games combine the challenges posed by 2.5 player
reachability games and the qualitative analysis of parity games. These two
types of problems are best approached with different types of algorithms:
strategy improvement algorithms for 2.5 player reachability games and recursive
algorithms for the qualitative analysis of parity games. We present a method
that - in contrast to existing techniques - tackles both aspects with the best
suited approach and works exclusively on the 2.5 player game itself. The
resulting technique is powerful enough to handle games with several million
states.
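For the recursive side of that combination, the classical recursive algorithm for the qualitative analysis of ordinary (non-stochastic) parity games is Zielonka's; a compact sketch follows. This is the textbook algorithm, not the paper's synthesized method, and the game encoding (nodes, succ, owner, priority, with every node assumed to keep a successor in each subgame) is illustrative.

def attractor(nodes, succ, owner, player, target):
    # States from which `player` can force the play into `target`.
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in nodes - attr:
            succs = [u for u in succ[v] if u in nodes]
            if (owner[v] == player and any(u in attr for u in succs)) or \
               (owner[v] != player and succs and all(u in attr for u in succs)):
                attr.add(v)
                changed = True
    return attr

def zielonka(nodes, succ, owner, priority):
    # Returns the winning regions (W0, W1) of players 0 and 1.
    if not nodes:
        return set(), set()
    p = max(priority[v] for v in nodes)
    i = p % 2                                  # player favored by priority p
    A = attractor(nodes, succ, owner, i, {v for v in nodes if priority[v] == p})
    W = zielonka(nodes - A, succ, owner, priority)
    if not W[1 - i]:
        return (set(nodes), set()) if i == 0 else (set(), set(nodes))
    B = attractor(nodes, succ, owner, 1 - i, W[1 - i])
    W = zielonka(nodes - B, succ, owner, priority)
    result = [set(W[0]), set(W[1])]
    result[1 - i] |= B
    return result[0], result[1]

Strategy improvement for the 2.5 player reachability aspect follows the pattern sketched under the all-switches abstract further below.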
Tree games with regular objectives
We study tree games developed recently by Matteo Mio as a game interpretation
of the probabilistic μ-calculus. With expressive power comes complexity.
Mio showed that tree games are able to encode Blackwell games and,
consequently, are not determined under deterministic strategies.
We show that non-stochastic tree games with objectives recognisable by
so-called game automata are determined under deterministic, finite memory
strategies. Moreover, we give an elementary algorithmic procedure which, for an
arbitrary regular language L and a finite non-stochastic tree game with
winning objective L, decides whether the game is determined under deterministic
strategies.
Comment: In Proceedings GandALF 2014, arXiv:1408.556
An Exponential Lower Bound for the Latest Deterministic Strategy Iteration Algorithms
This paper presents a new exponential lower bound for the two most popular
deterministic variants of the strategy improvement algorithms for solving
parity, mean-payoff, discounted-payoff, and simple stochastic games. The first
variant improves every node in each step by locally maximizing the current
valuation, whereas the second variant computes the globally optimal improvement
in each step. We outline families of games on which both variants require
exponentially many strategy iterations.
Decision Problems for Nash Equilibria in Stochastic Games
We analyse the computational complexity of finding Nash equilibria in
stochastic multiplayer games with ω-regular objectives. While the
existence of an equilibrium whose payoff falls into a certain interval may be
undecidable, we single out several decidable restrictions of the problem.
First, restricting the search space to stationary, or pure stationary,
equilibria results in problems that are typically contained in PSPACE and NP,
respectively. Second, we show that the existence of an equilibrium with a
binary payoff (i.e. an equilibrium where each player either wins or loses with
probability 1) is decidable. We also establish that the existence of a Nash
equilibrium with a certain binary payoff entails the existence of an
equilibrium with the same payoff in pure, finite-state strategies.
Comment: 22 pages, revised version
The Complexity of All-switches Strategy Improvement
Strategy improvement is a widely-used and well-studied class of algorithms
for solving graph-based infinite games. These algorithms are parameterized by a
switching rule, and one of the most natural rules is "all switches" which
switches as many edges as possible in each iteration. Continuing a recent line
of work, we study all-switches strategy improvement from the perspective of
computational complexity. We consider two natural decision problems, both of
which have as input a game G, a starting strategy σ, and an edge e. The
problems are: 1.) The edge switch problem: is the edge e ever switched by
all-switches strategy improvement when it is started from σ on game G?
2.) The optimal strategy problem: is the edge e used in the final strategy
that is found by strategy improvement when it is started from σ on game G?
We show PSPACE-completeness of the edge switch
problem and optimal strategy problem for the following settings: Parity games
with the discrete strategy improvement algorithm of V\"oge and Jurdzi\'nski;
mean-payoff games with the gain-bias algorithm [14,37]; and discounted-payoff
games and simple stochastic games with their standard strategy improvement
algorithms. We also show PSPACE-completeness of an analogous problem
to edge switch for the bottom-antipodal algorithm for finding the sink of an
Acyclic Unique Sink Orientation on a cube.
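As a deliberately naive illustration of the objects in this abstract, the sketch below runs all-switches strategy improvement on a discounted-payoff game (one of the settings listed) and decides the edge switch problem by simply watching the run. All encodings (owner, succ, reward) are illustrative, σ is defined on Max's nodes only, and the PSPACE-completeness result above is precisely the evidence that no such direct simulation is efficient in general.

def evaluate(owner, succ, reward, sigma, gamma=0.9, iters=2000):
    # Valuation of Max's positional strategy `sigma`: Min best-responds, and
    # the resulting one-player discounted game is solved by value iteration.
    V = {v: 0.0 for v in owner}
    for _ in range(iters):
        for v in owner:
            if owner[v] == 'max':
                u = sigma[v]
                V[v] = reward[(v, u)] + gamma * V[u]
            else:
                V[v] = min(reward[(v, u)] + gamma * V[u] for u in succ[v])
    return V

def all_switches(owner, succ, reward, sigma, gamma=0.9):
    # One iteration: switch every Max node that has a strictly improving
    # edge to its locally best successor under the current valuation.
    V = evaluate(owner, succ, reward, sigma, gamma)
    appeal = lambda v, u: reward[(v, u)] + gamma * V[u]
    new = dict(sigma)
    for v in sigma:  # sigma is defined exactly on Max's nodes
        best = max(succ[v], key=lambda u: appeal(v, u))
        if appeal(v, best) > appeal(v, sigma[v]) + 1e-9:
            new[v] = best
    return new

def edge_ever_switched(owner, succ, reward, sigma, edge, gamma=0.9):
    # The edge switch problem, decided by brute force: run all-switches to
    # a fixpoint and report whether `edge` (a Max edge) is ever switched to.
    v, u = edge
    while True:
        nxt = all_switches(owner, succ, reward, sigma, gamma)
        if nxt[v] == u and sigma[v] != u:
            return True
        if nxt == sigma:
            return False
        sigma = nxt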
Qualitative Analysis of Partially-observable Markov Decision Processes
We study observation-based strategies for partially-observable Markov
decision processes (POMDPs) with omega-regular objectives. An observation-based
strategy relies on partial information about the history of a play, namely, on
the past sequence of observations. We consider the qualitative analysis
problem: given a POMDP with an omega-regular objective, whether there is an
observation-based strategy to achieve the objective with probability 1
(almost-sure winning), or with positive probability (positive winning). Our
main results are twofold. First, we present a complete picture of the
computational complexity of the qualitative analysis of POMDPs with parity
objectives (a canonical form to express omega-regular objectives) and its
subclasses. Our contribution consists in establishing several upper and lower
bounds that were not known in the literature. Second, we present optimal bounds
(matching upper and lower bounds) on the memory required by pure and randomized
observation-based strategies for the qualitative analysis of POMDPs with
parity objectives and its subclasses.
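The standard vehicle for such qualitative analyses is the belief-support abstraction: only the set of states consistent with the observations seen so far matters, not the precise probabilities. A minimal sketch of one step of that subset construction (illustrative encoding, not the paper's specific constructions):

def belief_successors(support, action, trans, obs):
    """One step of the belief-support subset construction.
    support: frozenset of states the play might currently be in;
    trans[s][a]: states reachable from s via `a` with positive probability
    (the probabilities themselves are irrelevant qualitatively);
    obs[s]: the observation emitted in state s.
    Returns {observation: successor belief support}."""
    reachable = set()
    for s in support:
        reachable |= trans[s][action]
    by_obs = {}
    for t in reachable:
        by_obs.setdefault(obs[t], set()).add(t)
    return {o: frozenset(ts) for o, ts in by_obs.items()}

The exponentially many belief supports this construction can generate are one source of the complexity bounds such analyses involve.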