Model-free reinforcement learning for stochastic parity games
This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena - a stochastic game graph with unknown but fixed probability distributions - to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1½-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluation of both reductions.
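To make the model-free angle concrete, here is a minimal tabular sketch of minimax Q-learning on a turn-based stochastic reachability game. It is a sketch under assumptions, not the paper's implementation: the simulator interface (reset, step, actions, owner) is hypothetical, and in a turn-based arena the minimax backup reduces to a max at player Max's states and a min at player Min's.

import random
from collections import defaultdict

def minimax_q(env, episodes=10_000, alpha=0.1, eps=0.1):
    """Tabular Q-learning for a turn-based stochastic reachability game.
    The value of a state is the probability of reaching the target set,
    maximized by player Max and minimized by player Min."""
    Q = defaultdict(float)  # Q[(state, action)] ~ reachability probability

    def best(s):  # greedy action for the player who owns state s
        pick = max if env.owner(s) == 'max' else min
        return pick(env.actions(s), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = random.choice(env.actions(s)) if random.random() < eps else best(s)
            s2, reached, done = env.step(s, a)  # sampled; no transition model needed
            # target states score 1, other terminal states 0; otherwise bootstrap
            target = 1.0 if reached else (0.0 if done else Q[(s2, best(s2))])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

The ε-parameterized reachability reduction from the abstract would live inside such a simulator, reshaping the arena; the learner itself never touches transition probabilities.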
Obligation Blackwell Games and p-Automata
We recently introduced p-automata, automata that read discrete-time Markov
chains. We used turn-based stochastic parity games to define acceptance of
Markov chains by a subclass of p-automata. Defining acceptance required a
cumbersome and complicated reduction to a series of turn-based stochastic
parity games. The reduction could not support acceptance by general p-automata,
which was left undefined as there was no notion of games that supported it.
Here we generalize two-player games by adding a structural acceptance
condition called obligations. Obligations are orthogonal to the linear
conditions that define winning. Obligations are a declaration that player 0 can
achieve a certain value from a configuration. If the obligation is met, the
value of that configuration for player 0 is 1.
One cannot define the value of an obligation game by the standard mechanism of
considering the measure of winning paths on a Markov chain and taking the
supremum over the strategies of one player of the infimum over those of the
other, mainly because obligations need a definition even for Markov chains, and
because the nature of obligations has the flavor of an infinite nesting of
supremum and infimum operators. We define value via a
reduction to turn-based games similar to Martin's proof of determinacy of
Blackwell games with Borel objectives. Based on this definition, we show that
games are determined. We show that, for Markov chains with Borel objectives and
obligations and for finite turn-based stochastic parity games with obligations,
there exists an alternative, simpler characterization of the value function.
Based on this simpler characterization, we give an exponential-time algorithm to
analyze finite turn-based stochastic parity games with obligations. Finally, we
show that obligation games provide the necessary framework for reasoning about
p-automata and that they generalize the previous definition.
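For reference, the standard mechanism that the abstract says fails for obligation games defines the value of a configuration v as a sup-inf over the players' strategies of the probability of the winning set; in the usual notation (our rendering, with Σ_0 and Σ_1 the strategy sets of players 0 and 1 and Win the Borel set of winning paths):

\[
  \mathrm{val}(v) \;=\; \sup_{\sigma \in \Sigma_0} \, \inf_{\tau \in \Sigma_1} \, \Pr\nolimits_v^{\sigma,\tau}(\mathit{Win})
\]

Obligations break this scheme because, as the abstract notes, nested obligations behave like an unbounded alternation of such sup and inf operators rather than a single one.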
Synthesising Strategy Improvement and Recursive Algorithms for Solving 2.5 Player Parity Games
2.5 player parity games combine the challenges posed by 2.5 player
reachability games and the qualitative analysis of parity games. These two
types of problems are best approached with different types of algorithms:
strategy improvement algorithms for 2.5 player reachability games and recursive
algorithms for the qualitative analysis of parity games. We present a method
that - in contrast to existing techniques - tackles both aspects with the best
suited approach and works exclusively on the 2.5 player game itself. The
resulting technique is powerful enough to handle games with several million
states.
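For the recursive side of that combination, the classical recursive algorithm for the qualitative analysis of ordinary (non-stochastic) parity games is Zielonka's; a compact sketch follows. This is the textbook algorithm, not the paper's synthesized method, and the game encoding (nodes, succ, owner, priority, with every node assumed to keep a successor in each subgame) is illustrative.

def attractor(nodes, succ, owner, player, target):
    # States from which `player` can force the play into `target`.
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in nodes - attr:
            succs = [u for u in succ[v] if u in nodes]
            if (owner[v] == player and any(u in attr for u in succs)) or \
               (owner[v] != player and succs and all(u in attr for u in succs)):
                attr.add(v)
                changed = True
    return attr

def zielonka(nodes, succ, owner, priority):
    # Returns the winning regions (W0, W1) of players 0 and 1.
    if not nodes:
        return set(), set()
    p = max(priority[v] for v in nodes)
    i = p % 2                                  # player favored by priority p
    A = attractor(nodes, succ, owner, i, {v for v in nodes if priority[v] == p})
    W = zielonka(nodes - A, succ, owner, priority)
    if not W[1 - i]:
        return (set(nodes), set()) if i == 0 else (set(), set(nodes))
    B = attractor(nodes, succ, owner, 1 - i, W[1 - i])
    W = zielonka(nodes - B, succ, owner, priority)
    result = [set(W[0]), set(W[1])]
    result[1 - i] |= B
    return result[0], result[1]

Strategy improvement for the 2.5 player reachability aspect follows the pattern sketched under the all-switches abstract further below.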
Tree games with regular objectives
We study tree games developed recently by Matteo Mio as a game interpretation
of the probabilistic μ-calculus. With expressive power comes complexity.
Mio showed that tree games are able to encode Blackwell games and,
consequently, are not determined under deterministic strategies.
We show that non-stochastic tree games with objectives recognisable by
so-called game automata are determined under deterministic, finite memory
strategies. Moreover, we give an elementary algorithmic procedure which, for an
arbitrary regular language L and a finite non-stochastic tree game with
winning objective L, decides whether the game is determined under deterministic
strategies.
Comment: In Proceedings GandALF 2014, arXiv:1408.556
An Exponential Lower Bound for the Latest Deterministic Strategy Iteration Algorithms
This paper presents a new exponential lower bound for the two most popular
deterministic variants of the strategy improvement algorithms for solving
parity, mean-payoff, discounted-payoff, and simple stochastic games. The first
variant improves every node in each step by locally maximizing the current
valuation, whereas the second variant computes the globally optimal improvement
in each step. We outline families of games on which both variants require
exponentially many strategy iterations.
Decision Problems for Nash Equilibria in Stochastic Games
We analyse the computational complexity of finding Nash equilibria in
stochastic multiplayer games with ω-regular objectives. While the
existence of an equilibrium whose payoff falls into a certain interval may be
undecidable, we single out several decidable restrictions of the problem.
First, restricting the search space to stationary, or pure stationary,
equilibria results in problems that are typically contained in PSPACE and NP,
respectively. Second, we show that the existence of an equilibrium with a
binary payoff (i.e. an equilibrium where each player either wins or loses with
probability 1) is decidable. We also establish that the existence of a Nash
equilibrium with a certain binary payoff entails the existence of an
equilibrium with the same payoff in pure, finite-state strategies.
Comment: 22 pages, revised version
The Complexity of All-switches Strategy Improvement
Strategy improvement is a widely-used and well-studied class of algorithms
for solving graph-based infinite games. These algorithms are parameterized by a
switching rule, and one of the most natural rules is "all switches" which
switches as many edges as possible in each iteration. Continuing a recent line
of work, we study all-switches strategy improvement from the perspective of
computational complexity. We consider two natural decision problems, both of
which have as input a game G, a starting strategy σ, and an edge e. The
problems are: 1.) The edge switch problem: is the edge e ever switched by
all-switches strategy improvement when it is started from σ on game G?
2.) The optimal strategy problem: is the edge e used in the final strategy
that is found by strategy improvement when it is started from σ on game G?
We show PSPACE-completeness of the edge switch
problem and optimal strategy problem for the following settings: Parity games
with the discrete strategy improvement algorithm of V\"oge and Jurdzi\'nski;
mean-payoff games with the gain-bias algorithm [14,37]; and discounted-payoff
games and simple stochastic games with their standard strategy improvement
algorithms. We also show PSPACE-completeness of an analogous problem
to edge switch for the bottom-antipodal algorithm for finding the sink of an
Acyclic Unique Sink Orientation on a cube.
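As a deliberately naive illustration of the objects in this abstract, the sketch below runs all-switches strategy improvement on a discounted-payoff game (one of the settings listed) and decides the edge switch problem by simply watching the run. All encodings (owner, succ, reward) are illustrative, σ is defined on Max's nodes only, and the PSPACE-completeness result above is precisely the evidence that no such direct simulation is efficient in general.

def evaluate(owner, succ, reward, sigma, gamma=0.9, iters=2000):
    # Valuation of Max's positional strategy `sigma`: Min best-responds, and
    # the resulting one-player discounted game is solved by value iteration.
    V = {v: 0.0 for v in owner}
    for _ in range(iters):
        for v in owner:
            if owner[v] == 'max':
                u = sigma[v]
                V[v] = reward[(v, u)] + gamma * V[u]
            else:
                V[v] = min(reward[(v, u)] + gamma * V[u] for u in succ[v])
    return V

def all_switches(owner, succ, reward, sigma, gamma=0.9):
    # One iteration: switch every Max node that has a strictly improving
    # edge to its locally best successor under the current valuation.
    V = evaluate(owner, succ, reward, sigma, gamma)
    appeal = lambda v, u: reward[(v, u)] + gamma * V[u]
    new = dict(sigma)
    for v in sigma:  # sigma is defined exactly on Max's nodes
        best = max(succ[v], key=lambda u: appeal(v, u))
        if appeal(v, best) > appeal(v, sigma[v]) + 1e-9:
            new[v] = best
    return new

def edge_ever_switched(owner, succ, reward, sigma, edge, gamma=0.9):
    # The edge switch problem, decided by brute force: run all-switches to
    # a fixpoint and report whether `edge` (a Max edge) is ever switched to.
    v, u = edge
    while True:
        nxt = all_switches(owner, succ, reward, sigma, gamma)
        if nxt[v] == u and sigma[v] != u:
            return True
        if nxt == sigma:
            return False
        sigma = nxt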
Qualitative Analysis of Partially-observable Markov Decision Processes
We study observation-based strategies for partially-observable Markov
decision processes (POMDPs) with omega-regular objectives. An observation-based
strategy relies on partial information about the history of a play, namely, on
the past sequence of observations. We consider the qualitative analysis
problem: given a POMDP with an omega-regular objective, whether there is an
observation-based strategy to achieve the objective with probability 1
(almost-sure winning), or with positive probability (positive winning). Our
main results are twofold. First, we present a complete picture of the
computational complexity of the qualitative analysis of POMDPs with parity
objectives (a canonical form to express omega-regular objectives) and its
subclasses. Our contribution consists in establishing several upper and lower
bounds that were not known in the literature. Second, we present optimal bounds
(matching upper and lower bounds) on the memory required by pure and randomized
observation-based strategies for the qualitative analysis of POMDPs with
parity objectives and its subclasses.
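The standard vehicle for such qualitative analyses is the belief-support abstraction: only the set of states consistent with the observations seen so far matters, not the precise probabilities. A minimal sketch of one step of that subset construction (illustrative encoding, not the paper's specific constructions):

def belief_successors(support, action, trans, obs):
    """One step of the belief-support subset construction.
    support: frozenset of states the play might currently be in;
    trans[s][a]: states reachable from s via `a` with positive probability
    (the probabilities themselves are irrelevant qualitatively);
    obs[s]: the observation emitted in state s.
    Returns {observation: successor belief support}."""
    reachable = set()
    for s in support:
        reachable |= trans[s][action]
    by_obs = {}
    for t in reachable:
        by_obs.setdefault(obs[t], set()).add(t)
    return {o: frozenset(ts) for o, ts in by_obs.items()}

The exponentially many belief supports this construction can generate are one source of the complexity bounds such analyses involve.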