Recursive Stochastic Games with Positive Rewards
We first show that in such games both players have optimal deterministic "stackless and memoryless" strategies. We then provide polynomial-time algorithms for computing the exact optimal expected reward (which may be infinite, but is otherwise rational), and optimal strategies, for both the maximizing and minimizing single-player versions of the game, i.e., for (1-exit) Recursive Markov Decision Processes (1-RMDPs). It follows that the quantitative decision problem for positive reward 1-RSSGs is in NP ∩ coNP. We show that Condon's well-known quantitative termination problem for finite-state simple stochastic games (SSGs), which she showed to be in NP ∩ coNP, reduces to a special case of the reward problem for 1-RSSGs, namely, deciding whether the value is ∞. By contrast, for finite-state SSGs with strictly positive rewards, deciding if this expected reward value is ∞ is solvable in P-time. We also show that there is a simultaneous strategy improvement algorithm that converges in a finite number of steps to the value and optimal strategies of a 1-RSSG with positive rewards.
Recursive Stochastic Games with Positive Rewards
Abstract. We study the complexity of a class of Markov decision processes and, more generally, stochastic games, called 1-exit Recursive Markov Decision Processes (1-RMDPs) and Simple Stochastic Games (1-RSSGs) with strictly positive rewards. These are a class of finitely presented countable-state zero-sum stochastic games, with total expected reward objective. They subsume standard finite-state MDPs and Condon's simple stochastic games and correspond to optimization and game versions of several classic stochastic models, with rewards. Such stochastic models arise naturally as models of probabilistic procedural programs with recursion, and the problems we address are motivated by the goal of analyzing the optimal/pessimal expected running time in such a setting. We give polynomial-time algorithms for 1-exit Recursive Markov Decision Processes (1-RMDPs) with positive rewards. Specifically, we show that the exact optimal value of both maximizing and minimizing 1-RMDPs with positive rewards can be computed in polynomial time (this value may be ∞). For two-player 1-RSSGs with positive rewards, we prove a "stackless and memoryless" determinacy result, and show that deciding whether the game value is at least a given value r is in NP ∩ coNP. We also prove that a simultaneous strategy improvement algorithm converges to the value and optimal strategies for these stochastic games. We observe that 1-RSSG positive reward games are "harder" than finite-state SSGs in several senses.
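As a hedged illustration of why the value can be ∞, consider a toy recursive Markov chain with a single procedure and no players (a hypothetical example, not taken from the paper). Each invocation earns reward 1, then with probability p calls itself twice in sequence and with probability 1 - p returns. Its expected total reward is the least non-negative solution of x = 1 + 2px, which is 1/(1 - 2p) for p < 1/2 and ∞ otherwise. The sketch below approximates that least fixed point by iterating from 0. The function name, the cap, and the tolerance are assumptions made for this example, and the cap is only a heuristic stand-in for the exact polynomial-time algorithms the abstract describes.

```python
import math

def expected_total_reward(p, cap=1e6, tol=1e-12, max_iter=10_000_000):
    """Approximate the least fixed point of x = 1 + 2*p*x by iterating from 0.

    Hypothetical toy model: one procedure, reward 1 per invocation,
    two recursive calls with probability p, return with probability 1 - p.
    """
    x = 0.0
    for _ in range(max_iter):
        nxt = 1.0 + 2.0 * p * x
        if nxt > cap:            # heuristic: treat unbounded growth as infinite
            return math.inf
        if abs(nxt - x) < tol:   # iterates have stabilised
            return nxt
        x = nxt
    return x                     # no verdict within max_iter; last iterate

if __name__ == "__main__":
    print(expected_total_reward(0.25))  # subcritical: finite value
    print(expected_total_reward(0.5))   # critical: iterates grow without bound
```

The case p = 1/2 is the instructive one: the procedure still terminates with probability 1, yet its expected total reward is infinite, so termination alone does not bound the value.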
Equilibria, Fixed Points, and Complexity Classes
Many models from a variety of areas involve the computation of an equilibrium
or fixed point of some kind. Examples include Nash equilibria in games; market
equilibria; computing optimal strategies and the values of competitive games
(stochastic and other games); stable configurations of neural networks;
analysing basic stochastic models for evolution like branching processes and
for language like stochastic context-free grammars; and models that incorporate
the basic primitives of probability and recursion like recursive Markov chains.
It is not known whether these problems can be solved in polynomial time. There
are certain common computational principles underlying different types of
equilibria, which are captured by the complexity classes PLS, PPAD, and FIXP.
Representative complete problems for these classes are, respectively, pure Nash
equilibria in games where they are guaranteed to exist, (mixed) Nash equilibria
in 2-player normal form games, and (mixed) Nash equilibria in normal form games
with 3 (or more) players. This paper reviews the underlying computational
principles and the corresponding classes.
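To make the notion of a pure Nash equilibrium concrete, here is a minimal sketch for a 2-player normal form game given as two payoff matrices. It checks every pure strategy profile for profitable unilateral deviations. The function name and the matrix layout are assumptions for illustration. Exhaustive enumeration is used only because it is easy to verify; it is not the local-search procedure that characterizes PLS, and a pure equilibrium need not exist in a general game.

```python
from itertools import product

def pure_nash_equilibria(payoff_row, payoff_col):
    """Return all (i, j) where neither player gains by deviating unilaterally.

    payoff_row[i][j] and payoff_col[i][j] are the payoffs to the row and
    column player when row plays i and column plays j (assumed layout).
    """
    n_rows, n_cols = len(payoff_row), len(payoff_row[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        # Row player cannot improve by switching rows while column j is fixed.
        row_best = all(payoff_row[i][j] >= payoff_row[k][j] for k in range(n_rows))
        # Column player cannot improve by switching columns while row i is fixed.
        col_best = all(payoff_col[i][j] >= payoff_col[i][l] for l in range(n_cols))
        if row_best and col_best:
            equilibria.append((i, j))
    return equilibria

if __name__ == "__main__":
    # Prisoner's dilemma, actions 0 = cooperate, 1 = defect.
    row = [[-1, -3], [0, -2]]
    col = [[-1, 0], [-3, -2]]
    print(pure_nash_equilibria(row, col))
```

Matching pennies (row payoffs [[1, -1], [-1, 1]], column payoffs negated) has no pure equilibrium, so the function returns an empty list there; its only equilibrium is mixed.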
One-Counter Stochastic Games
We study the computational complexity of basic decision problems for
one-counter simple stochastic games (OC-SSGs), under various objectives.
OC-SSGs are 2-player turn-based stochastic games played on the transition graph
of classic one-counter automata. We study primarily the termination objective,
where the goal of one player is to maximize the probability of reaching counter
value 0, while the other player wishes to avoid this. Partly motivated by the
goal of understanding termination objectives, we also study certain "limit" and
"long run average" reward objectives that are closely related to some
well-studied objectives for stochastic games with rewards. Examples of problems
we address include: does player 1 have a strategy to ensure that the counter
eventually hits 0, i.e., terminates, almost surely, regardless of what player 2
does? Or that the liminf (or limsup) counter value equals infinity with a
desired probability? Or that the long run average reward is >0 with desired
probability? We show that the qualitative termination problem for OC-SSGs is in
NP intersection coNP, and is in P-time for 1-player OC-SSGs, or equivalently
for one-counter Markov Decision Processes (OC-MDPs). Moreover, we show that
quantitative limit problems for OC-SSGs are in NP intersection coNP, and are in
P-time for 1-player OC-MDPs. Both qualitative limit problems and qualitative
termination problems for OC-SSGs are already at least as hard as Condon's
quantitative decision problem for finite-state SSGs.
Comment: 20 pages, 1 figure. This is a full version of a paper accepted for
publication in proceedings of FSTTCS 201
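A minimal sketch of the termination objective in the simplest possible setting, a one-counter Markov chain with no players (a hypothetical example, not one from the paper): at every positive counter value the counter goes up by 1 with probability p and down by 1 with probability 1 - p. The probability of eventually reaching 0 from counter value 1 is the least non-negative solution of x = (1 - p) + p x^2, which equals min(1, (1 - p)/p). The function name and the iteration count below are assumptions for illustration.

```python
def termination_probability(p_up, iterations=10_000):
    """Approximate the least fixed point of x = (1 - p_up) + p_up * x**2.

    Iterating from 0 gives a non-decreasing sequence that converges to the
    probability of ever hitting counter value 0 from counter value 1.
    """
    q_down = 1.0 - p_up
    x = 0.0
    for _ in range(iterations):
        x = q_down + p_up * x * x
    return x

if __name__ == "__main__":
    print(termination_probability(2 / 3))  # upward drift: terminates with probability < 1
    print(termination_probability(0.25))   # downward drift: terminates almost surely
```

Near p = 1/2 this plain iteration converges slowly, so a fixed iteration count gives only a rough lower bound there. The qualitative question, whether the probability is exactly 1, is what the abstract's complexity results address for the game setting.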
Exact Algorithms for Solving Stochastic Games
Shapley's discounted stochastic games, Everett's recursive games and
Gillette's undiscounted stochastic games are classical models of game theory
describing two-player zero-sum games of potentially infinite duration. We
describe algorithms for exactly solving these games.
Qualitative Analysis of Concurrent Mean-payoff Games
We consider concurrent games played by two players on a finite-state graph,
where in every round the players simultaneously choose a move, and the current
state along with the joint moves determine the successor state. We study a
fundamental objective, namely, mean-payoff objective, where a reward is
associated to each transition, and the goal of player 1 is to maximize the
long-run average of the rewards, and the objective of player 2 is strictly the
opposite. The path constraint for player 1 could be qualitative, i.e., the
mean-payoff is the maximal reward, or arbitrarily close to it; or quantitative,
i.e., a given threshold between the minimal and maximal reward. We consider the
computation of the almost-sure (resp. positive) winning sets, where player 1
can ensure that the path constraint is satisfied with probability 1 (resp.
positive probability). Our main results for qualitative path constraints are as
follows: (1) we establish qualitative determinacy results that show that for
every state either player 1 has a strategy to ensure almost-sure (resp.
positive) winning against all player-2 strategies, or player 2 has a spoiling
strategy to falsify almost-sure (resp. positive) winning against all player-1
strategies; (2) we present optimal strategy complexity results that precisely
characterize the classes of strategies required for almost-sure and positive
winning for both players; and (3) we present quadratic time algorithms to
compute the almost-sure and the positive winning sets, matching the best known
bound of algorithms for much simpler problems (such as reachability
objectives). For quantitative constraints we show that a polynomial time
solution for the almost-sure or the positive winning set would imply a solution
to a long-standing open problem (the value problem for turn-based deterministic
mean-payoff games) that is not known to be solvable in polynomial time.
On values of repeated games with signals
We study the existence of different notions of value in two-person zero-sum
repeated games where the state evolves and players receive signals. We provide
some examples showing that the limsup value (and the uniform value) may not
exist in general. Then we show the existence of the value for any Borel payoff
function if the players observe a public signal including the actions played.
We also prove two other positive results without assumptions on the signaling
structure: the existence of the sup value in any game and the existence of
the uniform value in recursive games with nonnegative payoffs.
Comment: Published at http://dx.doi.org/10.1214/14-AAP1095 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org
The Complexity of Nash Equilibria in Limit-Average Games
We study the computational complexity of Nash equilibria in concurrent games
with limit-average objectives. In particular, we prove that the existence of a
Nash equilibrium in randomised strategies is undecidable, while the existence
of a Nash equilibrium in pure strategies is decidable, even if we put a
constraint on the payoff of the equilibrium. Our undecidability result holds
even for a restricted class of concurrent games, where nonzero rewards occur
only on terminal states. Moreover, we show that the constrained existence
problem is undecidable not only for concurrent games but for turn-based games
with the same restriction on rewards. Finally, we prove that the constrained
existence problem for Nash equilibria in (pure or randomised) stationary
strategies is decidable and analyse its complexity.
Comment: 34 page