
    Recursive Stochastic Games with Positive Rewards

    We study 1-exit Recursive Simple Stochastic Games (1-RSSGs) with strictly positive rewards. We first show that in such games both players have optimal deterministic “stackless and memoryless” strategies. We then provide polynomial-time algorithms for computing the exact optimal expected reward (which may be infinite, but is otherwise rational) and optimal strategies for both the maximizing and minimizing single-player versions of the game, i.e., for (1-exit) Recursive Markov Decision Processes (1-RMDPs). It follows that the quantitative decision problem for positive-reward 1-RSSGs is in NP ∩ coNP. We show that Condon's well-known quantitative termination problem for finite-state simple stochastic games (SSGs), which she showed to be in NP ∩ coNP, reduces to a special case of the reward problem for 1-RSSGs, namely, deciding whether the value is ∞. By contrast, for finite-state SSGs with strictly positive rewards, deciding whether the expected reward value is ∞ is solvable in P-time. We also show that there is a simultaneous strategy improvement algorithm that converges in a finite number of steps to the value and optimal strategies of a 1-RSSG with positive rewards.
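
    The optimal values in these games satisfy a monotone Bellman-style fixed-point system. As a minimal sketch, in our notation and for the non-recursive finite-state special case only (the recursive case layers call/return structure on top of these equations):

    ```latex
    % Optimal total expected reward v(s) in a finite-state MDP with strictly
    % positive rewards r(s,a) > 0 -- the non-recursive special case.
    % v is the least fixed point over [0, \infty]; v(s) = \infty is possible.
    \[
      v(s) \;=\; \max_{a \in A(s)} \Big( r(s,a) + \sum_{s'} p(s' \mid s, a)\, v(s') \Big),
    \]
    % with v(s) = 0 at exit (terminal) states; the minimizing version replaces
    % \max by \min, and the two-player game alternates them by state owner.
    ```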

    Recursive Stochastic Games with Positive Rewards

    We study the complexity of a class of Markov decision processes and, more generally, stochastic games, called 1-exit Recursive Markov Decision Processes (1-RMDPs) and 1-exit Recursive Simple Stochastic Games (1-RSSGs), with strictly positive rewards. These are a class of finitely presented countable-state zero-sum stochastic games with a total expected reward objective. They subsume standard finite-state MDPs and Condon’s simple stochastic games, and correspond to optimization and game versions of several classic stochastic models, with rewards. Such stochastic models arise naturally as models of probabilistic procedural programs with recursion, and the problems we address are motivated by the goal of analyzing the optimal/pessimal expected running time in such a setting. We give polynomial-time algorithms for 1-exit Recursive Markov Decision Processes (1-RMDPs) with positive rewards. Specifically, we show that the exact optimal value of both maximizing and minimizing 1-RMDPs with positive rewards can be computed in polynomial time (this value may be ∞). For two-player 1-RSSGs with positive rewards, we prove a “stackless and memoryless” determinacy result, and show that deciding whether the game value is at least a given value r is in NP ∩ coNP. We also prove that a simultaneous strategy improvement algorithm converges to the value and optimal strategies for these stochastic games. We observe that 1-RSSG positive reward games are “harder” than finite-state SSGs in several senses.
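
    As a concrete illustration of this fixed-point view, here is a naive value-iteration sketch for the maximizing finite-state special case, in Python. The MDP encoding and the divergence cutoff are our illustrative assumptions; the paper's actual algorithm computes the exact rational (or infinite) value in polynomial time, which naive iteration does not.

    ```python
    # Naive value iteration for the optimal total expected reward of a
    # finite-state MDP with strictly positive rewards -- the non-recursive
    # special case of a maximizing 1-RMDP. Illustrative sketch only: the
    # iteration converges to the least fixed point from below, and a
    # heuristic cutoff flags values that diverge toward infinity.

    def optimal_total_reward(states, actions, reward, trans, exits,
                             iters=10_000, cutoff=1e12):
        """actions[s]   -> actions available in state s
           reward[s, a] -> strictly positive reward for playing a in s
           trans[s, a]  -> list of (probability, successor) pairs
           exits        -> set of terminal states with value 0"""
        v = {s: 0.0 for s in states}
        for _ in range(iters):
            v = {s: 0.0 if s in exits else
                    max(reward[s, a] + sum(p * v[t] for p, t in trans[s, a])
                        for a in actions[s])
                 for s in states}
            if any(x > cutoff for x in v.values()):
                break  # heuristic: some values are diverging to infinity
        return {s: float('inf') if x > cutoff else x for s, x in v.items()}
    ```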

    Equilibria, Fixed Points, and Complexity Classes

    Many models from a variety of areas involve the computation of an equilibrium or fixed point of some kind. Examples include Nash equilibria in games; market equilibria; computing optimal strategies and the values of competitive games (stochastic and other games); stable configurations of neural networks; analysing basic stochastic models for evolution, like branching processes, and for language, like stochastic context-free grammars; and models that incorporate the basic primitives of probability and recursion, like recursive Markov chains. It is not known whether these problems can be solved in polynomial time. There are certain common computational principles underlying different types of equilibria, which are captured by the complexity classes PLS, PPAD, and FIXP. Representative complete problems for these classes are, respectively: pure Nash equilibria in games where they are guaranteed to exist; (mixed) Nash equilibria in 2-player normal form games; and (mixed) Nash equilibria in normal form games with 3 (or more) players. This paper reviews the underlying computational principles and the corresponding classes.
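
    To make the PLS-style setting concrete, here is a small sketch, ours and not from the paper, of best-response dynamics: in potential games every improving step strictly increases a potential function, so the dynamics must terminate at a pure Nash equilibrium.

    ```python
    # Best-response dynamics for pure Nash equilibria. In potential games
    # (a representative PLS-style setting) every improving step strictly
    # increases a potential function, so the loop must terminate at a pure
    # equilibrium; in general games it may cycle, hence the step bound.

    def best_response_dynamics(n_players, strategies, payoff, profile,
                               max_steps=10_000):
        """strategies[i]      -> pure strategies available to player i
           payoff(i, profile) -> player i's payoff under a joint profile (tuple)
           profile            -> initial joint profile, one strategy per player"""
        profile = tuple(profile)
        for _ in range(max_steps):
            improved = False
            for i in range(n_players):
                deviate = lambda s: profile[:i] + (s,) + profile[i + 1:]
                best = max(strategies[i], key=lambda s: payoff(i, deviate(s)))
                if payoff(i, deviate(best)) > payoff(i, profile):
                    profile = deviate(best)
                    improved = True
            if not improved:
                return profile  # no player can improve: a pure Nash equilibrium
        return None  # cycled without converging (the game may have no pure NE)
    ```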

    One-Counter Stochastic Games

    We study the computational complexity of basic decision problems for one-counter simple stochastic games (OC-SSGs), under various objectives. OC-SSGs are 2-player turn-based stochastic games played on the transition graph of classic one-counter automata. We study primarily the termination objective, where the goal of one player is to maximize the probability of reaching counter value 0, while the other player wishes to avoid this. Partly motivated by the goal of understanding termination objectives, we also study certain "limit" and "long run average" reward objectives that are closely related to some well-studied objectives for stochastic games with rewards. Examples of problems we address include: does player 1 have a strategy to ensure that the counter eventually hits 0, i.e., terminates, almost surely, regardless of what player 2 does? Or that the liminf (or limsup) counter value equals infinity with a desired probability? Or that the long run average reward is >0 with a desired probability? We show that the qualitative termination problem for OC-SSGs is in NP ∩ coNP, and is in P-time for 1-player OC-SSGs, or equivalently for one-counter Markov Decision Processes (OC-MDPs). Moreover, we show that quantitative limit problems for OC-SSGs are in NP ∩ coNP, and are in P-time for 1-player OC-MDPs. Both qualitative limit problems and qualitative termination problems for OC-SSGs are already at least as hard as Condon's quantitative decision problem for finite-state SSGs.
    Comment: 20 pages, 1 figure. This is a full version of a paper accepted for publication in the proceedings of FSTTCS 2010.
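
    For intuition about the termination objective, here is a value-iteration sketch for the 1-player maximizing case (an OC-MDP), in our own encoding and not from the paper: configurations are (state, counter) pairs, and the unbounded counter is truncated at a bound N, so the computed values only lower-bound the true termination values.

    ```python
    # Value iteration for the termination objective (driving the counter
    # to 0) in a one-counter Markov decision process, maximizer only.
    # Truncating the counter at N yields a finite reachability MDP whose
    # values converge to the true OC-MDP values from below as N grows.

    def termination_values(states, actions, trans, N=100, iters=1_000):
        """trans[s, a] -> list of (probability, successor, counter_delta)
           pairs with counter_delta in {-1, 0, +1}, as in one-counter
           automata; actions[s] -> actions available in state s"""
        # v[s][c]: max probability of eventually hitting counter value 0
        v = {s: [1.0] + [0.0] * N for s in states}  # counter 0 = terminated
        for _ in range(iters):
            for s in states:
                for c in range(1, N):  # v[s][N] stays 0 under truncation
                    v[s][c] = max(
                        sum(p * v[t][c + d] for p, t, d in trans[s, a])
                        for a in actions[s]
                    )
        return v
    ```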

    Exact Algorithms for Solving Stochastic Games

    Shapley's discounted stochastic games, Everett's recursive games and Gillette's undiscounted stochastic games are classical models of game theory describing two-player zero-sum games of potentially infinite duration. We describe algorithms for exactly solving these games.
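
    By way of contrast with exact solving, the classical approximate route for Shapley's discounted games iterates the Shapley operator, solving one zero-sum matrix game per state by linear programming. A sketch under our own encoding, assuming numpy and scipy are available:

    ```python
    # Shapley's value iteration for two-player zero-sum discounted
    # stochastic games. This converges geometrically to the discounted
    # value but is an approximation scheme, not the exact algebraic
    # algorithms the paper describes.

    import numpy as np
    from scipy.optimize import linprog

    def matrix_game_value(A):
        """Value of the zero-sum matrix game A (row player maximizes)."""
        m, n = A.shape
        # variables: row strategy x_1..x_m and the value v; maximize v
        c = np.zeros(m + 1)
        c[-1] = -1.0                                     # minimize -v
        A_ub = np.hstack([-A.T, np.ones((n, 1))])        # v <= x^T A e_j, all j
        b_ub = np.zeros(n)
        A_eq = np.zeros((1, m + 1))
        A_eq[0, :m] = 1.0                                # sum_i x_i = 1
        bounds = [(0, None)] * m + [(None, None)]        # x >= 0, v free
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.ones(1),
                      bounds=bounds)
        return -res.fun

    def shapley_iteration(reward, trans, beta=0.9, iters=500):
        """reward[s]      -> payoff matrix (max's rows x min's columns)
           trans[s][i][j] -> probability vector over successor states
           beta           -> discount factor in (0, 1)"""
        v = np.zeros(len(reward))
        for _ in range(iters):
            v = np.array([matrix_game_value(reward[s] + beta * trans[s] @ v)
                          for s in range(len(reward))])
        return v
    ```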

    Qualitative Analysis of Concurrent Mean-payoff Games

    We consider concurrent games played by two players on a finite-state graph, where in every round the players simultaneously choose a move, and the current state along with the joint moves determines the successor state. We study a fundamental objective, namely the mean-payoff objective, where a reward is associated with each transition, the goal of player 1 is to maximize the long-run average of the rewards, and the objective of player 2 is strictly the opposite. The path constraint for player 1 can be qualitative, i.e., the mean-payoff is the maximal reward, or arbitrarily close to it; or quantitative, i.e., a given threshold between the minimal and maximal reward. We consider the computation of the almost-sure (resp. positive) winning sets, where player 1 can ensure that the path constraint is satisfied with probability 1 (resp. positive probability). Our main results for qualitative path constraints are as follows: (1) we establish qualitative determinacy results showing that for every state either player 1 has a strategy to ensure almost-sure (resp. positive) winning against all player-2 strategies, or player 2 has a spoiling strategy to falsify almost-sure (resp. positive) winning against all player-1 strategies; (2) we present optimal strategy complexity results that precisely characterize the classes of strategies required for almost-sure and positive winning for both players; and (3) we present quadratic-time algorithms to compute the almost-sure and the positive winning sets, matching the best known bound of algorithms for much simpler problems (such as reachability objectives). For quantitative constraints we show that a polynomial-time solution for the almost-sure or the positive winning set would imply a solution to a long-standing open problem (the value problem for turn-based deterministic mean-payoff games) that is not known to be solvable in polynomial time.
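
    For context on the cited open problem, here is a sketch, ours, of the classic pseudo-polynomial value iteration of Zwick and Paterson for turn-based deterministic mean-payoff games: the k-step optimal total reward divided by k converges to the mean-payoff value, and with integer weights bounded by W a number of rounds on the order of |V|^3 * W suffices to recover it exactly; no polynomial-time algorithm is known.

    ```python
    # Value iteration for turn-based deterministic mean-payoff games.
    # v_k(u) is the optimal total reward over k rounds starting at u;
    # v_k(u) / k approaches the mean-payoff value of u as k grows.

    def mean_payoff_values(vertices, owner, edges, k):
        """owner[u] -> 'max' or 'min' (whose turn it is at vertex u)
           edges[u] -> list of (weight, successor) pairs
           returns the approximations v_k(u) / k"""
        v = {u: 0.0 for u in vertices}
        for _ in range(k):
            v = {u: (max if owner[u] == 'max' else min)(
                        w + v[t] for w, t in edges[u])
                 for u in vertices}
        return {u: v[u] / k for u in vertices}
    ```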

    On values of repeated games with signals

    We study the existence of different notions of value in two-person zero-sum repeated games where the state evolves and players receive signals. We provide some examples showing that the limsup value (and the uniform value) may not exist in general. Then we show the existence of the value for any Borel payoff function if the players observe a public signal including the actions played. We also prove two other positive results without assumptions on the signaling structure: the existence of the sup value in any game and the existence of the uniform value in recursive games with nonnegative payoffs.
    Comment: Published at http://dx.doi.org/10.1214/14-AAP1095 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    The Complexity of Nash Equilibria in Limit-Average Games

    We study the computational complexity of Nash equilibria in concurrent games with limit-average objectives. In particular, we prove that the existence of a Nash equilibrium in randomised strategies is undecidable, while the existence of a Nash equilibrium in pure strategies is decidable, even if we put a constraint on the payoff of the equilibrium. Our undecidability result holds even for a restricted class of concurrent games, where nonzero rewards occur only on terminal states. Moreover, we show that the constrained existence problem is undecidable not only for concurrent games but also for turn-based games with the same restriction on rewards. Finally, we prove that the constrained existence problem for Nash equilibria in (pure or randomised) stationary strategies is decidable, and we analyse its complexity.
    Comment: 34 pages.