16 research outputs found

    Optimal Strategies in Infinite-state Stochastic Reachability Games

    We consider perfect-information reachability stochastic games for 2 players on infinite graphs. We identify a subclass of such games, and prove two interesting properties of it: first, Player Max always has optimal strategies in games from this subclass, and second, these games are strongly determined. The subclass is defined by the property that the set of all values can only have one accumulation point -- 0. Our results nicely mirror recent results for finitely-branching games, where, on the contrary, Player Min always has optimal strategies. However, our proof methods are substantially different, because the roles of the players are not symmetric. We also do not restrict the branching of the games. Finally, we apply our results in the context of recently studied One-Counter stochastic games.

    Recursive Concurrent Stochastic Games

    We study Recursive Concurrent Stochastic Games (RCSGs), extending our recent analysis of recursive simple stochastic games to a concurrent setting where the two players choose moves simultaneously and independently at each state. For multi-exit games, our earlier work already showed undecidability for basic questions like termination, thus we focus on the important case of single-exit RCSGs (1-RCSGs). We first characterize the value of a 1-RCSG termination game as the least fixed point solution of a system of nonlinear minimax functional equations, and use it to show PSPACE decidability for the quantitative termination problem. We then give a strategy improvement technique, which we use to show that player 1 (maximizer) has \epsilon-optimal randomized Stackless & Memoryless (r-SM) strategies for all \epsilon > 0, while player 2 (minimizer) has optimal r-SM strategies. Thus, such games are r-SM-determined. These results mirror and generalize in a strong sense the randomized memoryless determinacy results for finite stochastic games, and extend the classic Hoffman-Karp strategy improvement approach from the finite to an infinite state setting. The proofs in our infinite-state setting are very different, however, relying on subtle analytic properties of certain power series that arise from studying 1-RCSGs. We show that our upper bounds, even for qualitative (probability 1) termination, cannot be improved, even to NP, without a major breakthrough, by giving two reductions: first a P-time reduction from the long-standing square-root sum problem to the quantitative termination decision problem for finite concurrent stochastic games, and then a P-time reduction from the latter problem to the qualitative termination problem for 1-RCSGs. Comment: 21 pages, 2 figures
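The least-fixed-point characterization above can be illustrated with a toy, one-variable equation of the same shape, x = Val(A(x)), where Val is the value of a zero-sum matrix game. This is only a sketch under stated assumptions: the 2x2 matrix `A(x)`, the helper names `val2x2` and `least_fixed_point`, and the use of plain iteration from 0 are all hypothetical choices made for illustration, and they are not the strategy improvement technique of the paper.

```python
def val2x2(a, b, c, d):
    """Value of the zero-sum game [[a, b], [c, d]] (row player maximizes)."""
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        # A pure saddle point exists.
        return maximin
    # No saddle point: standard closed form for the mixed value of a 2x2 game.
    return (a * d - b * c) / (a + d - b - c)


def least_fixed_point(iterations=100):
    """Iterate x <- Val(A(x)) from 0 for the hypothetical matrix
    A(x) = [[x^2, 0.5], [0.5, x^2]]."""
    x = 0.0
    for _ in range(iterations):
        x = val2x2(x * x, 0.5, 0.5, x * x)
    return x


x_star = least_fixed_point()
```

For this particular matrix the game has no saddle point while x^2 < 0.5, so the equation reduces to x = (x^2 + 0.5) / 2, whose smaller root is 1 - sqrt(0.5); the iteration increases monotonically towards that root.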

    Approximating the Termination Value of One-Counter MDPs and Stochastic Games

    One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs) are 1-player, and 2-player turn-based zero-sum, stochastic games played on the transition graph of classic one-counter automata (equivalently, pushdown automata with a 1-letter stack alphabet). A key objective for the analysis and verification of these games is the termination objective, where the players aim to maximize (minimize, respectively) the probability of hitting counter value 0, starting at a given control state and given counter value. Recently, we studied qualitative decision problems ("is the optimal termination value = 1?") for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP and coNP, respectively). However, quantitative decision and approximation problems ("is the optimal termination value ≥ p", or "approximate the termination value within epsilon") are far more challenging. This is so in part because optimal strategies may not exist, and because even when they do exist they can have a highly non-trivial structure. It thus remained open even whether any of these quantitative termination problems are computable. In this paper we show that all quantitative approximation problems for the termination value for OC-MDPs and OC-SSGs are computable. Specifically, given an OC-SSG, and given epsilon > 0, we can compute a value v that approximates the value of the OC-SSG termination game within additive error epsilon, and furthermore we can compute epsilon-optimal strategies for both players in the game. A key ingredient in our proofs is a subtle martingale, derived from solving certain LPs that we can associate with a maximizing OC-MDP. An application of Azuma's inequality on these martingales yields a computable bound for the "wealth" at which a "rich person's strategy" becomes epsilon-optimal for OC-MDPs. Comment: 35 pages, 1 figure, full version of a paper presented at ICALP 2011, invited for submission to Information and Computation
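To make the termination objective concrete, here is a minimal sketch for a toy maximizing OC-MDP with a single control state. It assumes a hypothetical model in which each action is just a probability of incrementing the counter (otherwise the counter is decremented), and it truncates the counter at an assumed cap, treating every run that reaches the cap as non-terminating. That naive truncation gives a lower bound only; it is not the martingale-based method of the paper and comes with no error guarantee in general.

```python
def truncated_termination_values(actions, cap=60, sweeps=5000):
    """Value iteration for the maximal probability of reaching counter 0.

    actions: list of 'up' probabilities (hypothetical model: one control
             state; each action moves the counter +1 with probability up
             and -1 with probability 1 - up).
    cap:     counter value at which runs are cut off and counted as lost.
    """
    v = [0.0] * (cap + 1)
    v[0] = 1.0  # counter value 0 means the run has terminated
    for _ in range(sweeps):
        new = v[:]
        for n in range(1, cap):
            new[n] = max(up * v[n + 1] + (1.0 - up) * v[n - 1] for up in actions)
        v = new
    return v


# Two actions, both with upward drift, so termination is not certain.
ACTIONS = [0.6, 0.7]
values = truncated_termination_values(ACTIONS)
```

In this toy instance the maximizer always prefers the action with the smaller upward drift, and the classical gambler's-ruin formula gives termination probability (0.4/0.6)^n from counter value n, which the truncated iteration approaches from below.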

    Markov Decision Processes with Multiple Long-run Average Objectives

    We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with k limit-average functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs such that the limit-average value stays above a given vector. We show that under the expectation objective, in contrast to the case of one limit-average function, both randomization and memory are necessary for strategies even for epsilon-approximation, and that finite-memory randomized strategies are sufficient for achieving Pareto optimal values. Under the satisfaction objective, in contrast to the case of one limit-average function, infinite memory is necessary for strategies achieving a specific value (i.e. randomized finite-memory strategies are not sufficient), whereas memoryless randomized strategies are sufficient for epsilon-approximation, for all epsilon>0. We further prove that the decision problems for both expectation and satisfaction objectives can be solved in polynomial time and the trade-off curve (Pareto curve) can be epsilon-approximated in time polynomial in the size of the MDP and 1/epsilon, and exponential in the number of limit-average functions, for all epsilon>0. Our analysis also reveals flaws in previous work for MDPs with multiple mean-payoff functions under the expectation objective, corrects the flaws, and allows us to obtain improved results.

    Qualitative Reachability in Stochastic BPA Games

    We consider a class of infinite-state stochastic games generated by stateless pushdown automata (or, equivalently, 1-exit recursive state machines), where the winning objective is specified by a regular set of target configurations and a qualitative probability constraint `>0' or `=1'. The goal of one player is to maximize the probability of reaching the target set so that the constraint is satisfied, while the other player aims at the opposite. We show that the winner in such games can be determined in PTIME for the `>0' constraint, and both in NP and coNP for the `=1' constraint. Further, we prove that the winning regions for both players are regular, and we design algorithms which compute the associated finite-state automata. Finally, we show that winning strategies can be synthesized effectively. Comment: Submitted to Information and Computation. 48 pages, 3 figures

    Hyperplane Separation Technique for Multidimensional Mean-Payoff Games

    We consider both finite-state game graphs and recursive game graphs (or pushdown game graphs), that can model the control flow of sequential programs with recursion, with multi-dimensional mean-payoff objectives. In pushdown games two types of strategies are relevant: global strategies, that depend on the entire global history; and modular strategies, that have only local memory and thus do not depend on the context of invocation. We present solutions to several fundamental algorithmic questions and our main contributions are as follows: (1) We show that finite-state multi-dimensional mean-payoff games can be solved in polynomial time if the number of dimensions and the maximal absolute value of the weight are fixed; whereas if the number of dimensions is arbitrary, then the problem is already known to be coNP-complete. (2) We show that pushdown graphs with multi-dimensional mean-payoff objectives can be solved in polynomial time. (3) For pushdown games under global strategies both single and multi-dimensional mean-payoff objectives problems are known to be undecidable, and we show that under modular strategies the multi-dimensional problem is also undecidable (whereas under modular strategies the single dimensional problem is NP-complete). We show that if the number of modules, the number of exits, and the maximal absolute value of the weight are fixed, then pushdown games under modular strategies with single dimensional mean-payoff objectives can be solved in polynomial time, and if either of the number of exits or the number of modules is not bounded, then the problem is NP-hard. (4) Finally we show that a fixed parameter tractable algorithm for finite-state multi-dimensional mean-payoff games or pushdown games under modular strategies with single-dimensional mean-payoff objectives would imply the solution of the long-standing open problem of fixed parameter tractability of parity games. Comment: arXiv admin note: text overlap with arXiv:1201.282

    Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min(Max) Polynomial Bellman Equations

    We show that one can approximate the least fixed point solution for a multivariate system of monotone probabilistic max(min) polynomial equations, referred to as maxPPSs (and minPPSs, respectively), in time polynomial in both the encoding size of the system of equations and in log(1/epsilon), where epsilon > 0 is the desired additive error bound of the solution. (The model of computation is the standard Turing machine model.) We establish this result using a generalization of Newton's method which applies to maxPPSs and minPPSs, even though the underlying functions are only piecewise-differentiable. This generalizes our recent work which provided a P-time algorithm for purely probabilistic PPSs. These equations form the Bellman optimality equations for several important classes of infinite-state Markov Decision Processes (MDPs). Thus, as a corollary, we obtain the first polynomial time algorithms for computing to within arbitrary desired precision the optimal value vector for several classes of infinite-state MDPs which arise as extensions of classic, and heavily studied, purely stochastic processes. These include both the problem of maximizing and minimizing the termination (extinction) probability of multi-type branching MDPs, stochastic context-free MDPs, and 1-exit Recursive MDPs. Furthermore, we also show that we can compute in P-time an epsilon-optimal policy for both maximizing and minimizing branching, context-free, and 1-exit-Recursive MDPs, for any given desired epsilon > 0. This is despite the fact that actually computing optimal strategies is Sqrt-Sum-hard and PosSLP-hard in this setting. We also derive, as an easy consequence of these results, an FNP upper bound on the complexity of computing the value (within arbitrary desired precision) of branching simple stochastic games (BSSGs).
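As a small illustration of what a least fixed point of a maxPPS looks like, the sketch below works with a hypothetical one-variable system x = max(P1(x), P2(x)), where P1(x) = 0.6x^2 + 0.4 and P2(x) = 0.3x^2 + 0.5. It uses plain fixed-point (Kleene) iteration from 0 for the max system and the ordinary one-variable Newton iteration for the purely probabilistic polynomial P1 alone. Both are assumptions chosen for brevity; the generalized Newton method for maxPPSs and minPPSs described in the abstract is not reproduced here, and the function names are made up for this example.

```python
def kleene_lfp(polys, iterations=200):
    """Iterate x <- max_i P_i(x) from 0; converges monotonically to the LFP."""
    x = 0.0
    for _ in range(iterations):
        x = max(p(x) for p in polys)
    return x


def newton_lfp(p, dp, iterations=20):
    """Ordinary Newton iteration on P(x) - x = 0 for a single
    probabilistic polynomial P, started at 0."""
    x = 0.0
    for _ in range(iterations):
        x = x + (p(x) - x) / (1.0 - dp(x))
    return x


# Hypothetical probabilistic polynomials (non-negative coefficients, sum <= 1).
def p1(x):
    return 0.6 * x * x + 0.4


def dp1(x):
    return 1.2 * x


def p2(x):
    return 0.3 * x * x + 0.5


lfp_max = kleene_lfp([p1, p2])
lfp_p1 = newton_lfp(p1, dp1)
```

Here P1 has fixed points 2/3 and 1, and P2(2/3) is below 2/3, so the least fixed point of the max system is 2/3. Kleene iteration needs many steps to get there (the contraction factor near 2/3 is 0.8), while Newton on P1 reaches the same value in a handful of iterations, which gives some intuition for why Newton-style methods are attractive in this setting.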