
    Parity Objectives in Countable MDPs

    We study countably infinite MDPs with parity objectives, and special cases with a bounded number of colors in the Mostowski hierarchy (including reachability, safety, Büchi and co-Büchi). In finite MDPs there always exist optimal memoryless deterministic (MD) strategies for parity objectives, but this does not generally hold for countably infinite MDPs. In particular, optimal strategies need not exist. For countably infinite MDPs, we provide a complete picture of the memory requirements of optimal (resp. ε-optimal) strategies for all objectives in the Mostowski hierarchy. In particular, there is a strong dichotomy between two different types of objectives. For the first type, optimal strategies, if they exist, can be chosen MD, while for the second type optimal strategies require infinite memory. (I.e., for all objectives in the Mostowski hierarchy, if finite-memory randomized strategies suffice, then MD-strategies also suffice.) Similarly, some objectives admit ε-optimal MD-strategies, while for others ε-optimal strategies require infinite memory. Such a dichotomy also holds for the subclass of countably infinite MDPs that are finitely branching, though more objectives admit MD-strategies here.
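
    To make the parity condition concrete, here is a minimal Python sketch (an illustration, not taken from the paper) that evaluates a parity objective on an ultimately periodic run u·v^ω, under the common convention that a run wins iff the least color seen infinitely often is even; the coloring function and run encoding are assumptions of this example.

        def wins_parity(cycle_states, color):
            """On an ultimately periodic run u.v^omega, exactly the states of
            the cycle v are seen infinitely often, so the objective holds iff
            the minimum color on the cycle is even (min-even convention)."""
            return min(color(s) for s in cycle_states) % 2 == 0

        # Hypothetical example coloring via a dict lookup.
        color = {"s0": 3, "s1": 2, "s2": 1}.get
        print(wins_parity(["s1"], color))        # True: min color 2 is even
        print(wins_parity(["s1", "s2"], color))  # False: min color 1 is odd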

    Strategy Complexity of Parity Objectives in Countable MDPs

    We study countably infinite MDPs with parity objectives. Unlike in finite MDPs, optimal strategies need not exist, and may require infinite memory if they do. We provide a complete picture of the exact strategy complexity of ε-optimal strategies (and optimal strategies, where they exist) for all subclasses of parity objectives in the Mostowski hierarchy. Either MD-strategies, Markov strategies, or 1-bit Markov strategies are necessary and sufficient, depending on the number of colors, the branching degree of the MDP, and whether one considers ε-optimal or optimal strategies. In particular, 1-bit Markov strategies are necessary and sufficient for ε-optimal (resp. optimal) strategies for general parity objectives.
    Comment: This is the full version of a paper presented at CONCUR 202
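
    The strategy classes compared above can be pinned down by their decision inputs; the following Python type sketch (an orientation aid, not the paper's formalism) contrasts them.

        from typing import Callable, Hashable, Tuple

        State = Hashable
        Action = Hashable

        # MD strategy: deterministic, the decision depends only on the
        # current state.
        MDStrategy = Callable[[State], Action]

        # Markov strategy: the decision depends on the current state and
        # the number of steps taken so far.
        MarkovStrategy = Callable[[State, int], Action]

        # 1-bit Markov strategy: additionally reads one bit of memory and
        # returns the chosen action together with the updated bit.
        OneBitMarkovStrategy = Callable[[State, int, int], Tuple[Action, int]]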

    Randomness for Free

    We consider two-player zero-sum games on graphs. These games can be classified on the basis of the information of the players and on the mode of interaction between them. On the basis of information, the classification is as follows: (a) partial-observation (both players have a partial view of the game); (b) one-sided complete-observation (one player has complete observation); and (c) complete-observation (both players have a complete view of the game). On the basis of the mode of interaction, we have the following classification: (a) concurrent (both players interact simultaneously); and (b) turn-based (both players interact in turn). The two sources of randomness in these games are randomness in the transition function and randomness in the strategies. In general, randomized strategies are more powerful than deterministic strategies, and randomness in transitions gives more general classes of games. In this work we present a complete characterization of the classes of games where randomness is not helpful in: (a) the transition function (probabilistic transitions can be simulated by deterministic transitions); and (b) strategies (pure strategies are as powerful as randomized strategies). As a consequence of our characterization, we obtain new undecidability results for these games.
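
    The two classification axes and the two sources of randomness described above can be summarized in a small Python sketch (hypothetical names, for orientation only):

        from dataclasses import dataclass
        from enum import Enum

        class Information(Enum):
            PARTIAL_OBSERVATION = "both players have a partial view"
            ONE_SIDED_COMPLETE = "one player has complete observation"
            COMPLETE_OBSERVATION = "both players have a complete view"

        class Interaction(Enum):
            CONCURRENT = "players move simultaneously"
            TURN_BASED = "players move in turn"

        @dataclass(frozen=True)
        class GameClass:
            information: Information
            interaction: Interaction
            probabilistic_transitions: bool  # first source of randomness
            randomized_strategies: bool      # second source of randomness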

    Büchi Objectives in Countable MDPs

    We study countably infinite Markov decision processes with Büchi objectives, which ask to visit a given subset of states infinitely often. A question left open by T.P. Hill in 1979 is whether there always exist ε-optimal Markov strategies, i.e., strategies that base decisions only on the current state and the number of steps taken so far. We provide a negative answer to this question by constructing a non-trivial counterexample. On the other hand, we show that Markov strategies with only 1 bit of extra memory are sufficient.
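
    As a concrete reading of the Büchi condition, the following toy Python check (an illustration under the same lasso encoding as above, not the paper's construction) evaluates the objective on an ultimately periodic run:

        def wins_buechi(cycle_states, target):
            """On a run u.v^omega the Büchi objective 'visit target infinitely
            often' holds iff the cycle v contains a target state."""
            return any(s in target for s in cycle_states)

        print(wins_buechi(["s1", "s2"], target={"s2"}))  # True
        print(wins_buechi(["s1"], target={"s2"}))        # False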

    How to Play in Infinite MDPs (Invited Talk)

    Markov decision processes (MDPs) are a standard model for dynamic systems that exhibit both stochastic and nondeterministic behavior. For MDPs with a finite state space it is known that for a wide range of objectives there exist optimal strategies that are memoryless and deterministic. In contrast, if the state space is infinite, optimal strategies may not exist, and optimal or ε-optimal strategies may require (possibly infinite) memory. In this paper we consider qualitative objectives: reachability, safety, (co-)Büchi, and other parity objectives. We aim to give an introduction to a collection of techniques that allow for the construction of strategies with little or no memory in countably infinite MDPs.
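
    As a contrast to the infinite-state difficulties, the finite case mentioned above is algorithmically benign: value iteration approximates the optimal reachability values, from which an optimal MD strategy can be extracted (with some care in tie-breaking). A minimal sketch, assuming a hypothetical dict-based encoding P[s][a] of transition probabilities:

        def reachability_value_iteration(states, actions, P, target, iters=1000):
            """Approximate the maximal probability of reaching 'target' in a
            finite MDP. P[s][a] maps successor states to probabilities.
            The iteration converges from below to the optimal values."""
            v = {s: (1.0 if s in target else 0.0) for s in states}
            for _ in range(iters):
                for s in states:
                    if s not in target:
                        v[s] = max(sum(p * v[t] for t, p in P[s][a].items())
                                   for a in actions[s])
            return v

        # Toy MDP: from s0, action 'a' reaches the target s1 with probability
        # 0.5 and returns to s0 otherwise, so the optimal value at s0 is 1.
        P = {"s0": {"a": {"s1": 0.5, "s0": 0.5}}}
        print(reachability_value_iteration(
            states=["s0", "s1"], actions={"s0": ["a"]}, P=P, target={"s1"}))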

    Concurrent games with tail objectives

    We study infinite stochastic games played by two players over a finite state space, with objectives specified by sets of infinite traces. The games are concurrent (players make moves simultaneously and independently), stochastic (the next state is determined by a probability distribution that depends on the current state and the chosen moves of the players) and infinite (they proceed for an infinite number of rounds). The analysis of concurrent stochastic games can be classified into: quantitative analysis, analyzing the optimum value of the game and ε-optimal strategies that ensure values within ε of the optimum value; and qualitative analysis, analyzing the set of states with optimum value 1 and ε-optimal strategies for the states with optimum value 1. We consider concurrent games with tail objectives, i.e., objectives that are independent of finite prefixes of traces, and show that the class of tail objectives is strictly richer than that of the ω-regular objectives. We develop new proof techniques to extend several properties of concurrent games with ω-regular objectives to concurrent games with tail objectives. We prove the positive limit-one property for tail objectives: for all concurrent games, if the optimum value for a player is positive for a tail objective Φ at some state, then there is a state where the optimum value is 1 for that player for Φ. We also show that the optimum values of zero-sum (strictly conflicting objectives) games with tail objectives can be related to equilibrium values of nonzero-sum (not strictly conflicting objectives) games with simpler reachability objectives. A consequence of our analysis is a polynomial-time reduction of the quantitative analysis of tail objectives to the qualitative analysis for the subclass of one-player stochastic games (Markov decision processes).
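
    To illustrate prefix-independence on ultimately periodic traces: a tail objective's verdict depends only on the cycle, whereas e.g. reachability also depends on the finite prefix. A hypothetical Python sketch (min-even parity convention as before):

        def parity_holds(prefix, cycle, color):
            """Parity is a tail objective: the verdict ignores the finite
            prefix and depends only on colors seen infinitely often."""
            return min(color[s] for s in cycle) % 2 == 0

        def reach_holds(prefix, cycle, target):
            """Reachability is not tail: replacing the prefix can flip it."""
            return any(s in target for s in prefix + cycle)

        color = {"s0": 1, "s1": 2}
        print(parity_holds(["s0"], ["s1"], color))  # True, for any prefix
        print(reach_holds(["s0"], ["s1"], {"s0"}))  # True with this prefix...
        print(reach_holds([], ["s1"], {"s0"}))      # ...but False without it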

    Strategy Complexity of Point Payoff, Mean Payoff and Total Payoff Objectives in Countable MDPs

    We study countably infinite Markov decision processes (MDPs) with real-valued transition rewards. Every infinite run induces the following sequences of payoffs: 1. Point payoff (the sequence of directly seen transition rewards), 2. Mean payoff (the sequence of the sums of all rewards so far, divided by the number of steps), and 3. Total payoff (the sequence of the sums of all rewards so far). For each payoff type, the objective is to maximize the probability that the lim inf is non-negative. We establish the complete picture of the strategy complexity of these objectives, i.e., how much memory is necessary and sufficient for ε-optimal (resp. optimal) strategies. Some cases can be won with memoryless deterministic strategies, while others require a step counter, a reward counter, or both.
    Comment: Revised and extended journal version of results presented at the CONCUR 2021 conference. For a special issue in the arXiv overlay journal LMCS (https://lmcs.episciences.org). This is not a duplicate of arXiv:2107.03287 (the conference version), but the significantly changed journal version for LMCS (which uses arXiv as a backend).
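
    The three payoff sequences can be read off a finite reward prefix as follows; this Python snippet is a direct transcription of the definitions above, for illustration only:

        from itertools import accumulate

        def payoff_prefixes(rewards):
            """Given the rewards r_1, ..., r_n seen so far, return the
            corresponding prefixes of the three payoff sequences."""
            point = list(rewards)                            # r_k
            total = list(accumulate(rewards))                # sum of r_1..r_k
            mean = [t / k for k, t in enumerate(total, 1)]   # total / steps
            return point, mean, total

        # The objective asks that the lim inf of the chosen sequence be >= 0.
        print(payoff_prefixes([1.0, -2.0, 3.0]))
        # ([1.0, -2.0, 3.0], [1.0, -0.5, 0.666...], [1.0, -1.0, 2.0])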

    Probabilistic Opacity for Markov Decision Processes

    Opacity is a generic security property that has been defined on (non-probabilistic) transition systems and later on Markov chains with labels. For a secret predicate, given as a subset of runs, and a function describing the view of an external observer, the value of interest for opacity is a measure of the set of runs disclosing the secret. We extend this definition to the richer framework of Markov decision processes, where nondeterministic choice is combined with probabilistic transitions, and we study related decidability problems with partial or complete observation hypotheses for the schedulers. We prove that all questions are decidable with complete observation and ω-regular secrets. With partial observation, we prove that all quantitative questions are undecidable, but the question whether a system is almost surely non-opaque becomes decidable for a restricted class of ω-regular secrets, as well as for all ω-regular secrets under finite-memory schedulers.
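
    On a finite abstraction, the value of interest can be phrased as the probability mass of runs whose observation reveals membership in the secret; the actual setting measures this over infinite runs with ω-regular secrets, so the following Python toy (all names hypothetical) is only meant to fix intuition.

        from collections import defaultdict

        def disclosure_probability(runs, prob, secret, view):
            """Probability mass of runs that disclose the secret: a run is
            disclosing if every run sharing its observer view lies in the
            secret, so the observer can deduce the secret from the view."""
            by_view = defaultdict(list)
            for r in runs:
                by_view[view(r)].append(r)
            return sum(prob[r] for r in runs
                       if all(secret(q) for q in by_view[view(r)]))

        # Toy example: two runs look alike to the observer; only one is in
        # the secret, so neither discloses, and the value is 0.
        runs = ["ab", "ac"]
        print(disclosure_probability(
            runs, prob={"ab": 0.5, "ac": 0.5},
            secret=lambda r: r == "ab", view=lambda r: r[0]))  # 0.0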