
    Parity Objectives in Countable MDPs

    We study countably infinite MDPs with parity objectives, and special cases with a bounded number of colors in the Mostowski hierarchy (including reachability, safety, BĂŒchi and co-BĂŒchi). In finite MDPs there always exist optimal memoryless deterministic (MD) strategies for parity objectives, but this does not generally hold for countably infinite MDPs. In particular, optimal strategies need not exist. For countably infinite MDPs, we provide a complete picture of the memory requirements of optimal (resp., Δ-optimal) strategies for all objectives in the Mostowski hierarchy. In particular, there is a strong dichotomy between two different types of objectives. For the first type, optimal strategies, if they exist, can be chosen MD, while for the second type optimal strategies require infinite memory. (I.e., for all objectives in the Mostowski hierarchy, if finite-memory randomized strategies suffice then MD-strategies also suffice.) Similarly, some objectives admit Δ-optimal MD-strategies, while for others Δ-optimal strategies require infinite memory. Such a dichotomy also holds for the subclass of countably infinite MDPs that are finitely branching, though more objectives admit MD-strategies here.
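
    For context, a standard way to formalise the parity objective (one common convention, given here for orientation; the paper's own definition may differ in details such as min vs. max priority) is, for a coloring col of states by natural numbers:

    % Parity objective, max-even convention (illustrative, not quoted):
    \[
      \mathtt{Parity}(\mathit{col}) \;=\; \{\, s_0 s_1 s_2 \cdots \mid
        \max\{\, c \mid \mathit{col}(s_i) = c \text{ for infinitely many } i \,\}
        \text{ is even} \,\}
    \]
    % With colors {1,2} this is the Buechi objective (color 2 seen
    % infinitely often); with colors {0,1} it is co-Buechi (color 1
    % seen only finitely often).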

    Strategy Complexity of Parity Objectives in Countable MDPs

    We study countably infinite MDPs with parity objectives. Unlike in finite MDPs, optimal strategies need not exist, and may require infinite memory if they do. We provide a complete picture of the exact strategy complexity of Δ-optimal strategies (and optimal strategies, where they exist) for all subclasses of parity objectives in the Mostowski hierarchy. Either MD-strategies, Markov strategies, or 1-bit Markov strategies are necessary and sufficient, depending on the number of colors, the branching degree of the MDP, and whether one considers Δ-optimal or optimal strategies. In particular, 1-bit Markov strategies are necessary and sufficient for Δ-optimal (resp. optimal) strategies for general parity objectives.
    Comment: This is the full version of a paper presented at CONCUR 202
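
    The strategy classes named above have standard definitions, sketched here for orientation (deterministic variants shown; not quoted from the paper). Let S be the states, A the actions, and let the step counter range over the naturals:

    \begin{align*}
      \text{MD (memoryless deterministic):} \quad & \sigma : S \to A\\
      \text{Markov (step counter only):}    \quad & \sigma : S \times \mathbb{N} \to A\\
      \text{1-bit Markov:}                  \quad & \sigma : S \times \mathbb{N} \times \{0,1\} \to A \times \{0,1\}
    \end{align*}
    % A Markov strategy sees only the current state and the step count;
    % a 1-bit Markov strategy additionally reads and updates one memory
    % bit, hence the {0,1} component in its output.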

    Strategy Complexity of Threshold Payoff with Applications to Optimal Expected Payoff

    We study countably infinite Markov decision processes (MDPs) with transition rewards. The lim sup (resp. lim inf) threshold objective is to maximize the probability that the lim sup (resp. lim inf) of the infinite sequence of transition rewards is non-negative. We establish the complete picture of the strategy complexity of these objectives, i.e., the upper and lower bounds on the memory required by Δ-optimal (resp. optimal) strategies. We then apply these results to solve two open problems from [Sudderth, Decisions in Economics and Finance, 2020] about the strategy complexity of optimal strategies for the expected lim sup (resp. lim inf) payoff.
    Comment: 53 pages
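
    Spelled out (notation assumed for illustration), a strategy σ played from state s_0 should, for the lim sup variant, maximise

    \[
      \Pr\nolimits^{\sigma}_{s_0}\Big[\, \limsup_{n\to\infty} r_n \ge 0 \,\Big],
    \]
    % where r_0, r_1, r_2, ... are the transition rewards seen along the
    % run; the lim inf variant replaces limsup by liminf.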

    Optimal Strategies in Concurrent Reachability Games


    Taming denumerable Markov decision processes with decisiveness

    Decisiveness has proven to be an elegant concept for denumerable Markov chains: it is general enough to encompass several natural classes of denumerable Markov chains, and is a sufficient condition for simple qualitative and approximate quantitative model checking algorithms to exist. In this paper, we explore how to extend the notion of decisiveness to Markov decision processes. Compared to Markov chains, the extra non-determinism can be resolved in an adversarial or cooperative way, yielding two natural notions of decisiveness. We then explore whether these notions yield model checking procedures concerning the infimum and supremum probabilities of reachability properties.
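
    For orientation, the underlying notion for Markov chains (due to Abdulla, Ben Henda and Mayr; paraphrased here, not quoted from this paper) requires that from every state a run almost surely settles the reachability question for the target set F:

    \[
      \forall s.\;\;
      \Pr\nolimits_{s}\big[\, \Diamond F \;\vee\; \Diamond \widetilde{F} \,\big] = 1,
      \qquad
      \widetilde{F} = \{\, s' \mid \Pr\nolimits_{s'}[\Diamond F] = 0 \,\}
    \]
    % i.e., the run reaches F or a state s' from which F is unreachable.
    % This property is what makes simple approximation schemes for
    % reachability probabilities terminate.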

    Strategy Complexity of Point Payoff, Mean Payoff and Total Payoff Objectives in Countable MDPs

    We study countably infinite Markov decision processes (MDPs) with real-valued transition rewards. Every infinite run induces the following sequences of payoffs: 1. Point payoff (the sequence of directly seen transition rewards), 2. Mean payoff (the sequence of the sums of all rewards so far, divided by the number of steps), and 3. Total payoff (the sequence of the sums of all rewards so far). For each payoff type, the objective is to maximize the probability that the lim inf is non-negative. We establish the complete picture of the strategy complexity of these objectives, i.e., how much memory is necessary and sufficient for Δ-optimal (resp. optimal) strategies. Some cases can be won with memoryless deterministic strategies, while others require a step counter, a reward counter, or both.
    Comment: Revised and extended journal version of results presented at the CONCUR 2021 conference. For a special issue in the arXiv overlay journal LMCS (https://lmcs.episciences.org). This is not a duplicate of arXiv:2107.03287 (the conference version), but the significantly changed journal version for LMCS (which uses arXiv as a backend).
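
    In symbols (notation assumed for illustration), a run with transition rewards r_0, r_1, r_2, 
 induces:

    \begin{align*}
      \text{Point payoff:} \quad & p_n = r_n\\
      \text{Mean payoff:}  \quad & m_n = \tfrac{1}{n+1}\textstyle\sum_{i=0}^{n} r_i\\
      \text{Total payoff:} \quad & t_n = \textstyle\sum_{i=0}^{n} r_i
    \end{align*}
    % The objective maximises Pr[ liminf_{n -> oo} x_n >= 0 ] for the
    % respective sequence x in {p, m, t}.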

    Strategy Complexity of Reachability in Countable Stochastic 2-Player Games

    We study countably infinite stochastic 2-player games with reachability objectives. Our results provide a complete picture of the memory requirements of Δ-optimal (resp. optimal) strategies. These results depend on the size of the players' action sets and on whether one requires strategies that are uniform (i.e., independent of the start state). Our main result is that Δ-optimal (resp. optimal) Maximizer strategies require infinite memory if Minimizer is allowed infinite action sets. This lower bound holds even under very strong restrictions. Even in the special case of infinitely branching turn-based reachability games, even if all states allow an almost surely winning Maximizer strategy, strategies with a step counter plus finite private memory are still useless. Regarding uniformity, we show that for Maximizer there need not exist positional (i.e., memoryless) uniformly Δ-optimal strategies even in the special case of finite action sets or in finitely branching turn-based games. On the other hand, in games with finite action sets, there always exists a uniformly Δ-optimal Maximizer strategy that uses just one bit of public memory.
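
    The notions of value and Δ-optimality used here are standard (sketched for context, not quoted from the paper): for a reachability target F,

    \[
      \mathit{val}(s) \;=\; \sup_{\sigma}\,\inf_{\tau}\,
        \Pr\nolimits^{\sigma,\tau}_{s}[\Diamond F].
    \]
    % A Maximizer strategy sigma is epsilon-optimal from s if it
    % guarantees inf_tau Pr >= val(s) - epsilon against every Minimizer
    % strategy tau, optimal if epsilon = 0, and uniformly epsilon-optimal
    % if one sigma achieves this from every start state simultaneously.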

    Strategy complexity of lim sup and lim inf payoff objectives in countable MDPs

    We study countably infinite Markov decision processes (MDPs) with real-valued transition rewards. A strategy is a function which decides how plays proceed within the MDP. Every strategy induces a set of infinite runs in the MDP, and each infinite run induces the following sequences of payoffs: 1. Point payoff (the sequence of directly seen transition rewards), 2. Mean payoff (the sequence of the sums of all rewards so far, divided by the number of steps), and 3. Total payoff (the sequence of the sums of all rewards so far). For each payoff type, the threshold objective is to maximise the probability that the lim sup/lim inf is non-negative. We are interested in the strategy complexity of the above objectives, i.e. the amount of memory and/or randomisation that a strategy needs access to in order to play well (optimally resp. Δ-optimally). Our results seek not only to decide whether an objective requires finite or infinite memory, but, in the case of infinite memory, what kind of infinite memory is necessary and sufficient. For example, a step counter which acts as a clock, or a reward counter which sums up the seen rewards, may be sufficient.
    We compare the lim sup/lim inf point payoff objectives to the BĂŒchi/co-BĂŒchi objectives which, given a set of states or transitions, seek to maximise the probability that this set is visited infinitely/finitely often. Convergence effects are what differentiate lim sup/lim inf point payoff objectives from BĂŒchi/co-BĂŒchi. For example, the sequence −1/2, −1/3, −1/4, 
 satisfies lim sup ≄ 0 and lim inf ≄ 0 despite all of the rewards being negative. It is in dealing with these effects that we make our main technical contributions. We establish a complete picture of the strategy complexity for both the lim sup and lim inf point payoff objectives. In particular, we show that optimal lim sup requires either randomisation or access to a step counter, and that lim inf of point payoff requires a step counter (but not more) when the underlying MDP is infinitely branching.
    We also comprehensively pin down the strategy complexity for the lim inf total and mean payoff objectives. This result requires a novel counterexample involving unboundedly growing rewards as well as finely tuned transition probabilities which force the player to use memory in order to mimic what occurred in past random events. This allows us to show that both of these objectives require the use of both a step counter and a reward counter.
    We apply our results to solve two open problems from Sudderth [35] about the strategy complexity of optimal strategies for the expected lim sup/lim inf point payoff. We achieve this by reducing each objective to its respective optimal threshold lim sup/lim inf point payoff counterpart. Thus we are able to conclude that they share the same optimal strategy complexity.
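
    The convergence effect above can be checked directly (a worked instance in the spirit of the abstract's example, not taken from the thesis):

    % Take r_n = -1/(n+2) for n = 0, 1, 2, ... : every reward is
    % strictly negative, yet
    \[
      \limsup_{n\to\infty} r_n \;=\; \liminf_{n\to\infty} r_n
      \;=\; \lim_{n\to\infty} -\tfrac{1}{n+2} \;=\; 0 \;\ge\; 0,
    \]
    % so both threshold objectives are satisfied, while a Buechi
    % objective "see a non-negative reward infinitely often" fails,
    % since no non-negative reward ever occurs on the run.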
