324 research outputs found

    Multiple-Environment Markov Decision Processes

    Get PDF
    We introduce Multi-Environment Markov Decision Processes (MEMDPs) which are MDPs with a set of probabilistic transition functions. The goal in a MEMDP is to synthesize a single controller with guaranteed performances against all environments even though the environment is unknown a priori. While MEMDPs can be seen as a special class of partially observable MDPs, we show that several verification problems that are undecidable for partially observable MDPs, are decidable for MEMDPs and sometimes have even efficient solutions

    Quantitative games with interval objectives

    Get PDF
    Traditionally quantitative games such as mean-payoff games and discount sum games have two players -- one trying to maximize the payoff, the other trying to minimize it. The associated decision problem, "Can Eve (the maximizer) achieve, for example, a positive payoff?" can be thought of as one player trying to attain a payoff in the interval (0,∞)(0,\infty). In this paper we consider the more general problem of determining if a player can attain a payoff in a finite union of arbitrary intervals for various payoff functions (liminf, mean-payoff, discount sum, total sum). In particular this includes the interesting exact-value problem, "Can Eve achieve a payoff of exactly (e.g.) 0?"Comment: Full version of CONCUR submissio

    On the complexity of heterogeneous multidimensional quantitative games

    Get PDF
    In this paper, we study two-player zero-sum turn-based games played on a finite multidimensional weighted graph. In recent papers all dimensions use the same measure, whereas here we allow to combine different measures. Such heterogeneous multidimensional quantitative games provide a general and natural model for the study of reactive system synthesis. We focus on classical measures like the Inf, Sup, LimInf, and LimSup of the weights seen along the play, as well as on the window mean-payoff (WMP) measure. This new measure is a natural strengthening of the mean-payoff measure. We allow objectives defined as Boolean combinations of heterogeneous constraints. While multidimensional games with Boolean combinations of mean-payoff constraints are undecidable, we show that the problem becomes EXPTIME-complete for DNF/CNF Boolean combinations of heterogeneous measures taken among {WMP, Inf, Sup, LimInf, LimSup} and that exponential memory strategies are sufficient for both players to win. We provide a detailed study of the complexity and the memory requirements when the Boolean combination of the measures is replaced by an intersection. EXPTIME-completeness and exponential memory strategies still hold for the intersection of measures in {WMP, Inf, Sup, LimInf, LimSup}, and we get PSPACE-completeness when WMP measure is no longer considered. To avoid EXPTIME-or PSPACE-hardness, we impose at most one occurrence of WMP measure and fix the number of Sup measures, and we propose several refinements (on the number of occurrences of the other measures) for which we get polynomial algorithms and lower memory requirements. For all the considered classes of games, we also study parameterized complexity

    Assume-Admissible Synthesis

    Get PDF
    In this paper, we introduce a novel rule for synthesis of reactive systems, applicable to systems made of n components which have each their own objectives. It is based on the notion of admissible strategies. We compare our novel rule with previous rules defined in the literature, and we show that contrary to the previous proposals, our rule defines sets of solutions which are rectangular. This property leads to solutions which are robust and resilient. We provide algorithms with optimal complexity and also an abstraction framework.Comment: 31 page

    Expected Window Mean-Payoff

    Full text link
    In the window mean-payoff objective, given an infinite path, instead of considering a long run average, we consider the minimum payoff that can be ensured at every position of the path over a finite window that slides over the entire path. Chatterjee et al. studied the problem to decide if in a two-player game, Player 1 has a strategy to ensure a window mean-payoff of at least 0. In this work, we consider a function that given a path returns the supremum value of the window mean-payoff that can be ensured over the path and we show how to compute its expected value in Markov chains and Markov decision processes. We consider two variants of the function: Fixed window mean-payoff in which a fixed window length lmaxl_{max} is provided; and Bounded window mean-payoff in which we compute the maximum possible value of the window mean-payoff over all possible window lengths. Further, for both variants, we consider (i) a direct version of the problem where for each path, the payoff that can be ensured from its very beginning and (ii) a non-direct version that is the prefix independent counterpart of the direct version of the problem.Comment: Replaced PP-hardness of direct fixed window objective with PSPACE-hardness, added alternative definition of window mean-payof

    Symblicit algorithms for optimal strategy synthesis in monotonic Markov decision processes

    Full text link
    When treating Markov decision processes (MDPs) with large state spaces, using explicit representations quickly becomes unfeasible. Lately, Wimmer et al. have proposed a so-called symblicit algorithm for the synthesis of optimal strategies in MDPs, in the quantitative setting of expected mean-payoff. This algorithm, based on the strategy iteration algorithm of Howard and Veinott, efficiently combines symbolic and explicit data structures, and uses binary decision diagrams as symbolic representation. The aim of this paper is to show that the new data structure of pseudo-antichains (an extension of antichains) provides another interesting alternative, especially for the class of monotonic MDPs. We design efficient pseudo-antichain based symblicit algorithms (with open source implementations) for two quantitative settings: the expected mean-payoff and the stochastic shortest path. For two practical applications coming from automated planning and LTL synthesis, we report promising experimental results w.r.t. both the run time and the memory consumption.Comment: In Proceedings SYNT 2014, arXiv:1407.493
    • …
    corecore