14,978 research outputs found

    The Complexity of Graph-Based Reductions for Reachability in Markov Decision Processes

    Full text link
    We study the never-worse relation (NWR) for Markov decision processes with an infinite-horizon reachability objective. A state q is never worse than a state p if the maximal probability of reaching the target set of states from p is at most the same value from q, regard- less of the probabilities labelling the transitions. Extremal-probability states, end components, and essential states are all special cases of the equivalence relation induced by the NWR. Using the NWR, states in the same equivalence class can be collapsed. Then, actions leading to sub- optimal states can be removed. We show the natural decision problem associated to computing the NWR is coNP-complete. Finally, we ex- tend a previously known incomplete polynomial-time iterative algorithm to under-approximate the NWR

    Compositional Performance Modelling with the TIPPtool

    Get PDF
    Stochastic process algebras have been proposed as compositional specification formalisms for performance models. In this paper, we describe a tool which aims at realising all beneficial aspects of compositional performance modelling, the TIPPtool. It incorporates methods for compositional specification as well as solution, based on state-of-the-art techniques, and wrapped in a user-friendly graphical front end. Apart from highlighting the general benefits of the tool, we also discuss some lessons learned during development and application of the TIPPtool. A non-trivial model of a real life communication system serves as a case study to illustrate benefits and limitations

    Controllability Metrics on Networks with Linear Decision Process-type Interactions and Multiplicative Noise

    Full text link
    This paper aims at the study of controllability properties and induced controllability metrics on complex networks governed by a class of (discrete time) linear decision processes with mul-tiplicative noise. The dynamics are given by a couple consisting of a Markov trend and a linear decision process for which both the "deterministic" and the noise components rely on trend-dependent matrices. We discuss approximate, approximate null and exact null-controllability. Several examples are given to illustrate the links between these concepts and to compare our results with their continuous-time counterpart (given in [16]). We introduce a class of backward stochastic Riccati difference schemes (BSRDS) and study their solvability for particular frameworks. These BSRDS allow one to introduce Gramian-like controllability metrics. As application of these metrics, we propose a minimal intervention-targeted reduction in the study of gene networks

    Game Refinement Relations and Metrics

    Get PDF
    We consider two-player games played over finite state spaces for an infinite number of rounds. At each state, the players simultaneously choose moves; the moves determine a successor state. It is often advantageous for players to choose probability distributions over moves, rather than single moves. Given a goal, for example, reach a target state, the question of winning is thus a probabilistic one: what is the maximal probability of winning from a given state? On these game structures, two fundamental notions are those of equivalences and metrics. Given a set of winning conditions, two states are equivalent if the players can win the same games with the same probability from both states. Metrics provide a bound on the difference in the probabilities of winning across states, capturing a quantitative notion of state similarity. We introduce equivalences and metrics for two-player game structures, and we show that they characterize the difference in probability of winning games whose goals are expressed in the quantitative mu-calculus. The quantitative mu-calculus can express a large set of goals, including reachability, safety, and omega-regular properties. Thus, we claim that our relations and metrics provide the canonical extensions to games, of the classical notion of bisimulation for transition systems. We develop our results both for equivalences and metrics, which generalize bisimulation, and for asymmetrical versions, which generalize simulation

    Computing Distances between Probabilistic Automata

    Full text link
    We present relaxed notions of simulation and bisimulation on Probabilistic Automata (PA), that allow some error epsilon. When epsilon is zero we retrieve the usual notions of bisimulation and simulation on PAs. We give logical characterisations of these notions by choosing suitable logics which differ from the elementary ones, L with negation and L without negation, by the modal operator. Using flow networks, we show how to compute the relations in PTIME. This allows the definition of an efficiently computable non-discounted distance between the states of a PA. A natural modification of this distance is introduced, to obtain a discounted distance, which weakens the influence of long term transitions. We compare our notions of distance to others previously defined and illustrate our approach on various examples. We also show that our distance is not expansive with respect to process algebra operators. Although L without negation is a suitable logic to characterise epsilon-(bi)simulation on deterministic PAs, it is not for general PAs; interestingly, we prove that it does characterise weaker notions, called a priori epsilon-(bi)simulation, which we prove to be NP-difficult to decide.Comment: In Proceedings QAPL 2011, arXiv:1107.074

    Bayesian Reinforcement Learning via Deep, Sparse Sampling

    Full text link
    We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration with a theoretical bound on its performance relative to the Bayes optimal policy, with a lower computational complexity. The main novelty is the use of a candidate policy generator, to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that in comparison to the state-of-the-art, our algorithm is both computationally more efficient, and obtains significantly higher reward in discrete environments.Comment: Published in AISTATS 202
    • 

    corecore