The Complexity of Graph-Based Reductions for Reachability in Markov Decision Processes

Author: AL Strehl
C Baier
C Courcoubetis
C Dehnert
Krishnendu Chatterjee
L Valiant
LP Kaelbling
M Kwiatkowska
M Steinmetz
ML Puterman
N Fijalkow
PR D’Argenio
S Fortune
SJ Russell
T Brázdil
T Eilam-Tzoreff
Publication date: 01/01/2018

We study the never-worse relation (NWR) for Markov decision processes with an infinite-horizon reachability objective. A state q is never worse than a state p if the maximal probability of reaching the target set of states from p is at most the same value from q, regardless of the probabilities labelling the transitions. Extremal-probability states, end components, and essential states are all special cases of the equivalence relation induced by the NWR. Using the NWR, states in the same equivalence class can be collapsed. Then, actions leading to suboptimal states can be removed. We show that the natural decision problem associated with computing the NWR is coNP-complete. Finally, we extend a previously known incomplete polynomial-time iterative algorithm to under-approximate the NWR.
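The quantity compared by the NWR, the maximal probability of reaching the target set, can be illustrated with a minimal value-iteration sketch. The MDP encoding, the function name `max_reach_prob`, and the four-state example are assumptions made for illustration and are not taken from the paper. The sketch evaluates a single probability assignment, whereas the NWR quantifies over all of them, so a numeric check like this one is only a sanity test and does not decide the relation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: state -> action -> list of (probability, successor).
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]


def max_reach_prob(mdp: MDP, targets: Set[str],
                   max_iters: int = 10_000, tol: float = 1e-12) -> Dict[str, float]:
    """Approximate the maximal reachability probability by value iteration.

    Starting from 0 on non-target states, the iterates converge from below
    to the least fixed point of the Bellman operator.
    """
    value = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(max_iters):
        delta = 0.0
        for s, actions in mdp.items():
            if s in targets or not actions:
                continue  # targets stay at 1, states without actions stay at 0
            best = max(sum(p * value[t] for p, t in succ)
                       for succ in actions.values())
            delta = max(delta, abs(best - value[s]))
            value[s] = best
        if delta < tol:
            break
    return value


# Illustrative MDP: q can move to p with probability 1, so q does at least
# as well as p whatever probabilities label the other transitions.
example: MDP = {
    "goal": {},
    "sink": {},
    "p": {"a": [(0.3, "goal"), (0.7, "sink")]},
    "q": {"b": [(1.0, "p")], "c": [(0.6, "goal"), (0.4, "sink")]},
}

values = max_reach_prob(example, {"goal"})
print(values["p"] <= values["q"])
```

In this example the structural reason q is never worse than p is the action `b`, which reaches p with certainty; the numeric comparison merely agrees with it for the chosen probabilities.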

arXiv.org e-Print Archive

Crossref

Institutional Repository Universiteit Antwerpen

DI-fusion

Compositional Performance Modelling with the TIPPtool

Author: C. Baier
H. Hermanns
M. Bozga
N. Götz
P. Kanellakis
R. Milner
R. Paige
T. Bolognesi
U. Herzog
Publication date: 01/01/2000

Stochastic process algebras have been proposed as compositional specification formalisms for performance models. In this paper, we describe the TIPPtool, a tool which aims at realising all beneficial aspects of compositional performance modelling. It incorporates methods for compositional specification as well as solution, based on state-of-the-art techniques, and wrapped in a user-friendly graphical front end. Apart from highlighting the general benefits of the tool, we also discuss some lessons learned during development and application of the TIPPtool. A non-trivial model of a real-life communication system serves as a case study to illustrate benefits and limitations.

CiteSeerX

Crossref

University of Twente Research Information

Controllability Metrics on Networks with Linear Decision Process-type Interactions and Multiplicative Noise

Author: Diallo Tidiane
Goreac Dan
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 12/10/2015

This paper studies controllability properties and induced controllability metrics on complex networks governed by a class of (discrete-time) linear decision processes with multiplicative noise. The dynamics are given by a couple consisting of a Markov trend and a linear decision process for which both the "deterministic" and the noise components rely on trend-dependent matrices. We discuss approximate, approximate null and exact null-controllability. Several examples are given to illustrate the links between these concepts and to compare our results with their continuous-time counterpart (given in [16]). We introduce a class of backward stochastic Riccati difference schemes (BSRDS) and study their solvability for particular frameworks. These BSRDS allow one to introduce Gramian-like controllability metrics. As an application of these metrics, we propose a minimal intervention-targeted reduction in the study of gene networks.

arXiv.org e-Print Archive

HAL - UPEC / UPEM

Game Refinement Relations and Metrics

Author: Adámek J.
de Alfaro Luca
Majumdar Rupak
Pierce B.J.
Plotkin G.J.
Raman Viswanath
Scott D.S.
Stoelinga Mariëlle Ida Antoinette
Vardi M.Y.
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 01/01/2008

We consider two-player games played over finite state spaces for an infinite number of rounds. At each state, the players simultaneously choose moves; the moves determine a successor state. It is often advantageous for players to choose probability distributions over moves, rather than single moves. Given a goal, for example, reach a target state, the question of winning is thus a probabilistic one: what is the maximal probability of winning from a given state? On these game structures, two fundamental notions are those of equivalences and metrics. Given a set of winning conditions, two states are equivalent if the players can win the same games with the same probability from both states. Metrics provide a bound on the difference in the probabilities of winning across states, capturing a quantitative notion of state similarity. We introduce equivalences and metrics for two-player game structures, and we show that they characterize the difference in probability of winning games whose goals are expressed in the quantitative mu-calculus. The quantitative mu-calculus can express a large set of goals, including reachability, safety, and omega-regular properties. Thus, we claim that our relations and metrics provide the canonical extensions to games, of the classical notion of bisimulation for transition systems. We develop our results both for equivalences and metrics, which generalize bisimulation, and for asymmetrical versions, which generalize simulation.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Episciences.org

Directory of Open Access Journals

University of Twente Research Information

Computing Distances between Probabilistic Automata

Author: Abir Zhioua
Alessandro Giacalone
Augusto Parma
Christel Baier
Franck van Breugel
Gethin Norman
Holger Hermanns
Josée Desharnais
K. Chatterjee
Kim G. Larsen
Lijun Zhang
Luca de Alfaro
Martin L. Puterman
Mathieu Tracol
Michael R. Garey
Mieke Massink
Norm Ferns
Norman Ferns
Pedro R. D'Argenio
Roberto Segala
Stefano Cattani
Publication venue: 'Open Publishing Association'
Publication date: 01/07/2011

We present relaxed notions of simulation and bisimulation on Probabilistic Automata (PA) that allow some error epsilon. When epsilon is zero we retrieve the usual notions of bisimulation and simulation on PAs. We give logical characterisations of these notions by choosing suitable logics which differ from the elementary ones, L with negation and L without negation, by the modal operator. Using flow networks, we show how to compute the relations in PTIME. This allows the definition of an efficiently computable non-discounted distance between the states of a PA. A natural modification of this distance is introduced, to obtain a discounted distance, which weakens the influence of long term transitions. We compare our notions of distance to others previously defined and illustrate our approach on various examples. We also show that our distance is not expansive with respect to process algebra operators. Although L without negation is a suitable logic to characterise epsilon-(bi)simulation on deterministic PAs, it is not for general PAs; interestingly, we prove that it does characterise weaker notions, called a priori epsilon-(bi)simulation, which we prove to be NP-difficult to decide. Comment: In Proceedings QAPL 2011, arXiv:1107.074

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Bayesian Reinforcement Learning via Deep, Sparse Sampling

Author: Basu Debabrota
Dimitrakakis Christos
Grover Divya
Publication date: 01/01/2020

We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration with a theoretical bound on its performance relative to the Bayes optimal policy, with a lower computational complexity. The main novelty is the use of a candidate policy generator, to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that in comparison to the state-of-the-art, our algorithm is both computationally more efficient, and obtains significantly higher reward in discrete environments. Comment: Published in AISTATS 202

arXiv.org e-Print Archive

Chalmers Research