14,978 research outputs found
The Complexity of Graph-Based Reductions for Reachability in Markov Decision Processes
We study the never-worse relation (NWR) for Markov decision processes with an
infinite-horizon reachability objective. A state q is never worse than a state
p if the maximal probability of reaching the target set of states from p is at
most the same value from q, regard- less of the probabilities labelling the
transitions. Extremal-probability states, end components, and essential states
are all special cases of the equivalence relation induced by the NWR. Using the
NWR, states in the same equivalence class can be collapsed. Then, actions
leading to sub- optimal states can be removed. We show the natural decision
problem associated to computing the NWR is coNP-complete. Finally, we ex- tend
a previously known incomplete polynomial-time iterative algorithm to
under-approximate the NWR
Compositional Performance Modelling with the TIPPtool
Stochastic process algebras have been proposed as compositional specification formalisms for performance models. In this paper, we describe a tool which aims at realising all beneficial aspects of compositional performance modelling, the TIPPtool. It incorporates methods for compositional specification as well as solution, based on state-of-the-art techniques, and wrapped in a user-friendly graphical front end. Apart from highlighting the general benefits of the tool, we also discuss some lessons learned during development and application of the TIPPtool. A non-trivial model of a real life communication system serves as a case study to illustrate benefits and limitations
Controllability Metrics on Networks with Linear Decision Process-type Interactions and Multiplicative Noise
This paper aims at the study of controllability properties and induced
controllability metrics on complex networks governed by a class of (discrete
time) linear decision processes with mul-tiplicative noise. The dynamics are
given by a couple consisting of a Markov trend and a linear decision process
for which both the "deterministic" and the noise components rely on
trend-dependent matrices. We discuss approximate, approximate null and exact
null-controllability. Several examples are given to illustrate the links
between these concepts and to compare our results with their continuous-time
counterpart (given in [16]). We introduce a class of backward stochastic
Riccati difference schemes (BSRDS) and study their solvability for particular
frameworks. These BSRDS allow one to introduce Gramian-like controllability
metrics. As application of these metrics, we propose a minimal
intervention-targeted reduction in the study of gene networks
Game Refinement Relations and Metrics
We consider two-player games played over finite state spaces for an infinite
number of rounds. At each state, the players simultaneously choose moves; the
moves determine a successor state. It is often advantageous for players to
choose probability distributions over moves, rather than single moves. Given a
goal, for example, reach a target state, the question of winning is thus a
probabilistic one: what is the maximal probability of winning from a given
state?
On these game structures, two fundamental notions are those of equivalences
and metrics. Given a set of winning conditions, two states are equivalent if
the players can win the same games with the same probability from both states.
Metrics provide a bound on the difference in the probabilities of winning
across states, capturing a quantitative notion of state similarity.
We introduce equivalences and metrics for two-player game structures, and we
show that they characterize the difference in probability of winning games
whose goals are expressed in the quantitative mu-calculus. The quantitative
mu-calculus can express a large set of goals, including reachability, safety,
and omega-regular properties. Thus, we claim that our relations and metrics
provide the canonical extensions to games, of the classical notion of
bisimulation for transition systems. We develop our results both for
equivalences and metrics, which generalize bisimulation, and for asymmetrical
versions, which generalize simulation
Computing Distances between Probabilistic Automata
We present relaxed notions of simulation and bisimulation on Probabilistic
Automata (PA), that allow some error epsilon. When epsilon is zero we retrieve
the usual notions of bisimulation and simulation on PAs. We give logical
characterisations of these notions by choosing suitable logics which differ
from the elementary ones, L with negation and L without negation, by the modal
operator. Using flow networks, we show how to compute the relations in PTIME.
This allows the definition of an efficiently computable non-discounted distance
between the states of a PA. A natural modification of this distance is
introduced, to obtain a discounted distance, which weakens the influence of
long term transitions. We compare our notions of distance to others previously
defined and illustrate our approach on various examples. We also show that our
distance is not expansive with respect to process algebra operators. Although L
without negation is a suitable logic to characterise epsilon-(bi)simulation on
deterministic PAs, it is not for general PAs; interestingly, we prove that it
does characterise weaker notions, called a priori epsilon-(bi)simulation, which
we prove to be NP-difficult to decide.Comment: In Proceedings QAPL 2011, arXiv:1107.074
Bayesian Reinforcement Learning via Deep, Sparse Sampling
We address the problem of Bayesian reinforcement learning using efficient
model-based online planning. We propose an optimism-free Bayes-adaptive
algorithm to induce deeper and sparser exploration with a theoretical bound on
its performance relative to the Bayes optimal policy, with a lower
computational complexity. The main novelty is the use of a candidate policy
generator, to generate long-term options in the planning tree (over beliefs),
which allows us to create much sparser and deeper trees. Experimental results
on different environments show that in comparison to the state-of-the-art, our
algorithm is both computationally more efficient, and obtains significantly
higher reward in discrete environments.Comment: Published in AISTATS 202
- âŠ