23,195 research outputs found
Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk
In this paper we present an algorithm to compute risk averse policies in
Markov Decision Processes (MDP) when the total cost criterion is used together
with the average value at risk (AVaR) metric. Risk averse policies are needed
when large deviations from the expected behavior may have detrimental effects,
and conventional MDP algorithms usually ignore this aspect. We provide
conditions for the structure of the underlying MDP ensuring that approximations
for the exact problem can be derived and solved efficiently. Our findings are
novel inasmuch as average value at risk has not previously been considered in
association with the total cost criterion. Our method is demonstrated in a
rapid deployment scenario, whereby a robot is tasked with the objective of
reaching a target location within a temporal deadline where increased speed is
associated with increased probability of failure. We demonstrate that the
proposed algorithm not only produces a risk averse policy reducing the
probability of exceeding the expected temporal deadline, but also provides the
statistical distribution of costs, thus offering a valuable analysis tool
A Kernel Perspective on Behavioural Metrics for Markov Decision Processes
Behavioural metrics have been shown to be an effective mechanism for
constructing representations in reinforcement learning. We present a novel
perspective on behavioural metrics for Markov decision processes via the use of
positive definite kernels. We leverage this new perspective to define a new
metric that is provably equivalent to the recently introduced MICo distance
(Castro et al., 2021). The kernel perspective further enables us to provide new
theoretical results, which has so far eluded prior work. These include bounding
value function differences by means of our metric, and the demonstration that
our metric can be provably embedded into a finite-dimensional Euclidean space
with low distortion error. These are two crucial properties when using
behavioural metrics for reinforcement learning representations. We complement
our theory with strong empirical results that demonstrate the effectiveness of
these methods in practice.Comment: Published in TML
Computing Distances between Probabilistic Automata
We present relaxed notions of simulation and bisimulation on Probabilistic
Automata (PA), that allow some error epsilon. When epsilon is zero we retrieve
the usual notions of bisimulation and simulation on PAs. We give logical
characterisations of these notions by choosing suitable logics which differ
from the elementary ones, L with negation and L without negation, by the modal
operator. Using flow networks, we show how to compute the relations in PTIME.
This allows the definition of an efficiently computable non-discounted distance
between the states of a PA. A natural modification of this distance is
introduced, to obtain a discounted distance, which weakens the influence of
long term transitions. We compare our notions of distance to others previously
defined and illustrate our approach on various examples. We also show that our
distance is not expansive with respect to process algebra operators. Although L
without negation is a suitable logic to characterise epsilon-(bi)simulation on
deterministic PAs, it is not for general PAs; interestingly, we prove that it
does characterise weaker notions, called a priori epsilon-(bi)simulation, which
we prove to be NP-difficult to decide.Comment: In Proceedings QAPL 2011, arXiv:1107.074
Linear Distances between Markov Chains
We introduce a general class of distances (metrics) between Markov chains,
which are based on linear behaviour. This class encompasses distances given
topologically (such as the total variation distance or trace distance) as well
as by temporal logics or automata. We investigate which of the distances can be
approximated by observing the systems, i.e. by black-box testing or simulation,
and we provide both negative and positive results
Game Refinement Relations and Metrics
We consider two-player games played over finite state spaces for an infinite
number of rounds. At each state, the players simultaneously choose moves; the
moves determine a successor state. It is often advantageous for players to
choose probability distributions over moves, rather than single moves. Given a
goal, for example, reach a target state, the question of winning is thus a
probabilistic one: what is the maximal probability of winning from a given
state?
On these game structures, two fundamental notions are those of equivalences
and metrics. Given a set of winning conditions, two states are equivalent if
the players can win the same games with the same probability from both states.
Metrics provide a bound on the difference in the probabilities of winning
across states, capturing a quantitative notion of state similarity.
We introduce equivalences and metrics for two-player game structures, and we
show that they characterize the difference in probability of winning games
whose goals are expressed in the quantitative mu-calculus. The quantitative
mu-calculus can express a large set of goals, including reachability, safety,
and omega-regular properties. Thus, we claim that our relations and metrics
provide the canonical extensions to games, of the classical notion of
bisimulation for transition systems. We develop our results both for
equivalences and metrics, which generalize bisimulation, and for asymmetrical
versions, which generalize simulation
- …