23,195 research outputs found

Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk

Author: Carpin Stefano
Chow Yin-Lam
Pavone Marco
Publication date: 01/01/2016

In this paper we present an algorithm to compute risk-averse policies in Markov Decision Processes (MDPs) when the total cost criterion is used together with the average value at risk (AVaR) metric. Risk-averse policies are needed when large deviations from the expected behavior may have detrimental effects, and conventional MDP algorithms usually ignore this aspect. We provide conditions for the structure of the underlying MDP ensuring that approximations for the exact problem can be derived and solved efficiently. Our findings are novel inasmuch as average value at risk has not previously been considered in association with the total cost criterion. Our method is demonstrated in a rapid deployment scenario, whereby a robot is tasked with the objective of reaching a target location within a temporal deadline where increased speed is associated with increased probability of failure. We demonstrate that the proposed algorithm not only produces a risk-averse policy reducing the probability of exceeding the expected temporal deadline, but also provides the statistical distribution of costs, thus offering a valuable analysis tool.
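
For readers unfamiliar with the AVaR metric named in this abstract, the following is a minimal sketch of how it can be estimated from a sample of total costs. It is not the algorithm of the paper. The function name `empirical_avar`, the equal weighting of samples, and the convention that `alpha` is a confidence level in [0, 1) (so that larger values focus on a smaller, worse tail) are all assumptions made for illustration.

```python
def empirical_avar(costs, alpha):
    """AVaR (also called CVaR) of an equally weighted cost sample.

    Uses the Rockafellar-Uryasev variational formula
        AVaR_alpha(X) = min_t { t + E[max(X - t, 0)] / (1 - alpha) },
    whose minimum over an empirical distribution is attained at one
    of the sample points, so only those need to be tried.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(costs)
    best = float("inf")
    for t in set(costs):
        # Average amount by which the costs exceed the threshold t.
        excess = sum(max(c - t, 0.0) for c in costs) / n
        best = min(best, t + excess / (1.0 - alpha))
    return best


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(empirical_avar(sample, 0.0))  # reduces to the sample mean
    print(empirical_avar(sample, 0.8))  # average of the worst 20% of costs
```

With `alpha = 0` the quantity reduces to the ordinary expected cost, which is why AVaR is often described as interpolating between the risk-neutral and worst-case criteria.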

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

Author: Castro Pablo Samuel
Kastner Tyler
Panangaden Prakash
Rowland Mark
Publication date: 05/10/2023

Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective further enables us to provide new theoretical results, which have so far eluded prior work. These include bounding value function differences by means of our metric, and the demonstration that our metric can be provably embedded into a finite-dimensional Euclidean space with low distortion error. These are two crucial properties when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice. Comment: Published in TML

arXiv.org e-Print Archive

Computing Distances between Probabilistic Automata

Author: Abir Zhioua
Alessandro Giacalone
Augusto Parma
Christel Baier
Franck van Breugel
Gethin Norman
Holger Hermanns
Josée Desharnais
K. Chatterjee
Kim G. Larsen
Lijun Zhang
Luca de Alfaro
Martin L. Puterman
Mathieu Tracol
Michael R. Garey
Mieke Massink
Norm Ferns
Norman Ferns
Pedro R. D'Argenio
Roberto Segala
Stefano Cattani
Publication venue: 'Open Publishing Association'
Publication date: 01/07/2011

We present relaxed notions of simulation and bisimulation on Probabilistic Automata (PA) that allow some error epsilon. When epsilon is zero we retrieve the usual notions of bisimulation and simulation on PAs. We give logical characterisations of these notions by choosing suitable logics which differ from the elementary ones, L with negation and L without negation, by the modal operator. Using flow networks, we show how to compute the relations in PTIME. This allows the definition of an efficiently computable non-discounted distance between the states of a PA. A natural modification of this distance is introduced, to obtain a discounted distance, which weakens the influence of long term transitions. We compare our notions of distance to others previously defined and illustrate our approach on various examples. We also show that our distance is not expansive with respect to process algebra operators. Although L without negation is a suitable logic to characterise epsilon-(bi)simulation on deterministic PAs, it is not for general PAs; interestingly, we prove that it does characterise weaker notions, called a priori epsilon-(bi)simulation, which we prove to be NP-difficult to decide. Comment: In Proceedings QAPL 2011, arXiv:1107.074

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Linear Distances between Markov Chains

Author: Daca Przemysław
Henzinger Thomas A.
Křetínský Jan
Petrov Tatjana
Publication date: 01/01/2016

We introduce a general class of distances (metrics) between Markov chains, which are based on linear behaviour. This class encompasses distances given topologically (such as the total variation distance or trace distance) as well as by temporal logics or automata. We investigate which of the distances can be approximated by observing the systems, i.e. by black-box testing or simulation, and we provide both negative and positive results.
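
As an illustration of the total variation distance mentioned in this abstract, the sketch below compares two finite Markov chains over the same state set by brute-force enumeration of all state paths of a fixed length. This finite-horizon restriction, the list-based encoding of initial distributions and transition matrices, and the function names are assumptions made for the example; the distances studied in the paper concern behaviour over unbounded runs and are not reproduced here.

```python
from itertools import product


def path_probability(init, trans, path):
    """Probability that a chain with the given initial distribution
    and transition matrix follows exactly this sequence of states."""
    p = init[path[0]]
    for s, t in zip(path, path[1:]):
        p *= trans[s][t]
    return p


def finite_horizon_tv(init_a, trans_a, init_b, trans_b, horizon):
    """Total variation distance between the distributions that two
    chains induce on state paths of length `horizon` (>= 1).

    Enumerates all n**horizon paths, so it is only practical for
    very small chains and short horizons.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    n = len(init_a)
    total = 0.0
    for path in product(range(n), repeat=horizon):
        pa = path_probability(init_a, trans_a, path)
        pb = path_probability(init_b, trans_b, path)
        total += abs(pa - pb)
    return 0.5 * total


if __name__ == "__main__":
    init = [1.0, 0.0]
    chain_a = [[0.5, 0.5], [0.0, 1.0]]  # may leave state 0
    chain_b = [[1.0, 0.0], [0.0, 1.0]]  # stays in state 0 forever
    for k in (1, 2, 3):
        print(k, finite_horizon_tv(init, chain_a, init, chain_b, k))
```

In this toy pair the distance grows with the horizon, because longer observations give chain A more opportunities to leave state 0 and so reveal itself as different from chain B.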

arXiv.org e-Print Archive

KOPS - The Institutional Repository of the University of Konstanz

Dagstuhl Research Online Publication Server

IST PubRep

IST Austria: PubRep (Institute of Science and Technology)

Game Refinement Relations and Metrics

Author: Adámek J.
de Alfaro Luca
Majumdar Rupak
Pierce B.J.
Plotkin G.J.
Raman Viswanath
Scott D.S.
Stoelinga Mariëlle Ida Antoinette
Vardi M.Y.
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 01/01/2008

We consider two-player games played over finite state spaces for an infinite number of rounds. At each state, the players simultaneously choose moves; the moves determine a successor state. It is often advantageous for players to choose probability distributions over moves, rather than single moves. Given a goal, for example, reach a target state, the question of winning is thus a probabilistic one: what is the maximal probability of winning from a given state? On these game structures, two fundamental notions are those of equivalences and metrics. Given a set of winning conditions, two states are equivalent if the players can win the same games with the same probability from both states. Metrics provide a bound on the difference in the probabilities of winning across states, capturing a quantitative notion of state similarity. We introduce equivalences and metrics for two-player game structures, and we show that they characterize the difference in probability of winning games whose goals are expressed in the quantitative mu-calculus. The quantitative mu-calculus can express a large set of goals, including reachability, safety, and omega-regular properties. Thus, we claim that our relations and metrics provide the canonical extensions to games of the classical notion of bisimulation for transition systems. We develop our results both for equivalences and metrics, which generalize bisimulation, and for asymmetrical versions, which generalize simulation.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Episciences.org

Directory of Open Access Journals

University of Twente Research Information