23,195 research outputs found

    Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk

    Full text link
    In this paper we present an algorithm to compute risk averse policies in Markov Decision Processes (MDP) when the total cost criterion is used together with the average value at risk (AVaR) metric. Risk averse policies are needed when large deviations from the expected behavior may have detrimental effects, and conventional MDP algorithms usually ignore this aspect. We provide conditions for the structure of the underlying MDP ensuring that approximations for the exact problem can be derived and solved efficiently. Our findings are novel inasmuch as average value at risk has not previously been considered in association with the total cost criterion. Our method is demonstrated in a rapid deployment scenario, whereby a robot is tasked with the objective of reaching a target location within a temporal deadline where increased speed is associated with increased probability of failure. We demonstrate that the proposed algorithm not only produces a risk averse policy reducing the probability of exceeding the expected temporal deadline, but also provides the statistical distribution of costs, thus offering a valuable analysis tool

    A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

    Full text link
    Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective further enables us to provide new theoretical results, which has so far eluded prior work. These include bounding value function differences by means of our metric, and the demonstration that our metric can be provably embedded into a finite-dimensional Euclidean space with low distortion error. These are two crucial properties when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.Comment: Published in TML

    Computing Distances between Probabilistic Automata

    Full text link
    We present relaxed notions of simulation and bisimulation on Probabilistic Automata (PA), that allow some error epsilon. When epsilon is zero we retrieve the usual notions of bisimulation and simulation on PAs. We give logical characterisations of these notions by choosing suitable logics which differ from the elementary ones, L with negation and L without negation, by the modal operator. Using flow networks, we show how to compute the relations in PTIME. This allows the definition of an efficiently computable non-discounted distance between the states of a PA. A natural modification of this distance is introduced, to obtain a discounted distance, which weakens the influence of long term transitions. We compare our notions of distance to others previously defined and illustrate our approach on various examples. We also show that our distance is not expansive with respect to process algebra operators. Although L without negation is a suitable logic to characterise epsilon-(bi)simulation on deterministic PAs, it is not for general PAs; interestingly, we prove that it does characterise weaker notions, called a priori epsilon-(bi)simulation, which we prove to be NP-difficult to decide.Comment: In Proceedings QAPL 2011, arXiv:1107.074

    Linear Distances between Markov Chains

    Get PDF
    We introduce a general class of distances (metrics) between Markov chains, which are based on linear behaviour. This class encompasses distances given topologically (such as the total variation distance or trace distance) as well as by temporal logics or automata. We investigate which of the distances can be approximated by observing the systems, i.e. by black-box testing or simulation, and we provide both negative and positive results

    Game Refinement Relations and Metrics

    Get PDF
    We consider two-player games played over finite state spaces for an infinite number of rounds. At each state, the players simultaneously choose moves; the moves determine a successor state. It is often advantageous for players to choose probability distributions over moves, rather than single moves. Given a goal, for example, reach a target state, the question of winning is thus a probabilistic one: what is the maximal probability of winning from a given state? On these game structures, two fundamental notions are those of equivalences and metrics. Given a set of winning conditions, two states are equivalent if the players can win the same games with the same probability from both states. Metrics provide a bound on the difference in the probabilities of winning across states, capturing a quantitative notion of state similarity. We introduce equivalences and metrics for two-player game structures, and we show that they characterize the difference in probability of winning games whose goals are expressed in the quantitative mu-calculus. The quantitative mu-calculus can express a large set of goals, including reachability, safety, and omega-regular properties. Thus, we claim that our relations and metrics provide the canonical extensions to games, of the classical notion of bisimulation for transition systems. We develop our results both for equivalences and metrics, which generalize bisimulation, and for asymmetrical versions, which generalize simulation
    • …
    corecore