Search CORE

13 research outputs found

Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk

Authors: Carpin Stefano; Chow Yin-Lam; Pavone Marco
Publication date: 01/01/2016

In this paper we present an algorithm to compute risk-averse policies in Markov Decision Processes (MDPs) when the total cost criterion is used together with the average value at risk (AVaR) metric. Risk-averse policies are needed when large deviations from the expected behavior may have detrimental effects, and conventional MDP algorithms usually ignore this aspect. We provide conditions for the structure of the underlying MDP ensuring that approximations for the exact problem can be derived and solved efficiently. Our findings are novel inasmuch as average value at risk has not previously been considered in association with the total cost criterion. Our method is demonstrated in a rapid deployment scenario, whereby a robot is tasked with the objective of reaching a target location within a temporal deadline where increased speed is associated with increased probability of failure. We demonstrate that the proposed algorithm not only produces a risk-averse policy reducing the probability of exceeding the expected temporal deadline, but also provides the statistical distribution of costs, thus offering a valuable analysis tool.
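As a rough illustration of the AVaR metric named in this abstract, the sketch below estimates it from sampled total costs by averaging the worst share of outcomes. It is not the paper's algorithm. The function name `empirical_avar`, the parameter `tail_fraction` (the share of worst cases being averaged), and the sample costs are all assumptions made for this example, and the paper may parameterize the risk level differently.

```python
import math

def empirical_avar(costs, tail_fraction):
    """Average of the worst `tail_fraction` share of sampled total costs.

    Hypothetical helper: a plain sample estimate, not the paper's method.
    """
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must lie in (0, 1]")
    ordered = sorted(costs, reverse=True)  # largest (worst) costs first
    k = max(1, math.ceil(tail_fraction * len(ordered)))  # size of the tail
    return sum(ordered[:k]) / k

# Made-up total costs (e.g. time to reach the target) from ten simulated runs.
sample_costs = [4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0, 8.0, 12.0, 20.0]

mean_cost = sum(sample_costs) / len(sample_costs)  # risk-neutral summary
tail_cost = empirical_avar(sample_costs, 0.2)      # mean of the worst 20%
print(mean_cost, tail_cost)
```

With these made-up samples the tail average is well above the plain mean, which is the kind of gap a risk-averse criterion is meant to expose.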

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

On gradual-impulse control of continuous-time Markov decision processes with multiplicative cost

Authors: Guo Xin; Kurushima Aiko; Piunovskiy Alexey; Zhang Yi
Publication date: 28/11/2018

In this paper, we consider the gradual-impulse control problem of continuous-time Markov decision processes, where the system performance is measured by the expectation of the exponential utility of the total cost. We prove, under very general conditions on the system primitives, the existence of a deterministic stationary optimal policy out of a more general class of policies. Policies that we consider allow multiple simultaneous impulses, randomized selection of impulses with random effects, relaxed gradual controls, and accumulation of jumps. After characterizing the value function using the optimality equation, we reduce the continuous-time gradual-impulse control problem to an equivalent simple discrete-time Markov decision process, whose action space is the union of the sets of gradual and impulsive actions.

arXiv.org e-Print Archive

On Optimizing the Conditional Value-at-Risk of a Maximum Cost for Risk-Averse Safety Analysis

Authors: Chapman Margaret P.; Fauss Michael; Smith Kevin M.
Publication date: 22/12/2021

The popularity of Conditional Value-at-Risk (CVaR), a risk functional from finance, has been growing in the control systems community due to its intuitive interpretation and axiomatic foundation. We consider a non-standard optimal control problem in which the goal is to minimize the CVaR of a maximum random cost subject to a Borel-space Markov decision process. The objective takes the form $\text{CVaR}_{\alpha}(\max_{t=0,1,\dots,N} C_t)$, where $\alpha$ is a risk-aversion parameter representing a fraction of worst cases, $C_t$ is a stage or terminal cost, and $N \in \mathbb{N}$ is the length of a finite discrete-time horizon. The objective represents the maximum departure from a desired operating region averaged over a given fraction $\alpha$ of worst cases. This problem provides a safety criterion for a stochastic system that is informed by both the probability and severity of the potential consequences of the system's trajectory. In contrast, existing safety analysis frameworks apply stage-wise risk constraints (i.e., $\rho(C_t)$ must be small for all $t$, where $\rho$ is a risk functional) or assess the probability of constraint violation without quantifying its possible severity. To the best of our knowledge, the problem of interest has not been solved. To solve the problem, we propose and study a family of stochastic dynamic programs on an augmented state space. We prove that the optimal CVaR of a maximum cost enjoys an equivalent representation in terms of the solutions to this family of dynamic programs under appropriate assumptions. We show the existence of an optimal policy that depends on the dynamics of an augmented state under a measurable selection condition. Moreover, we demonstrate how our safety analysis framework is useful for assessing the severity of combined sewer overflows under precipitation uncertainty.

Comment: A shorter version is under review for IEEE Transactions on Automatic Control, submitted December 202
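The objective in this abstract can be evaluated empirically for a fixed set of sampled cost trajectories. The sketch below does only that; it is not the dynamic-programming method on an augmented state space that the abstract describes. It assumes the standard minimization form of CVaR, $\min_s \{ s + \frac{1}{\alpha} E[(Y - s)^+] \}$ with $Y = \max_t C_t$, and reads $\alpha$ as the fraction of worst cases. The function name `cvar_of_max_cost` and the trajectory values are hypothetical.

```python
def cvar_of_max_cost(trajectories, alpha):
    """Sample estimate of CVaR_alpha(max_t C_t) over given cost trajectories.

    Assumption: CVaR_alpha(Y) = min_s { s + E[(Y - s)^+] / alpha }, with alpha
    in (0, 1] read as the fraction of worst cases. Hypothetical helper.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    maxima = [max(costs) for costs in trajectories]  # Y = max_t C_t per run
    n = len(maxima)

    def objective(s):
        # s plus the scaled average amount by which Y exceeds s
        excess = sum(max(y - s, 0.0) for y in maxima) / n
        return s + excess / alpha

    # The objective is convex and piecewise linear in s, so its minimum over
    # the real line is attained at one of the sampled maxima.
    return min(objective(s) for s in maxima)

# Made-up stage costs C_0, C_1, C_2 for four sampled trajectories.
sample_trajectories = [
    [1.0, 3.0, 2.0],
    [0.0, 1.0, 1.0],
    [2.0, 5.0, 4.0],
    [1.0, 1.0, 2.0],
]

worst_half = cvar_of_max_cost(sample_trajectories, 0.5)
risk_neutral = cvar_of_max_cost(sample_trajectories, 1.0)
print(worst_half, risk_neutral)
```

Under this reading, setting `alpha` to 1 recovers the plain average of the per-trajectory maxima, and smaller values weight only the worst trajectories.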

arXiv.org e-Print Archive