13 research outputs found

    Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk

    In this paper we present an algorithm to compute risk averse policies in Markov Decision Processes (MDP) when the total cost criterion is used together with the average value at risk (AVaR) metric. Risk averse policies are needed when large deviations from the expected behavior may have detrimental effects, and conventional MDP algorithms usually ignore this aspect. We provide conditions for the structure of the underlying MDP ensuring that approximations for the exact problem can be derived and solved efficiently. Our findings are novel inasmuch as average value at risk has not previously been considered in association with the total cost criterion. Our method is demonstrated in a rapid deployment scenario, whereby a robot is tasked with the objective of reaching a target location within a temporal deadline where increased speed is associated with increased probability of failure. We demonstrate that the proposed algorithm not only produces a risk averse policy reducing the probability of exceeding the expected temporal deadline, but also provides the statistical distribution of costs, thus offering a valuable analysis tool.

    On gradual-impulse control of continuous-time Markov decision processes with multiplicative cost

    In this paper, we consider the gradual-impulse control problem of continuous-time Markov decision processes, where the system performance is measured by the expectation of the exponential utility of the total cost. We prove, under very general conditions on the system primitives, the existence of a deterministic stationary optimal policy out of a more general class of policies. Policies that we consider allow multiple simultaneous impulses, randomized selection of impulses with random effects, relaxed gradual controls, and accumulation of jumps. After characterizing the value function using the optimality equation, we reduce the continuous-time gradual-impulse control problem to an equivalent simple discrete-time Markov decision process, whose action space is the union of the sets of gradual and impulsive actions.

    On Optimizing the Conditional Value-at-Risk of a Maximum Cost for Risk-Averse Safety Analysis

    The popularity of Conditional Value-at-Risk (CVaR), a risk functional from finance, has been growing in the control systems community due to its intuitive interpretation and axiomatic foundation. We consider a non-standard optimal control problem in which the goal is to minimize the CVaR of a maximum random cost subject to a Borel-space Markov decision process. The objective takes the form $\text{CVaR}_{\alpha}(\max_{t=0,1,\dots,N} C_t)$, where $\alpha$ is a risk-aversion parameter representing a fraction of worst cases, $C_t$ is a stage or terminal cost, and $N \in \mathbb{N}$ is the length of a finite discrete-time horizon. The objective represents the maximum departure from a desired operating region averaged over a given fraction $\alpha$ of worst cases. This problem provides a safety criterion for a stochastic system that is informed by both the probability and severity of the potential consequences of the system's trajectory. In contrast, existing safety analysis frameworks apply stage-wise risk constraints (i.e., $\rho(C_t)$ must be small for all $t$, where $\rho$ is a risk functional) or assess the probability of constraint violation without quantifying its possible severity. To the best of our knowledge, the problem of interest has not been solved. To solve the problem, we propose and study a family of stochastic dynamic programs on an augmented state space. We prove that the optimal CVaR of a maximum cost enjoys an equivalent representation in terms of the solutions to this family of dynamic programs under appropriate assumptions. We show the existence of an optimal policy that depends on the dynamics of an augmented state under a measurable selection condition. Moreover, we demonstrate how our safety analysis framework is useful for assessing the severity of combined sewer overflows under precipitation uncertainty. Comment: A shorter version is under review for IEEE Transactions on Automatic Control, submitted December 202
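
The objective in the abstract above, the CVaR of a maximum cost, can be illustrated with a simple sample-based estimate. The sketch below is not the paper's dynamic-programming method: it assumes a plain empirical estimator (the mean of the worst `ceil(alpha * n)` samples), and the function names and toy trajectories are hypothetical.

```python
import math


def empirical_cvar(samples, alpha):
    """Mean of the worst ceil(alpha * n) samples, where a larger cost is worse.

    Assumption: this is a basic empirical estimator, chosen for illustration.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    k = math.ceil(alpha * len(samples))
    worst = sorted(samples, reverse=True)[:k]
    return sum(worst) / k


def cvar_of_max_cost(trajectories, alpha):
    """Estimate CVaR_alpha(max_t C_t) from sampled cost sequences (C_0, ..., C_N)."""
    # Reduce each trajectory to its maximum cost, then average the worst fraction.
    max_costs = [max(costs) for costs in trajectories]
    return empirical_cvar(max_costs, alpha)


# Hypothetical toy data: four sampled trajectories with horizon N = 2.
trajectories = [
    [0.0, 1.0, 0.5],
    [0.2, 0.1, 0.3],
    [2.0, 4.0, 1.0],
    [0.0, 0.0, 0.0],
]

half_worst = cvar_of_max_cost(trajectories, 0.5)   # mean of the two largest maxima
all_cases = cvar_of_max_cost(trajectories, 1.0)    # plain mean of the maxima
```

With `alpha = 1.0` the estimate reduces to the ordinary mean of the per-trajectory maxima, and smaller values of `alpha` weight only the most severe trajectories.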