On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes
Optimizing static risk-averse objectives in Markov decision processes is
difficult because they do not admit standard dynamic programming equations
common in Reinforcement Learning (RL) algorithms. Dynamic programming
decompositions that augment the state space with discrete risk levels have
recently gained popularity in the RL community. Prior work has shown that these
decompositions are optimal when the risk level is discretized sufficiently.
However, we show that these popular decompositions for
Conditional-Value-at-Risk (CVaR) and Entropic-Value-at-Risk (EVaR) are
inherently suboptimal regardless of the discretization level. In particular, we
show that a saddle point property assumed to hold in prior literature may be
violated. However, a decomposition does hold for Value-at-Risk and our proof
demonstrates how this risk measure differs from CVaR and EVaR. Our findings are
significant because risk-averse algorithms are deployed in high-stakes
environments, making their correctness all the more critical.
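As a quick reference for the risk measures this abstract contrasts, here is a minimal empirical sketch of VaR and CVaR for a sample of costs, using the standard Rockafellar–Uryasev estimator; the sample values are illustrative, not from the paper.

```python
import numpy as np

def var_cvar(costs, alpha):
    """Empirical VaR and CVaR at confidence level alpha for a cost sample.

    VaR_alpha is the alpha-quantile of the costs; CVaR_alpha averages the
    worst (1 - alpha) tail via the Rockafellar-Uryasev representation
    CVaR = VaR + E[(X - VaR)^+] / (1 - alpha).
    """
    costs = np.sort(np.asarray(costs, dtype=float))
    n = len(costs)
    var = costs[int(np.ceil(alpha * n)) - 1]  # empirical alpha-quantile
    cvar = var + np.mean(np.maximum(costs - var, 0.0)) / (1.0 - alpha)
    return var, cvar

costs = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
v, c = var_cvar(costs, alpha=0.8)  # worst 20% tail is the single outcome 100
```

For this sample, VaR at level 0.8 is 4 while CVaR is 100, which illustrates why CVaR-style tail averages are the harder objects to decompose over an MDP's dynamics.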
Constrained Risk-Averse Markov Decision Processes
We consider the problem of designing policies for Markov decision processes (MDPs) with dynamic coherent risk objectives and constraints. We begin by formulating the problem in a Lagrangian framework. Under the assumption that the risk objectives and constraints can be represented by a Markov risk transition mapping, we propose an optimization-based method to synthesize Markovian policies that lower-bound the constrained risk-averse problem. We demonstrate that the formulated optimization problems are difference-of-convex programs (DCPs) and can be solved by the disciplined convex-concave programming (DCCP) framework. We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints. Finally, we illustrate the effectiveness of the proposed method with numerical experiments on a rover navigation problem involving conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent risk measures.
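The linear program that this formulation generalizes is the classical occupancy-measure LP for constrained discounted MDPs. A self-contained sketch on a toy two-state, two-action MDP follows; the transition kernel, costs, and budget are all invented for illustration, and scipy's `linprog` stands in for a generic LP solver.

```python
import numpy as np
from scipy.optimize import linprog

# Occupancy-measure LP for a constrained discounted MDP.
# All numbers below are illustrative, not from the paper.
gamma = 0.9
mu0 = np.array([1.0, 0.0])          # initial state distribution
# P[a, s, s'] = transition probability under action a
P = np.array([[[1.0, 0.0],          # action 0: always go to state 0
               [1.0, 0.0]],
              [[0.0, 1.0],          # action 1: always go to state 1
               [0.0, 1.0]]])
c = np.array([[1.0, 0.5],           # c[s, a]: stage cost to minimize
              [0.2, 0.0]])
d = np.array([[0.0, 1.0],           # d[s, a]: constraint stage cost
              [0.0, 0.3]])
budget = 4.0                        # E[total discounted d-cost] <= budget

nS, nA = 2, 2
idx = lambda s, a: s * nA + a
# Bellman-flow equalities:
#   sum_a rho(s, a) - gamma * sum_{s', a} P[a, s', s] rho(s', a) = mu0(s)
A_eq = np.zeros((nS, nS * nA))
for s in range(nS):
    for a in range(nA):
        A_eq[s, idx(s, a)] += 1.0
    for sp in range(nS):
        for a in range(nA):
            A_eq[s, idx(sp, a)] -= gamma * P[a, sp, s]

res = linprog(c.reshape(-1), A_ub=d.reshape(1, -1), b_ub=[budget],
              A_eq=A_eq, b_eq=mu0, bounds=(0, None))
rho = res.x.reshape(nS, nA)         # optimal occupancy measure
```

Any feasible occupancy measure sums to 1/(1 - gamma), and a policy is recovered by normalizing `rho` row-wise; the DCP formulation in the abstract replaces the linear objective with a difference-of-convex risk functional.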
Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk
In this paper we present an algorithm to compute risk-averse policies in
Markov decision processes (MDPs) when the total cost criterion is used together
with the average value at risk (AVaR) metric. Risk-averse policies are needed
when large deviations from the expected behavior may have detrimental effects,
and conventional MDP algorithms usually ignore this aspect. We provide
conditions on the structure of the underlying MDP ensuring that approximations
to the exact problem can be derived and solved efficiently. Our findings are
novel inasmuch as average value at risk has not previously been considered in
combination with the total cost criterion. Our method is demonstrated in a
rapid deployment scenario in which a robot must reach a target location within
a temporal deadline, where increased speed is associated with an increased
probability of failure. We demonstrate that the proposed algorithm not only
produces a risk-averse policy that reduces the probability of exceeding the
temporal deadline, but also provides the statistical distribution of costs,
thus offering a valuable analysis tool.
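The speed-versus-failure trade-off in this rover scenario can be sketched with a short Monte Carlo experiment. The step times, failure probabilities, and deadline below are hypothetical stand-ins, chosen only so that the "fast" policy is better in expectation while the "slow" policy has a smaller probability of missing the deadline, which is exactly the gap between risk-neutral and risk-averse objectives.

```python
import numpy as np

# Hypothetical rover: each of 10 steps either succeeds (nominal step time)
# or fails (a long delay). Fast steps are quicker but fail more often.
rng = np.random.default_rng(0)
n, steps, delay, deadline = 100_000, 10, 6.0, 14.0

def total_time(p_fail, step_time):
    """Simulate n rollouts of the total traversal time."""
    fails = rng.random((n, steps)) < p_fail
    return np.where(fails, delay, step_time).sum(axis=1)

fast = total_time(p_fail=0.05, step_time=1.0)   # mean ~12.5
slow = total_time(p_fail=0.01, step_time=1.3)   # mean ~13.5

p_fast = (fast > deadline).mean()   # ~0.40: often misses the deadline
p_slow = (slow > deadline).mean()   # ~0.10: rarely misses it
```

A risk-neutral total-cost criterion prefers the fast policy (lower mean), while a tail-sensitive criterion such as AVaR on the cost distribution prefers the slow one, which is the behavior the abstract's algorithm is designed to capture.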