Algorithms for CVaR Optimization in MDPs

Chow, Yinlam; Ghavamzadeh, Mohammad

Authors: Yinlam Chow, Mohammad Ghavamzadeh
Publication date: 10 July 2014
Abstract

In many sequential decision-making problems, we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures, and it has gained popularity in finance and operations research because of its computational efficiency. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms, each of which uses a specific method to estimate this gradient and updates the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.

Comment: Submitted to NIPS 1
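As a rough illustration of the risk measure named in the abstract, the sketch below estimates VaR and CVaR from a finite sample of costs. It assumes the standard Rockafellar-Uryasev representation, CVaR_alpha(Z) = min over nu of { nu + E[(Z - nu)^+] / (1 - alpha) }, evaluated at the empirical alpha-quantile. The abstract itself does not spell out this formula, and the function name `empirical_var_cvar` and the sample data are hypothetical. It is not the paper's policy gradient or actor-critic algorithm, only the quantity those algorithms are said to optimize.

```python
import math


def empirical_var_cvar(costs, alpha):
    """Return (VaR, CVaR) at confidence level alpha for a list of sampled costs.

    Assumption: costs are i.i.d. samples of the cumulative cost Z, and
    larger values are worse. This is an illustrative estimator only.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not costs:
        raise ValueError("costs must be non-empty")

    ordered = sorted(costs)
    n = len(ordered)

    # Empirical alpha-quantile (VaR): the smallest sample value such that
    # at least a fraction alpha of the samples lie at or below it.
    k = max(math.ceil(alpha * n), 1)
    var = ordered[k - 1]

    # Rockafellar-Uryasev form evaluated at nu = VaR:
    # CVaR = nu + E[(Z - nu)^+] / (1 - alpha)
    mean_excess = sum(max(c - var, 0.0) for c in ordered) / n
    cvar = var + mean_excess / (1.0 - alpha)
    return var, cvar


if __name__ == "__main__":
    sample_costs = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
    print(empirical_var_cvar(sample_costs, 0.5))
```

With alpha = 0.5 and the four hypothetical costs above, the CVaR estimate equals the mean of the two largest costs, which matches the usual reading of CVaR as the expected cost in the worst (1 - alpha) tail.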

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.748.8...

Last time updated on 30/10/2017

CiteSeerX

oai:CiteSeerX.psu:10.1.1.853.9...

Last time updated on 30/10/2017

CiteSeerX

oai:CiteSeerX.psu:10.1.1.842.3...

Last time updated on 30/10/2017