42,309 research outputs found
Policy Gradients for CVaR-Constrained MDPs
We study a risk-constrained version of the stochastic shortest path (SSP)
problem, where the risk measure considered is Conditional Value-at-Risk (CVaR).
We propose two algorithms that obtain a locally risk-optimal policy by
employing four tools: stochastic approximation, mini batches, policy gradients
and importance sampling. Both the algorithms incorporate a CVaR estimation
procedure, along the lines of Bardou et al. [2009], which in turn is based on
Rockafellar-Uryasev's representation for CVaR and utilize the likelihood ratio
principle for estimating the gradient of the sum of one cost function
(objective of the SSP) and the gradient of the CVaR of the sum of another cost
function (in the constraint of SSP). The algorithms differ in the manner in
which they approximate the CVaR estimates/necessary gradients - the first
algorithm uses stochastic approximation, while the second employ mini-batches
in the spirit of Monte Carlo methods. We establish asymptotic convergence of
both the algorithms. Further, since estimating CVaR is related to rare-event
simulation, we incorporate an importance sampling based variance reduction
scheme into our proposed algorithms
Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint
The classic objective in a reinforcement learning (RL) problem is to find a
policy that minimizes, in expectation, a long-run objective such as the
infinite-horizon discounted or long-run average cost. In many practical
applications, optimizing the expected value alone is not sufficient, and it may
be necessary to include a risk measure in the optimization process, either as
the objective or as a constraint. Various risk measures have been proposed in
the literature, e.g., mean-variance tradeoff, exponential utility, the
percentile performance, value at risk, conditional value at risk, prospect
theory and its later enhancement, cumulative prospect theory. In this article,
we focus on the combination of risk criteria and reinforcement learning in a
constrained optimization framework, i.e., a setting where the goal to find a
policy that optimizes the usual objective of infinite-horizon
discounted/average cost, while ensuring that an explicit risk constraint is
satisfied. We introduce the risk-constrained RL framework, cover popular risk
measures based on variance, conditional value-at-risk and cumulative prospect
theory, and present a template for a risk-sensitive RL algorithm. We survey
some of our recent work on this topic, covering problems encompassing
discounted cost, average cost, and stochastic shortest path settings, together
with the aforementioned risk measures in a constrained framework. This
non-exhaustive survey is aimed at giving a flavor of the challenges involved in
solving a risk-sensitive RL problem, and outlining some potential future
research directions
- …