2 research outputs found
Policy Evaluation with Variance Related Risk Criteria in Markov Decision Processes
In this paper we extend temporal difference policy evaluation algorithms to
performance criteria that include the variance of the cumulative reward. Such
criteria are useful for risk management, and are important in domains such as
finance and process control. We propose both TD(0) and LSTD(lambda) variants
with linear function approximation, prove their convergence, and demonstrate
their utility in a 4-dimensional continuous state space problem.
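
The TD(0) idea in this abstract can be sketched on a toy problem. The following is a minimal illustration, not the paper's implementation. It assumes an undiscounted episodic setting and uses the standard second-moment recursion M(x) = r(x)^2 + 2 r(x) E[J(x')] + E[M(x')] alongside J(x) = r(x) + E[J(x')], recovering the variance as M(x) - J(x)^2. The two-state chain, the one-hot features, the step-size schedule, and all function names are hypothetical choices made for the example.

```python
import random

# Hypothetical two-state episodic chain (an assumption, not from the paper):
#   state 0 --reward 1--> state 1
#   state 1 --reward 2--> terminal with prob. 0.8, back to state 0 with prob. 0.2
TERMINAL = -1
NUM_STATES = 2


def step(state, rng):
    """Sample (reward, next_state) for the toy chain."""
    if state == 0:
        return 1.0, 1
    next_state = 0 if rng.random() < 0.2 else TERMINAL
    return 2.0, next_state


def features(state):
    """One-hot features; the terminal state maps to the zero vector."""
    return [1.0 if i == state else 0.0 for i in range(NUM_STATES)]


def dot(weights, phi):
    return sum(w * f for w, f in zip(weights, phi))


def td0_mean_and_variance(num_episodes=160_000, seed=0):
    """Jointly estimate the mean J and second moment M of the return with TD(0)."""
    rng = random.Random(seed)
    w_j = [0.0] * NUM_STATES  # weights for the expected return J
    w_m = [0.0] * NUM_STATES  # weights for the second moment M
    n = 0
    for _ in range(num_episodes):
        state = 0
        while state != TERMINAL:
            n += 1
            alpha = 5.0 / (n + 100.0)  # decaying step size (assumed schedule)
            reward, next_state = step(state, rng)
            phi, phi_next = features(state), features(next_state)
            j, j_next = dot(w_j, phi), dot(w_j, phi_next)
            m, m_next = dot(w_m, phi), dot(w_m, phi_next)
            # TD errors for the mean and for the second moment of the return.
            delta_j = reward + j_next - j
            delta_m = reward * reward + 2.0 * reward * j_next + m_next - m
            for i in range(NUM_STATES):
                w_j[i] += alpha * delta_j * phi[i]
                w_m[i] += alpha * delta_m * phi[i]
            state = next_state
    # Variance of the return from each state: Var = M - J^2.
    variance = [w_m[i] - w_j[i] ** 2 for i in range(NUM_STATES)]
    return w_j, variance


if __name__ == "__main__":
    mean, variance = td0_mean_and_variance()
    print("estimated mean return:", mean)
    print("estimated return variance:", variance)
```

For this particular chain the exact values are J = (3.75, 2.75) and a return variance of 2.8125 from either state, so the estimates can be compared against them.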
Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs
In many sequential decision-making problems we may want to manage risk by
minimizing some measure of variability in rewards in addition to maximizing a
standard criterion. Variance related risk measures are among the most common
risk-sensitive criteria in finance and operations research. However, optimizing
many such criteria is known to be a hard problem. In this paper, we consider
both discounted and average reward Markov decision processes. For each
formulation, we first define a measure of variability for a policy, which in
turn gives us a set of risk-sensitive criteria to optimize. For each of these
criteria, we derive a formula for computing its gradient. We then devise
actor-critic algorithms that operate on three timescales: a TD critic on the
fastest timescale, a policy gradient (actor) on the intermediate timescale, and
a dual ascent for Lagrange multipliers on the slowest timescale. In the
discounted setting, we point out the difficulty in estimating the gradient of
the variance of the return and incorporate simultaneous perturbation approaches
to alleviate this. The average setting, on the other hand, allows for an actor
update using compatible features to estimate the gradient of the variance. We
establish the convergence of our algorithms to locally risk-sensitive optimal
policies. Finally, we demonstrate the usefulness of our algorithms in a traffic
signal control application.
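
The three-timescale structure described in this abstract can be illustrated with step-size schedules. This is only a sketch under stated assumptions, not the authors' algorithm: the exponents 0.55, 0.8, and 1.0 are hypothetical choices that satisfy the usual stochastic-approximation conditions (each schedule sums to infinity, its squares are summable, and each slower schedule vanishes relative to the next faster one). The constraint form "variance <= bound", the cap `lam_max`, and the function names are likewise assumptions made for the example.

```python
def step_sizes(n):
    """Return (critic, actor, multiplier) step sizes at iteration n >= 1.

    The exponents are assumed values chosen so that
    multiplier / actor -> 0 and actor / critic -> 0 as n grows.
    """
    critic = 1.0 / n ** 0.55   # fastest timescale: TD critic
    actor = 1.0 / n ** 0.8     # intermediate timescale: policy gradient
    multiplier = 1.0 / n       # slowest timescale: Lagrange multiplier
    return critic, actor, multiplier


def dual_ascent(lam, variance_estimate, bound, step, lam_max=100.0):
    """Projected ascent step on the multiplier for the constraint variance <= bound.

    The multiplier grows while the constraint is violated and shrinks
    otherwise; it is clipped to the interval [0, lam_max].
    """
    updated = lam + step * (variance_estimate - bound)
    return min(lam_max, max(0.0, updated))


if __name__ == "__main__":
    for n in (1, 100, 10_000):
        print(n, step_sizes(n))
    print(dual_ascent(0.5, variance_estimate=2.0, bound=1.0, step=0.1))
```

In a full algorithm the critic and actor updates would run between multiplier updates, each using its own schedule. They are omitted here because the abstract does not specify them.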