Algorithms for CVaR Optimization in MDPs
In many sequential decision-making problems we may want to manage risk by
minimizing some measure of variability in costs in addition to minimizing a
standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk
measure that addresses some of the shortcomings of the well-known
variance-related risk measures and, because of its computational efficiency,
has gained popularity in finance and operations research. In this paper, we
consider the mean-CVaR optimization problem in MDPs. We first derive a formula
for computing the gradient of this risk-sensitive objective function. We then
devise policy gradient and actor-critic algorithms that each use a specific
method to estimate this gradient and updates the policy parameters in the
descent direction. We establish the convergence of our algorithms to locally
risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our
algorithms in an optimal stopping problem.
Comment: Submitted to NIPS 1
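To make the risk measure concrete, here is a minimal Python sketch of estimating CVaR from sampled costs. It assumes the common convention that CVaR at level alpha is the expected cost in the worst (1 - alpha) fraction of outcomes, and it uses the Rockafellar-Uryasev representation CVaR_alpha(Z) = min over nu of { nu + E[(Z - nu)^+] / (1 - alpha) }, whose minimizer nu is the alpha-quantile (VaR). The function names and the scalarized score E[Z] + lam * CVaR are illustrative assumptions. They are not the paper's code or its exact problem formulation, which may, for example, treat CVaR as a constraint rather than a penalty.

```python
import math
from typing import Sequence, Tuple


def empirical_var_cvar(costs: Sequence[float], alpha: float) -> Tuple[float, float]:
    """Empirical VaR and CVaR of a cost sample at confidence level alpha in (0, 1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if len(costs) == 0:
        raise ValueError("need at least one cost sample")
    xs = sorted(costs)
    n = len(xs)
    # Empirical alpha-quantile: smallest sample whose empirical CDF is >= alpha.
    k = max(math.ceil(alpha * n) - 1, 0)
    var = xs[k]
    # Plug nu = VaR into the Rockafellar-Uryasev objective to obtain CVaR.
    excess = sum(max(x - var, 0.0) for x in xs) / n
    return var, var + excess / (1.0 - alpha)


def mean_cvar_score(costs: Sequence[float], alpha: float, lam: float) -> float:
    """Hypothetical scalarized mean-CVaR criterion: E[Z] + lam * CVaR_alpha(Z)."""
    _, cvar = empirical_var_cvar(costs, alpha)
    return sum(costs) / len(costs) + lam * cvar
```

For the costs 1, 2, ..., 10 at alpha = 0.8, VaR is 8 and CVaR is 9.5, the average of the two worst outcomes (9 and 10). A policy-gradient method in this setting would differentiate an estimate of such an objective with respect to the policy parameters. This sketch only evaluates it.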
Generalized Batch Normalization: Towards Accelerating Deep Neural Networks
Utilizing recently introduced concepts from statistics and quantitative risk
management, we present a general variant of Batch Normalization (BN) that
offers accelerated convergence of neural network training compared to
conventional BN. In general, we show that mean and standard deviation are not
always the most appropriate choice for the centering and scaling procedure
within the BN transformation, particularly if ReLU follows the normalization
step. We present a Generalized Batch Normalization (GBN) transformation, which
can utilize a variety of alternative deviation measures for scaling and
statistics for centering, choices which naturally arise from the theory of
generalized deviation measures and risk theory in general. When used in
conjunction with the ReLU non-linearity, the underlying risk theory suggests
natural, arguably optimal choices for the deviation measure and statistic.
Utilizing the suggested deviation measure and statistic, we show experimentally
that training is accelerated more than with conventional BN, often with
improved error rate as well. Overall, we propose a more flexible BN
transformation supported by a complementary theoretical framework that can
potentially guide design choices.
Comment: accepted at AAAI-1
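The GBN idea is that the centering statistic and the scaling deviation measure can be swapped independently. Below is a minimal pure-Python sketch of that idea for a fully connected layer in training mode, with both choices passed in as plug-in functions. The median / mean-absolute-deviation pair is purely illustrative and is not claimed to be the pair the authors derive for use with ReLU. Running statistics for inference and learned per-feature gamma and beta are omitted, and all names are hypothetical.

```python
import statistics
from typing import Callable, List, Sequence

# A statistic maps one feature column of the batch to a single number.
Statistic = Callable[[Sequence[float]], float]


def mean_abs_dev_from_median(xs: Sequence[float]) -> float:
    """Mean absolute deviation around the median (an illustrative deviation measure)."""
    med = statistics.median(xs)
    return statistics.fmean([abs(x - med) for x in xs])


def generalized_batch_norm(
    batch: List[List[float]],
    center: Statistic = statistics.fmean,
    deviation: Statistic = statistics.pstdev,
    gamma: float = 1.0,
    beta: float = 0.0,
    eps: float = 1e-5,
) -> List[List[float]]:
    """Normalize each feature column as gamma * (x - center) / (deviation + eps) + beta.

    batch holds N samples, each a list of F feature values. With the defaults
    (mean and population standard deviation) this is close to standard BN in
    training mode, except that eps is added to the deviation, not the variance.
    """
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in batch]
    for j in range(n_features):
        column = [row[j] for row in batch]
        c = center(column)     # centering statistic for feature j
        d = deviation(column)  # deviation measure for feature j
        for i, x in enumerate(column):
            out[i][j] = gamma * (x - c) / (d + eps) + beta
    return out
```

For example, `generalized_batch_norm([[1.0], [2.0], [10.0]], center=statistics.median, deviation=mean_abs_dev_from_median)` centers at 2 and scales by 3, giving roughly -1/3, 0 and 8/3. Mean and standard deviation would instead be pulled toward the outlier 10.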
Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk
In this paper we present an algorithm to compute risk averse policies in
Markov Decision Processes (MDPs) when the total cost criterion is used together
with the average value at risk (AVaR) metric. Risk averse policies are needed
when large deviations from the expected behavior may have detrimental effects,
and conventional MDP algorithms usually ignore this aspect. We provide
conditions on the structure of the underlying MDP ensuring that approximations
for the exact problem can be derived and solved efficiently. Our findings are
novel inasmuch as average value at risk has not previously been considered in
association with the total cost criterion. Our method is demonstrated in a
rapid deployment scenario in which a robot must reach a target location
within a temporal deadline and where increased speed is associated with an
increased probability of failure. We demonstrate that the
proposed algorithm not only produces a risk averse policy reducing the
probability of exceeding the expected temporal deadline, but also provides the
statistical distribution of costs, thus offering a valuable analysis tool
- …
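The abstract does not spell out the algorithm. To illustrate the two quantities it mentions, the distribution of total cost and its AVaR, here is a minimal sketch that evaluates a fixed policy in a small finite MDP. It assumes positive integer step costs and an absorbing goal state. It is not the authors' method: it evaluates a given policy rather than computing a risk-averse one. The transition format and the toy chain in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# transitions[s] lists (next_state, probability, cost) triples for the action
# the (hypothetical) fixed policy takes in state s; costs are integers >= 1.
Transitions = Dict[Hashable, List[Tuple[Hashable, float, int]]]


def total_cost_distribution(
    transitions: Transitions, start: Hashable, goal: Hashable, max_cost: int
) -> Tuple[Dict[int, float], float]:
    """Return ({total_cost: probability}, mass not absorbed within max_cost)."""
    frontier: Dict[Tuple[Hashable, int], float] = {(start, 0): 1.0}
    dist: Dict[int, float] = defaultdict(float)
    leftover = 0.0
    while frontier:  # terminates because every step adds a cost >= 1
        nxt: Dict[Tuple[Hashable, int], float] = defaultdict(float)
        for (s, c), p in frontier.items():
            for s2, q, step_cost in transitions[s]:
                c2 = c + step_cost
                if c2 > max_cost:
                    leftover += p * q       # exceeded the cost budget
                elif s2 == goal:
                    dist[c2] += p * q       # absorbed with total cost c2
                else:
                    nxt[(s2, c2)] += p * q  # still in transit
        frontier = nxt
    return dict(dist), leftover


def avar(dist: Dict[int, float], alpha: float) -> float:
    """AVaR (= CVaR) at level alpha of a discrete cost distribution.

    Assumes the truncated leftover mass is negligible; otherwise the tail,
    and hence AVaR, is underestimated.
    """
    if not dist or not 0.0 < alpha < 1.0:
        raise ValueError("need a non-empty distribution and alpha in (0, 1)")
    support = sorted(dist)
    var, cum = support[-1], 0.0
    for c in support:  # VaR: smallest cost whose CDF reaches alpha
        cum += dist[c]
        if cum >= alpha:
            var = c
            break
    tail = sum(p * (c - var) for c, p in dist.items() if c > var)
    return var + tail / (1.0 - alpha)
```

In a toy chain `{"start": [("goal", 0.5, 1), ("start", 0.5, 1)]}`, where each step costs 1 and reaches the goal with probability 0.5, the total cost is geometric with mean 2. With `max_cost=60`, AVaR at alpha = 0.5 comes out at approximately 3, the mean cost over the worst half of outcomes.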