Markov Decision Processes with Risk-Sensitive Criteria: An Overview
The paper provides an overview of the theory and applications of
risk-sensitive Markov decision processes. The term 'risk-sensitive' refers here
to the use of the Optimized Certainty Equivalent as a means to measure
expectation and risk. This comprises the well-known entropic risk measure and
Conditional Value-at-Risk. We restrict our considerations to stationary
problems with an infinite time horizon. Conditions are given under which
optimal policies exist, and solution procedures are explained. We present the
theory both for the case where the Optimized Certainty Equivalent is applied
recursively and for the case where it is applied to the cumulated reward.
Discounted as well as non-discounted models are reviewed.
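For context, the Optimized Certainty Equivalent (OCE) of Ben-Tal and Teboulle evaluates a random reward X through a concave utility u; the entropic risk measure and Conditional Value-at-Risk named in the abstract are its two classical special cases. Sign and normalization conventions vary across the literature, so the following is one common form rather than necessarily the paper's exact definition:

```latex
S_u(X) = \sup_{\eta \in \mathbb{R}} \Big\{ \eta + \mathbb{E}\big[\, u(X - \eta) \,\big] \Big\}.
% u(t) = \tfrac{1}{\gamma}\big(1 - e^{-\gamma t}\big) recovers the entropic risk measure:
%   S_u(X) = -\tfrac{1}{\gamma} \log \mathbb{E}\big[ e^{-\gamma X} \big];
% u(t) = \tfrac{1}{\alpha}\min(t, 0) recovers the lower-tail CVaR at level \alpha:
%   S_u(X) = \sup_{\eta} \big\{ \eta - \tfrac{1}{\alpha}\, \mathbb{E}\big[(\eta - X)^{+}\big] \big\}.
```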
An optimality system for finite average Markov decision chains under risk-aversion
This work concerns controlled Markov chains with finite state space and compact action sets. The decision maker is risk-averse with constant risk-sensitivity, and the performance of a control policy is measured by the long-run average cost criterion. Under standard continuity-compactness conditions, it is shown that the (possibly non-constant) optimal value function is characterized by a system of optimality equations from which an optimal stationary policy can be obtained. It is also shown that the optimal superior and inferior limit average cost functions coincide.
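For orientation, when the optimal value happens to be constant (e.g., under a communicating assumption), the risk-sensitive average cost optimality equation with risk-sensitivity coefficient λ > 0 takes the standard multiplicative form below; the paper's contribution is a system of such equations covering the case where the optimal value function is not constant. The notation (c, p, g, h) here is generic rather than the paper's:

```latex
e^{\lambda \left( g + h(x) \right)}
  = \min_{a \in A(x)} \Big\{ e^{\lambda c(x,a)} \sum_{y \in S} p(y \mid x, a)\, e^{\lambda h(y)} \Big\},
  \qquad x \in S.
% g : optimal risk-sensitive average cost,  h : relative value function.
```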
Continuous-time Markov decision processes under the risk-sensitive average cost criterion
This paper studies continuous-time Markov decision processes under the
risk-sensitive average cost criterion. The state space is a finite set, the
action space is a Borel space, the cost and transition rates are bounded, and
the risk-sensitivity coefficient can take arbitrary positive real numbers.
Under mild conditions, we develop a new approach to establish the existence of
a solution to the risk-sensitive average cost optimality equation and the
existence of an optimal deterministic stationary policy.
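As a point of reference, in the continuous-time setting with transition rates q(j | i, a), the risk-sensitive average cost optimality equation is often written as the principal eigenvalue problem below, where φ > 0 plays the role of e^{λh}; this is a common form from the literature and the paper's exact formulation may differ:

```latex
\lambda g\, \varphi(i)
  = \inf_{a \in A(i)} \Big\{ \lambda\, c(i,a)\, \varphi(i) + \sum_{j} q(j \mid i, a)\, \varphi(j) \Big\},
  \qquad \varphi(i) > 0 .
```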
Algorithms for CVaR Optimization in MDPs
In many sequential decision-making problems we may want to manage risk by
minimizing some measure of variability in costs in addition to minimizing a
standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk
measure that addresses some of the shortcomings of the well-known
variance-related risk measures and, owing to its computational tractability,
has gained popularity in finance and operations research. In this paper, we
consider the mean-CVaR optimization problem in MDPs. We first derive a formula
for computing the gradient of this risk-sensitive objective function. We then
devise policy gradient and actor-critic algorithms, each of which uses a
specific method to estimate this gradient and updates the policy parameters in
the descent direction. We establish the convergence of our algorithms to
locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness
of our algorithms in an optimal stopping problem.
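This is not the paper's actor-critic algorithm, but a minimal single-stage sketch of the idea such gradient formulas build on: the Rockafellar-Uryasev representation of CVaR over an auxiliary variable eta, optimized jointly with a likelihood-ratio gradient for the policy parameter. The Gaussian policy, the toy quadratic loss, and all step sizes are illustrative assumptions:

```python
import numpy as np

# Toy mean-CVaR-style sketch (illustrative, not the paper's algorithm):
# Gaussian policy over a scalar action, loss = (a - 1)^2 + noise.
# We minimize  eta + E[(loss - eta)^+] / (1 - alpha),  the
# Rockafellar-Uryasev representation of CVaR_alpha of the loss.
rng = np.random.default_rng(1)
alpha = 0.95                    # CVaR confidence level
lr_mu, lr_eta = 0.005, 0.01     # step sizes (assumed, untuned)
mu, sigma = 0.0, 0.5            # policy mean (learned) and std (kept fixed)
eta = 0.0                       # auxiliary variable; converges to the VaR

for step in range(10000):
    a = rng.normal(mu, sigma, size=64)                  # sampled actions
    loss = (a - 1.0) ** 2 + 0.1 * rng.normal(size=64)   # toy random loss
    excess = np.maximum(loss - eta, 0.0)
    score = (a - mu) / sigma**2                         # d/dmu log N(a; mu, sigma^2)
    grad_mu = np.mean(score * excess) / (1.0 - alpha)   # likelihood-ratio gradient
    grad_eta = 1.0 - np.mean(loss >= eta) / (1.0 - alpha)  # subgradient in eta
    mu -= lr_mu * grad_mu
    eta -= lr_eta * grad_eta

print(f"policy mean mu ~ {mu:.3f}, VaR estimate eta ~ {eta:.3f}")
```

At the optimum of the eta-update, P(loss >= eta) = 1 - alpha, so eta tracks the value-at-risk while the policy mean drifts toward the action that shrinks the loss tail.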
On the Convergence of Modified Policy Iteration in Risk Sensitive Exponential Cost Markov Decision Processes
Modified policy iteration (MPI) is a dynamic programming algorithm that
combines elements of policy iteration and value iteration. The convergence of
MPI has been well studied in the context of discounted and average-cost MDPs.
In this work, we consider the exponential cost risk-sensitive MDP formulation,
which is known to provide some robustness to model parameters. Although policy
iteration and value iteration have been well studied in the context of
risk-sensitive MDPs, MPI has remained unexplored. We provide the first proof
that MPI converges for the risk-sensitive problem in the case of finite state
and action spaces. Since the exponential cost formulation leads to a
multiplicative Bellman equation, our main contribution is a convergence proof
that is quite different from existing results for discounted and risk-neutral
average-cost problems, as well as from those for risk-sensitive value and
policy iteration. We conclude our analysis with simulation results assessing
MPI's performance relative to alternative dynamic programming methods, such as
value iteration and policy iteration, across diverse problem parameters. Our
findings highlight risk-sensitive MPI's enhanced computational efficiency
compared to both value iteration and policy iteration.
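To make the multiplicative structure concrete, here is a minimal sketch of MPI with an exponential-cost (multiplicative) backup on a toy stochastic shortest-path instance. The instance, the notation, and the stopping rule are assumptions for illustration, not the paper's algorithmic details or experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA = 4, 2          # state 0 is an absorbing, cost-free goal state
gamma, m = 0.5, 5      # risk-sensitivity coefficient, partial-evaluation sweeps

# Each action moves to the goal w.p. 0.6, else uniformly among non-goal states,
# which keeps the multiplicative operator contractive for this gamma.
P = np.zeros((nA, nS, nS))
P[:, 1:, 0] = 0.6
P[:, 1:, 1:] = 0.4 / (nS - 1)
P[:, 0, 0] = 1.0
c = rng.uniform(0.1, 1.0, size=(nA, nS))   # per-action, per-state costs
c[:, 0] = 0.0                              # no cost at the goal

def backup(V):
    # Multiplicative Bellman operator: (TV)(s) = min_a e^{gamma c(s,a)} E[V(s')]
    Q = np.exp(gamma * c) * (P @ V)        # shape (nA, nS)
    return Q.min(axis=0), Q.argmin(axis=0)

V = np.ones(nS)                            # V(goal) = e^{gamma * 0} = 1
for it in range(1000):
    TV, pi = backup(V)                     # greedy policy improvement
    if np.max(np.abs(TV - V)) < 1e-12:
        break
    for _ in range(m):                     # m partial policy-evaluation sweeps
        V = np.exp(gamma * c[pi, np.arange(nS)]) * (P[pi, np.arange(nS)] @ V)

print("certainty-equivalent costs:", np.log(V) / gamma)
```

Setting m = 1 recovers value iteration and letting m grow large approaches policy iteration, which is the trade-off MPI is designed to interpolate.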