Markov Decision Processes with Risk-Sensitive Criteria: An Overview
The paper provides an overview of the theory and applications of
risk-sensitive Markov decision processes. The term 'risk-sensitive' refers here
to the use of the Optimized Certainty Equivalent as a means to measure
expectation and risk. This comprises the well-known entropic risk measure and
Conditional Value-at-Risk. We restrict our considerations to stationary
problems with an infinite time horizon. Conditions are given under which
optimal policies exist, and solution procedures are explained. We present the
theory both for the case where the Optimized Certainty Equivalent is applied
recursively and for the case where it is applied to the cumulated reward.
Discounted as well as non-discounted models are reviewed.
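For context, the Optimized Certainty Equivalent (OCE) of Ben-Tal and Teboulle evaluates a random reward X through a concave utility u; the entropic risk measure and Conditional Value-at-Risk named in the abstract are its two classical special cases. Sign and normalization conventions vary across the literature, so the following is one common form rather than necessarily the paper's exact definition:

```latex
S_u(X) = \sup_{\eta \in \mathbb{R}} \Big\{ \eta + \mathbb{E}\big[\, u(X - \eta) \,\big] \Big\}.
% u(t) = \tfrac{1}{\gamma}\big(1 - e^{-\gamma t}\big) recovers the entropic risk measure:
%   S_u(X) = -\tfrac{1}{\gamma} \log \mathbb{E}\big[ e^{-\gamma X} \big];
% u(t) = \tfrac{1}{\alpha}\min(t, 0) recovers the lower-tail CVaR at level \alpha:
%   S_u(X) = \sup_{\eta} \big\{ \eta - \tfrac{1}{\alpha}\, \mathbb{E}\big[(\eta - X)^{+}\big] \big\}.
```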
An optimality system for finite average Markov decision chains under risk-aversion
This work concerns controlled Markov chains with finite state space and compact action sets. The decision maker is risk-averse with constant risk-sensitivity, and the performance of a control policy is measured by the long-run average cost criterion. Under standard continuity-compactness conditions, it is shown that the (possibly non-constant) optimal value function is characterized by a system of optimality equations from which an optimal stationary policy can be obtained. It is also shown that the optimal superior and inferior limit average cost functions coincide.
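For orientation, when the optimal value happens to be constant (e.g., under a communicating assumption), the risk-sensitive average cost optimality equation with risk-sensitivity coefficient λ > 0 takes the standard multiplicative form below; the paper's contribution is a system of such equations covering the case where the optimal value function is not constant. The notation (c, p, g, h) here is generic rather than the paper's:

```latex
e^{\lambda \left( g + h(x) \right)}
  = \min_{a \in A(x)} \Big\{ e^{\lambda c(x,a)} \sum_{y \in S} p(y \mid x, a)\, e^{\lambda h(y)} \Big\},
  \qquad x \in S.
% g : optimal risk-sensitive average cost,  h : relative value function.
```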
Continuous-time Markov decision processes under the risk-sensitive average cost criterion
This paper studies continuous-time Markov decision processes under the
risk-sensitive average cost criterion. The state space is a finite set, the
action space is a Borel space, the cost and transition rates are bounded, and
the risk-sensitivity coefficient can take arbitrary positive real numbers.
Under mild conditions, we develop a new approach to establish the existence of
a solution to the risk-sensitive average cost optimality equation and the
existence of an optimal deterministic stationary policy.
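As a point of reference, in the continuous-time setting with transition rates q(j | i, a), the risk-sensitive average cost optimality equation is often written as the principal eigenvalue problem below, where φ > 0 plays the role of e^{λh}; this is a common form from the literature and the paper's exact formulation may differ:

```latex
\lambda g\, \varphi(i)
  = \inf_{a \in A(i)} \Big\{ \lambda\, c(i,a)\, \varphi(i) + \sum_{j} q(j \mid i, a)\, \varphi(j) \Big\},
  \qquad \varphi(i) > 0 .
```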
Algorithms for CVaR Optimization in MDPs
In many sequential decision-making problems we may want to manage risk by
minimizing some measure of variability in costs in addition to minimizing a
standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk
measure that addresses some of the shortcomings of the well-known
variance-related risk measures and, owing to its computational tractability,
has gained popularity in finance and operations research. In this paper, we
consider the mean-CVaR optimization problem in MDPs. We first derive a formula
for computing the gradient of this risk-sensitive objective function. We then
devise policy gradient and actor-critic algorithms, each of which uses a
specific method to estimate this gradient and updates the policy parameters in
the descent direction. We establish the convergence of our algorithms to
locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness
of our algorithms in an optimal stopping problem.
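This is not the paper's actor-critic algorithm, but a minimal single-stage sketch of the idea such gradient formulas build on: the Rockafellar-Uryasev representation of CVaR over an auxiliary variable eta, optimized jointly with a likelihood-ratio gradient for the policy parameter. The Gaussian policy, the toy quadratic loss, and all step sizes are illustrative assumptions:

```python
import numpy as np

# Toy mean-CVaR-style sketch (illustrative, not the paper's algorithm):
# Gaussian policy over a scalar action, loss = (a - 1)^2 + noise.
# We minimize  eta + E[(loss - eta)^+] / (1 - alpha),  the
# Rockafellar-Uryasev representation of CVaR_alpha of the loss.
rng = np.random.default_rng(1)
alpha = 0.95                    # CVaR confidence level
lr_mu, lr_eta = 0.005, 0.01     # step sizes (assumed, untuned)
mu, sigma = 0.0, 0.5            # policy mean (learned) and std (kept fixed)
eta = 0.0                       # auxiliary variable; converges to the VaR

for step in range(10000):
    a = rng.normal(mu, sigma, size=64)                  # sampled actions
    loss = (a - 1.0) ** 2 + 0.1 * rng.normal(size=64)   # toy random loss
    excess = np.maximum(loss - eta, 0.0)
    score = (a - mu) / sigma**2                         # d/dmu log N(a; mu, sigma^2)
    grad_mu = np.mean(score * excess) / (1.0 - alpha)   # likelihood-ratio gradient
    grad_eta = 1.0 - np.mean(loss >= eta) / (1.0 - alpha)  # subgradient in eta
    mu -= lr_mu * grad_mu
    eta -= lr_eta * grad_eta

print(f"policy mean mu ~ {mu:.3f}, VaR estimate eta ~ {eta:.3f}")
```

At the optimum of the eta-update, P(loss >= eta) = 1 - alpha, so eta tracks the value-at-risk while the policy mean drifts toward the action that shrinks the loss tail.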
On the Convergence of Modified Policy Iteration in Risk Sensitive Exponential Cost Markov Decision Processes
Modified policy iteration (MPI) is a dynamic programming algorithm that
combines elements of policy iteration and value iteration. The convergence of
MPI has been well studied in the context of discounted and average-cost MDPs.
In this work, we consider the exponential cost risk-sensitive MDP formulation,
which is known to provide some robustness to model parameters. Although policy
iteration and value iteration have been well studied in the context of
risk-sensitive MDPs, MPI has remained unexplored. We provide the first proof
that MPI converges for the risk-sensitive problem in the case of finite state
and action spaces. Since the exponential cost formulation leads to a
multiplicative Bellman equation, our main contribution is a convergence proof
that is quite different from existing results for discounted and risk-neutral
average-cost problems, as well as from those for risk-sensitive value and
policy iteration. We conclude our analysis with simulation results assessing
MPI's performance relative to alternative dynamic programming methods, such as
value iteration and policy iteration, across diverse problem parameters. Our
findings highlight risk-sensitive MPI's enhanced computational efficiency
compared to both value iteration and policy iteration.
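To make the multiplicative structure concrete, here is a minimal sketch of MPI with an exponential-cost (multiplicative) backup on a toy stochastic shortest-path instance. The instance, the notation, and the stopping rule are assumptions for illustration, not the paper's algorithmic details or experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA = 4, 2          # state 0 is an absorbing, cost-free goal state
gamma, m = 0.5, 5      # risk-sensitivity coefficient, partial-evaluation sweeps

# Each action moves to the goal w.p. 0.6, else uniformly among non-goal states,
# which keeps the multiplicative operator contractive for this gamma.
P = np.zeros((nA, nS, nS))
P[:, 1:, 0] = 0.6
P[:, 1:, 1:] = 0.4 / (nS - 1)
P[:, 0, 0] = 1.0
c = rng.uniform(0.1, 1.0, size=(nA, nS))   # per-action, per-state costs
c[:, 0] = 0.0                              # no cost at the goal

def backup(V):
    # Multiplicative Bellman operator: (TV)(s) = min_a e^{gamma c(s,a)} E[V(s')]
    Q = np.exp(gamma * c) * (P @ V)        # shape (nA, nS)
    return Q.min(axis=0), Q.argmin(axis=0)

V = np.ones(nS)                            # V(goal) = e^{gamma * 0} = 1
for it in range(1000):
    TV, pi = backup(V)                     # greedy policy improvement
    if np.max(np.abs(TV - V)) < 1e-12:
        break
    for _ in range(m):                     # m partial policy-evaluation sweeps
        V = np.exp(gamma * c[pi, np.arange(nS)]) * (P[pi, np.arange(nS)] @ V)

print("certainty-equivalent costs:", np.log(V) / gamma)
```

Setting m = 1 recovers value iteration and letting m grow large approaches policy iteration, which is the trade-off MPI is designed to interpolate.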