Risk-sensitive optimal control for Markov decision processes with monotone cost
The existence of an optimal feedback law is established for the risk-sensitive optimal control problem with denumerable state space. The main assumptions imposed are irreducibility and a near-monotonicity condition on the one-step cost function. A solution can be found constructively using either value iteration or policy iteration under suitable conditions on the initial feedback law.
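As a concrete illustration of the value-iteration route mentioned above, the following is a minimal sketch of relative value iteration for the multiplicative (risk-sensitive) Bellman operator on a finite truncation of the state space. The arrays P and c, the risk factor theta, and the reference state are hypothetical inputs for illustration, not objects from the paper.

```python
import numpy as np

def risk_sensitive_value_iteration(P, c, theta, ref=0, iters=500, tol=1e-10):
    """Relative value iteration for the multiplicative Bellman operator
        (T V)(x) = min_a exp(theta * c(x, a)) * sum_y P(y | x, a) V(y),
    normalized at a reference state so the iterates stay bounded.
    P: (num_actions, n, n) transition tensor; c: (n, num_actions) costs."""
    V = np.ones(P.shape[1])
    for _ in range(iters):
        Q = np.exp(theta * c.T) * (P @ V)   # Q[a, x], shape (num_actions, n)
        TV = Q.min(axis=0)
        rho, V_new = TV[ref], TV / TV[ref]  # rho -> principal eigenvalue
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    # Greedy feedback law and risk-sensitive average cost (1/theta) log rho.
    return Q.argmin(axis=0), V, np.log(rho) / theta
```

The truncation to finite matrices here is purely illustrative; the paper's setting is a denumerable state space, where irreducibility and near-monotonicity are what make such constructive schemes converge.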
Asynchronous Gossip for Averaging and Spectral Ranking
We consider two variants of the classical gossip algorithm. The first variant
is a version of asynchronous stochastic approximation. We highlight a
fundamental difficulty associated with the classical asynchronous gossip
scheme, viz., that it may not converge to a desired average, and suggest an
alternative scheme based on reinforcement learning that has guaranteed
convergence to the desired average. We then discuss a potential application to
a wireless network setting with simultaneous link activation constraints. The
second variant is a gossip algorithm for distributed computation of the
Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant
draws upon a reinforcement learning algorithm for an average-cost controlled
Markov decision problem, the second variant draws upon a reinforcement learning
algorithm for risk-sensitive control. We then discuss potential applications of
the second variant to ranking schemes, reputation networks, and principal
component analysis.
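For orientation, here is a toy sketch of the classical pairwise gossip update that the first variant starts from: a uniformly random edge wakes up and both endpoints move to their pairwise average. This symmetric two-sided update conserves the sum, so it does converge to the true average; the difficulty flagged in the abstract concerns genuinely asynchronous one-sided updates, where the network average can drift. All names and the ring topology below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def gossip_average(x, edges, steps=20000):
    """Classical pairwise gossip: at each tick a uniformly random edge
    (i, j) wakes up and both endpoints replace their values by the
    pairwise average. The symmetric update conserves sum(x), so on a
    connected graph every node converges to the average of the initial
    values. (Sketch only; the one-sided asynchronous variant discussed
    in the paper need not conserve the sum.)"""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        i, j = edges[rng.integers(len(edges))]
        x[i] = x[j] = 0.5 * (x[i] + x[j])
    return x

# Toy run on a 5-node ring (hypothetical example).
x0 = [1.0, 4.0, 2.0, 8.0, 5.0]
ring = [(k, (k + 1) % 5) for k in range(5)]
print(gossip_average(x0, ring))   # each entry approaches mean(x0) = 4.0
```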
On the Convergence of Modified Policy Iteration in Risk Sensitive Exponential Cost Markov Decision Processes
Modified policy iteration (MPI) is a dynamic programming algorithm that
combines elements of policy iteration and value iteration. The convergence of
MPI has been well studied in the context of discounted and average-cost MDPs.
In this work, we consider the exponential-cost risk-sensitive MDP formulation,
which is known to provide some robustness to model-parameter uncertainty.
Although policy iteration and value iteration have been well studied in the
context of risk-sensitive MDPs, MPI has remained unexplored. We provide the
first proof that MPI also
converges for the risk-sensitive problem in the case of finite state and action
spaces. Since the exponential-cost formulation involves a multiplicative
Bellman equation, our main contribution is a convergence proof that differs
substantially from existing results for discounted and risk-neutral
average-cost problems, as well as from analyses of risk-sensitive value and
policy iteration. We conclude our analysis with simulation results, assessing
MPI's performance relative to alternative dynamic programming methods, namely
value iteration and policy iteration, across diverse problem parameters. Our
findings highlight the improved computational efficiency of risk-sensitive MPI
compared with both value iteration and policy iteration.