Risk-sensitive optimal control for Markov decision processes with monotone cost
The existence of an optimal feedback law is established for the risk-sensitive optimal control problem with denumerable state space. The main assumptions imposed are irreducibility and a near-monotonicity condition on the one-step cost function. A solution can be found constructively using either value iteration or policy iteration under suitable conditions on the initial feedback law.
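As a concrete illustration of the value-iteration route mentioned above, the following is a minimal sketch of relative value iteration for the multiplicative (risk-sensitive) Bellman operator on a finite truncation of the state space. The arrays P and c, the risk factor theta, and the reference state are hypothetical inputs for illustration, not objects from the paper.

```python
import numpy as np

def risk_sensitive_value_iteration(P, c, theta, ref=0, iters=500, tol=1e-10):
    """Relative value iteration for the multiplicative Bellman operator
        (T V)(x) = min_a exp(theta * c(x, a)) * sum_y P(y | x, a) V(y),
    normalized at a reference state so the iterates stay bounded.
    P: (num_actions, n, n) transition tensor; c: (n, num_actions) costs."""
    V = np.ones(P.shape[1])
    for _ in range(iters):
        Q = np.exp(theta * c.T) * (P @ V)   # Q[a, x], shape (num_actions, n)
        TV = Q.min(axis=0)
        rho, V_new = TV[ref], TV / TV[ref]  # rho -> principal eigenvalue
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    # Greedy feedback law and risk-sensitive average cost (1/theta) log rho.
    return Q.argmin(axis=0), V, np.log(rho) / theta
```

The truncation to finite matrices here is purely illustrative; the paper's setting is a denumerable state space, where irreducibility and near-monotonicity are what make such constructive schemes converge.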
Asynchronous Gossip for Averaging and Spectral Ranking
We consider two variants of the classical gossip algorithm. The first variant
is a version of asynchronous stochastic approximation. We highlight a
fundamental difficulty associated with the classical asynchronous gossip
scheme, viz., that it may not converge to a desired average, and suggest an
alternative scheme based on reinforcement learning that has guaranteed
convergence to the desired average. We then discuss a potential application to
a wireless network setting with simultaneous link activation constraints. The
second variant is a gossip algorithm for distributed computation of the
Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant
draws upon a reinforcement learning algorithm for an average-cost controlled
Markov decision problem, the second variant draws upon a reinforcement learning
algorithm for risk-sensitive control. We then discuss potential applications of
the second variant to ranking schemes, reputation networks, and principal
component analysis.
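For orientation, here is a toy sketch of the classical pairwise gossip update that the first variant starts from: a uniformly random edge wakes up and both endpoints move to their pairwise average. This symmetric two-sided update conserves the sum, so it does converge to the true average; the difficulty flagged in the abstract concerns genuinely asynchronous one-sided updates, where the network average can drift. All names and the ring topology below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def gossip_average(x, edges, steps=20000):
    """Classical pairwise gossip: at each tick a uniformly random edge
    (i, j) wakes up and both endpoints replace their values by the
    pairwise average. The symmetric update conserves sum(x), so on a
    connected graph every node converges to the average of the initial
    values. (Sketch only; the one-sided asynchronous variant discussed
    in the paper need not conserve the sum.)"""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        i, j = edges[rng.integers(len(edges))]
        x[i] = x[j] = 0.5 * (x[i] + x[j])
    return x

# Toy run on a 5-node ring (hypothetical example).
x0 = [1.0, 4.0, 2.0, 8.0, 5.0]
ring = [(k, (k + 1) % 5) for k in range(5)]
print(gossip_average(x0, ring))   # each entry approaches mean(x0) = 4.0
```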
On the Convergence of Modified Policy Iteration in Risk Sensitive Exponential Cost Markov Decision Processes
Modified policy iteration (MPI) is a dynamic programming algorithm that
combines elements of policy iteration and value iteration. The convergence of
MPI has been well studied in the context of discounted and average-cost MDPs.
In this work, we consider the exponential-cost risk-sensitive MDP formulation,
which is known to provide some robustness to model-parameter uncertainty.
Although policy iteration and value iteration have been well studied in the
context of risk-sensitive MDPs, MPI has remained unexplored. We provide the
first proof that MPI also
converges for the risk-sensitive problem in the case of finite state and action
spaces. Since the exponential-cost formulation involves a multiplicative
Bellman equation, our main contribution is a convergence proof that differs
substantially from existing results for discounted and risk-neutral
average-cost problems, as well as from analyses of risk-sensitive value and
policy iteration. We conclude our analysis with simulation results, assessing
MPI's performance relative to alternative dynamic programming methods, namely
value iteration and policy iteration, across diverse problem parameters. Our
findings highlight the improved computational efficiency of risk-sensitive MPI
compared with both value iteration and policy iteration.