A central limit theorem for temporally non-homogenous Markov chains with applications to dynamic programming
We prove a central limit theorem for a class of additive processes that arise
naturally in the theory of finite horizon Markov decision problems. The main
theorem generalizes a classic result of Dobrushin (1956) for temporally
non-homogeneous Markov chains, and the principal innovation is that here the
summands are permitted to depend on both the current state and a bounded number
of future states of the chain. We show through several examples that this added
flexibility gives one a direct path to asymptotic normality of the optimal
total reward of finite horizon Markov decision problems. The same examples also
explain why such results are not easily obtained by alternative Markovian
techniques such as enlargement of the state space.
Comment: 27 pages, 1 figure
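The setting described above can be illustrated numerically. The sketch below is a hypothetical example, not taken from the paper: it simulates a two-state temporally non-homogeneous chain whose switching probability drifts with time, forms an additive functional whose summands depend on both the current and the next state (the flexibility the theorem permits), and checks the empirical distribution of the standardized sums against the normal law.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 4000

# Temporally non-homogeneous two-state chain: at time t the chain
# switches state with probability p_t, which drifts with t
# (a hypothetical choice of non-homogeneity).
x = np.empty((reps, n + 1), dtype=np.int64)
x[:, 0] = rng.integers(2, size=reps)
for t in range(n):
    p_t = 0.3 + 0.2 * t / n
    switch = rng.random(reps) < p_t
    x[:, t + 1] = np.where(switch, 1 - x[:, t], x[:, t])

# Additive functional whose summands depend on the current state AND
# the next state, the setting covered by the theorem:
#   S_n = sum_t 1{X_t != X_{t+1}}.
sums = (x[:, :-1] != x[:, 1:]).sum(axis=1)
z = (sums - sums.mean()) / sums.std()

# Rough empirical normality check: P(|Z| <= 1) should be near 0.683.
frac = np.mean(np.abs(z) <= 1.0)
print(round(frac, 3))
```

Here asymptotic normality is unsurprising because the switch indicators are independent; the theorem's content is that the same conclusion holds under much weaker, genuinely Markovian dependence.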
Nonzero-sum Stochastic Games
This paper deals with stochastic games. We focus on nonzero-sum games and provide a detailed survey of selected recent results. In Section 1, we consider stochastic Markov games. A correlation of the players' strategies, involving ``public signals'', is described, and a correlated equilibrium theorem proved recently by Nowak and Raghavan for discounted stochastic games with general state space is presented. We also report an extension of this result to a class of undiscounted stochastic games satisfying a uniform ergodicity condition. Stopping games are related to stochastic Markov games. In Section 2, we describe a version of Dynkin's game related to observation of a Markov process with a random mechanism for assigning states to the players. Some recent contributions of the second author in this area are reported. The paper also contains a brief overview of the theory of nonzero-sum stochastic games and stopping games, which is far from complete.
Keywords: average payoff stochastic games, correlated stationary equilibria, nonzero-sum games, stopping time, stopping games
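The correlated equilibrium notion central to the Nowak–Raghavan result can be illustrated in the simplest possible setting: a one-shot bimatrix game, where the set of correlated equilibria is a polytope and an optimal one can be found by linear programming. The sketch below uses Aumann's classic Chicken example; it is only an illustration of the static concept, not of the authors' construction for general state spaces.

```python
import numpy as np
from scipy.optimize import linprog

# Payoffs for the game of Chicken (row, column):
# action 0 = Dare, action 1 = Chicken.
U1 = np.array([[0, 7], [2, 6]])
U2 = np.array([[0, 2], [7, 6]])

# Variable: mu[a, b], a joint distribution over action pairs (the
# "public signal" recommends (a, b) with probability mu[a, b]).
# Incentive constraints: no player gains by deviating from the
# recommendation; written as A_ub @ mu.ravel() <= 0.
rows = []
for a in range(2):           # row player's recommended action a
    for ad in range(2):      # candidate deviation ad
        if ad == a: continue
        r = np.zeros((2, 2))
        for b in range(2):
            r[a, b] = U1[ad, b] - U1[a, b]
        rows.append(r.ravel())
for b in range(2):           # column player, symmetrically
    for bd in range(2):
        if bd == b: continue
        r = np.zeros((2, 2))
        for a in range(2):
            r[a, b] = U2[a, bd] - U2[a, b]
        rows.append(r.ravel())

A_ub, b_ub = np.array(rows), np.zeros(len(rows))
A_eq, b_eq = np.ones((1, 4)), [1.0]

# Maximize total expected payoff over the correlated-equilibrium polytope.
obj = -(U1 + U2).ravel()
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * 4)
mu = res.x.reshape(2, 2)
print(np.round(mu, 3))   # welfare-maximizing correlated equilibrium
```

The optimum puts no mass on mutual Dare and strictly more mass on mutual Chicken than any Nash equilibrium can, which is exactly the gain correlation provides.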
Discrete-Time Control with Non-Constant Discount Factor
This paper deals with discrete-time Markov decision processes (MDPs) with Borel state and action spaces under the total expected discounted cost optimality criterion. We assume that the discount factor is not constant: it may depend on the state and action, and it can even take the extreme values zero or one. We propose sufficient conditions on the data of the model that ensure the existence of optimal control policies and allow the characterization of the optimal value function as a solution of the dynamic programming equation. As a particular case of these MDPs with varying discount factor, we study MDPs with stopping, as well as the corresponding optimal stopping times and contact set. We show applications to switching MDP models and, in particular, we study a pollution accumulation problem.
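On a finite model the dynamic programming equation with a state-action-dependent discount factor takes the form v(x) = min_a { c(x,a) + alpha(x,a) * E[v(X') | x, a] }, and value iteration still applies. The sketch below is a minimal finite-state illustration with hypothetical data; the paper's Borel-space setting and the extreme values alpha = 0 or 1 require the additional conditions developed there.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP with a state-action-dependent
# discount factor alpha(x, a), kept strictly inside (0, 1) here so
# that plain value iteration contracts.
c = np.array([[1.0, 4.0],      # c[x, a]: one-stage cost
              [3.0, 0.5]])
alpha = np.array([[0.9, 0.5],  # alpha[x, a]: discount factor
                  [0.7, 0.8]])
P = np.array([                 # P[a, x, y]: transition kernel
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.5, 0.5], [0.1, 0.9]],
])

def bellman(v):
    """Dynamic programming operator:
    (T v)(x) = min_a { c(x,a) + alpha(x,a) * sum_y P(y|x,a) v(y) }."""
    q = np.array([[c[x, a] + alpha[x, a] * P[a, x] @ v
                   for a in range(2)] for x in range(2)])
    return q.min(axis=1), q.argmin(axis=1)

v = np.zeros(2)
for _ in range(500):
    v, policy = bellman(v)

v_check, _ = bellman(v)
print(np.max(np.abs(v - v_check)))   # fixed-point residual
```

Since max alpha = 0.9 < 1, T is a contraction and the iterates converge geometrically to the optimal value function, with `policy` a minimizing selector.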
A Relative Value Iteration Algorithm for Non-degenerate Controlled Diffusions
The ergodic control problem for a non-degenerate controlled diffusion
controlled through its drift is considered under a uniform stability condition
that ensures the well-posedness of the associated Hamilton-Jacobi-Bellman (HJB)
equation. A nonlinear parabolic evolution equation is then proposed as a
continuous time continuous state space analog of White's `relative value
iteration' algorithm for solving the ergodic dynamic programming equation for
the finite state finite action case. Its convergence to the solution of the HJB
equation is established using the theory of monotone dynamical systems and
also, alternatively, by using the theory of reverse martingales.
Comment: 17 pages
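For context, White's relative value iteration in the finite state, finite action case subtracts the value at a fixed reference state at every step, so the iterates stay bounded and the subtracted offset converges to the optimal average cost. A minimal sketch on hypothetical data (the parabolic evolution equation of the paper is its continuous-time, continuous-state analog):

```python
import numpy as np

# Hypothetical 2-state, 2-action average-cost MDP (unichain, aperiodic).
c = np.array([[2.0, 0.5],      # c[x, a]: one-stage cost
              [1.0, 3.0]])
P = np.array([                 # P[a, x, y]: transition kernel
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.5, 0.5]],
])

def rvi(c, P, ref=0, iters=1000):
    """White's relative value iteration: h <- T h - (T h)(ref)."""
    n_states, n_actions = c.shape
    h = np.zeros(n_states)
    for _ in range(iters):
        q = np.array([[c[x, a] + P[a, x] @ h
                       for a in range(n_actions)] for x in range(n_states)])
        Th = q.min(axis=1)
        rho = Th[ref]          # running estimate of the optimal average cost
        h = Th - rho           # relative values, pinned at the reference state
    return rho, h

rho, h = rvi(c, P)
print(round(rho, 4))
```

Under the unichain and aperiodicity conditions satisfied here, rho converges to the optimal average cost and h to a relative value function solving the ergodic dynamic programming equation rho + h(x) = min_a { c(x,a) + sum_y P(y|x,a) h(y) }.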
Discrete-time controlled Markov processes with average cost criterion: a survey
This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors include a brief historical perspective of the research efforts in this area and compile a substantial, though not exhaustive, bibliography. They also identify several important questions that are still open to investigation.