Shape-constrained Estimation of Value Functions
We present a fully nonparametric method to estimate the value function, via
simulation, in the context of expected infinite-horizon discounted rewards for
Markov chains. Estimating such value functions plays an important role in
approximate dynamic programming and applied probability in general. We
incorporate "soft information" into the estimation algorithm, such as knowledge
of convexity, monotonicity, or Lipschitz constants. In the presence of such
information, a nonparametric estimator for the value function can be computed
that is provably consistent as the simulated time horizon tends to infinity. As
an application, we implement our method on price tolling agreement contracts in
energy markets
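A minimal illustrative sketch of the general idea (not the authors' algorithm): estimate the discounted value function by Monte Carlo at a grid of start states, then project the noisy pointwise estimates onto the set of convex, nondecreasing functions by constrained least squares. The toy chain, reward, grid, and helper name below are all assumptions made for illustration.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
gamma = 0.9                          # discount factor
grid = np.linspace(0.0, 10.0, 21)    # start states at which V is estimated

def simulate_discounted_reward(x0, horizon=200):
    """Crude Monte Carlo estimate of the discounted reward from state x0
    for a toy mean-reverting chain with reward r(x) = x**2 (hypothetical)."""
    x, total = x0, 0.0
    for t in range(horizon):
        total += (gamma ** t) * x ** 2
        x = 0.8 * x + rng.normal(scale=1.0)   # toy transition kernel
    return total

# Noisy pointwise value estimates on the grid, averaged over replications
y = np.array([np.mean([simulate_discounted_reward(x) for _ in range(50)])
              for x in grid])

# Shape-constrained projection: use the "soft information" that V is
# nondecreasing and convex on this (equally spaced) grid
v = cp.Variable(len(grid))
constraints = [cp.diff(v, 1) >= 0,   # monotonicity
               cp.diff(v, 2) >= 0]   # convexity
cp.Problem(cp.Minimize(cp.sum_squares(v - y)), constraints).solve()
print(v.value)
```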
A central limit theorem for temporally non-homogenous Markov chains with applications to dynamic programming
We prove a central limit theorem for a class of additive processes that arise
naturally in the theory of finite horizon Markov decision problems. The main
theorem generalizes a classic result of Dobrushin (1956) for temporally
non-homogeneous Markov chains, and the principal innovation is that here the
summands are permitted to depend on both the current state and a bounded number
of future states of the chain. We show through several examples that this added
flexibility gives one a direct path to asymptotic normality of the optimal
total reward of finite horizon Markov decision problems. The same examples also
explain why such results are not easily obtained by alternative Markovian
techniques such as enlargement of the state space.Comment: 27 pages, 1 figur
Discrete-time controlled Markov processes with average cost criterion: a survey
This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors include a brief historical perspective of the research efforts in this area and compile a substantial yet not exhaustive bibliography. They also identify several important questions that are still open to investigation.
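For concreteness, a minimal sketch of one standard method in this problem class: relative value iteration for a finite unichain MDP under the long-run average cost criterion, which solves the optimality equation rho + h(x) = min_a [c(x,a) + sum_y p(y|x,a) h(y)]. The random costs and transitions below are placeholders, not taken from the survey.

```python
import numpy as np

rng = np.random.default_rng(2)
S, A = 6, 3
cost = rng.random((S, A))                                      # c(x, a)
P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)   # p(y | x, a)

h = np.zeros(S)                        # relative value function, h[0] pinned to 0
for _ in range(5000):
    Th = (cost + P @ h).min(axis=1)    # Bellman operator applied at each state
    rho = Th[0]                        # gain estimate via reference state 0
    h_new = Th - rho
    if np.max(np.abs(h_new - h)) < 1e-10:
        break
    h = h_new

policy = (cost + P @ h).argmin(axis=1)
print("average cost ~", rho, "policy:", policy)
```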
An approximation approach for the deviation matrix of continuous-time Markov processes with application to Markov decision theory
We present an update formula that allows the expression of the deviation matrix of a continuous-time Markov process with denumerable state space having generator matrix Q* through a continuous-time Markov process with generator matrix Q. We show that under suitable stability conditions the algorithm converges at a geometric rate. By applying the concept to three different examples, namely, the M/M/1 queue with vacations, the M/G/1 queue, and a tandem network, we illustrate the broad applicability of our approach. For a problem in admission control, we apply our approximation algorithm to Markov decision theory for computing the optimal control policy. Numerical examples are presented to highlight the efficiency of the proposed algorithm. © 2010 INFORMS
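A small numerical sketch of the object being approximated (not the paper's update formula): the deviation matrix D = integral_0^inf (exp(Qt) - Pi) dt of an ergodic continuous-time Markov chain, computed directly via the identity D = (Pi - Q)^{-1} - Pi. The 3-state generator Q is a toy example.

```python
import numpy as np

Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -3.0,  2.0],
              [ 2.0,  1.0, -3.0]])

# Stationary distribution: solve pi Q = 0 with the entries of pi summing to one
A = np.vstack([Q.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
Pi = np.tile(pi, (3, 1))                      # each row equals pi

D = np.linalg.inv(Pi - Q) - Pi                # deviation matrix
print(D)
print(np.allclose(Q @ D, Pi - np.eye(3)))     # sanity check: Q D = Pi - I
```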