355 research outputs found
Decomposition and parallel processing techniques for two-time scale controlled Markov chains
This paper deals with a class of ergodic control problems for systems described by Markov chains with strong and weak interactions. These systems are composed of a set of m subchains that are weakly coupled. Using results recently established by Abbad et al., one formulates a limit control problem whose solution can be obtained via an associated non-differentiable convex programming (NDCP) problem. The technique used to solve the NDCP problem is the Analytic Center Cutting Plane Method (ACCPM), which implements a dialogue between, on one hand, a master program computing the analytic center of a localization set containing the solution and, on the other hand, an oracle proposing cutting planes that reduce the size of the localization set at each main iteration. The interest of this implementation comes from two characteristics: (i) the oracle proposes cutting planes by solving reduced-size Markov decision problems (MDPs) via a linear program (LP) or a policy iteration method; (ii) several cutting planes can be proposed simultaneously through a parallel implementation on m processors. The paper concentrates on these two aspects and demonstrates, on a large-scale MDP obtained from the numerical approximation "à la Kushner-Dupuis" of a singularly perturbed hybrid stochastic control problem, the substantial computational speed-up obtained.
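The oracle in characteristic (i) solves reduced-size MDPs via an LP or policy iteration. As a self-contained illustration of the policy-iteration subroutine, here is a minimal discounted-reward sketch in NumPy on a hypothetical two-state, two-action MDP (the paper's oracle works with average-reward subchains; the discounted case is shown only because it is the simplest complete example):

```python
import numpy as np

def policy_iteration(P, r, gamma=0.9, max_iter=100):
    """Policy iteration for a small discounted MDP.

    P: (A, S, S) array, P[a, s, t] = transition probability s -> t under action a.
    r: (S, A) array of one-step rewards.
    Returns the optimal policy (S,) and its value function (S,).
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, np.arange(S), :]
        r_pi = r[np.arange(S), policy]
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to the Q-values.
        Q = r + gamma * np.einsum('ast,t->sa', P, v)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, v

# Toy MDP: action 1 always pays 1, action 0 pays 0, so the optimal
# policy takes action 1 everywhere and earns 1 / (1 - gamma) = 10.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [0.5, 0.5]]])
r = np.array([[0.0, 1.0], [0.0, 1.0]])
policy, v = policy_iteration(P, r)
```

Each improvement step strictly increases the value of the policy, so on a finite MDP the loop terminates at an optimal policy after finitely many iterations.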
A central limit theorem for temporally non-homogenous Markov chains with applications to dynamic programming
We prove a central limit theorem for a class of additive processes that arise naturally in the theory of finite-horizon Markov decision problems. The main theorem generalizes a classic result of Dobrushin (1956) for temporally non-homogeneous Markov chains, and the principal innovation is that here the summands are permitted to depend on both the current state and a bounded number of future states of the chain. We show through several examples that this added flexibility gives one a direct path to asymptotic normality of the optimal total reward of finite-horizon Markov decision problems. The same examples also explain why such results are not easily obtained by alternative Markovian techniques such as enlargement of the state space. (Comment: 27 pages, 1 figure)
Ergodic Control and Polyhedral approaches to PageRank Optimization
We study a general class of PageRank optimization problems which consist in
finding an optimal outlink strategy for a web site subject to design
constraints. We consider both a continuous problem, in which one can choose the
intensity of a link, and a discrete one, in which, on each page, there are
obligatory links, facultative links, and forbidden links. We show that the
continuous problem, as well as its discrete variant when there are no
constraints coupling different pages, can both be modeled by constrained Markov
decision processes with ergodic reward, in which the webmaster determines the
transition probabilities of websurfers. Although the number of actions turns
out to be exponential, we show that an associated polytope of transition
measures has a concise representation, from which we deduce that the continuous
problem is solvable in polynomial time, and that the same is true for the
discrete problem when there are no coupling constraints. We also provide
efficient algorithms, adapted to very large networks. Then, we investigate the
qualitative features of optimal outlink strategies, and identify in particular
assumptions under which there exists a "master" page to which all controlled
pages should point. We report numerical results on fragments of the real web
graph. (Comment: 39 pages)
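In this formulation the webmaster's choice of outlinks shapes the websurfers' transition probabilities and hence the ergodic reward. As a minimal illustration (a sketch assuming the standard power-iteration computation of PageRank and a hypothetical three-page graph, not the paper's polytope-based algorithm), redirecting a page's outlink toward a controlled page raises that page's score:

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-12):
    """Power iteration for PageRank scores of a 0/1 adjacency matrix."""
    n = adj.shape[0]
    P = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic link matrix
    G = damping * P + (1.0 - damping) / n     # Google matrix with teleportation
    pi = np.full(n, 1.0 / n)
    while True:
        new = pi @ G
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new

# Page 2 is the "controlled" page; compare two outlink choices for page 0.
adj_a = np.array([[0, 1, 0],   # strategy A: page 0 -> page 1
                  [1, 0, 0],   # page 1 -> page 0
                  [1, 0, 0]])  # page 2 -> page 0
adj_b = np.array([[0, 0, 1],   # strategy B: page 0 -> page 2 instead
                  [1, 0, 0],
                  [1, 0, 0]])
score_a, score_b = pagerank(adj_a), pagerank(adj_b)
```

Under strategy A the controlled page receives only teleportation mass, while under strategy B it also receives page 0's link mass, so `score_b[2] > score_a[2]`.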
An approximation approach for the deviation matrix of continuous-time Markov processes with application to Markov decision theory
We present an update formula that allows the deviation matrix of a continuous-time Markov process with denumerable state space and generator matrix Q* to be expressed through that of a continuous-time Markov process with generator matrix Q. We show that under suitable stability conditions the algorithm converges at a geometric rate. By applying the concept to three different examples, namely the M/M/1 queue with vacations, the M/G/1 queue, and a tandem network, we illustrate the broad applicability of our approach. For a problem in admission control, we apply our approximation algorithm to Markov decision theory for computing the optimal control policy. Numerical examples are presented to highlight the efficiency of the proposed algorithm. © 2010 INFORMS
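The paper's update formula itself is not reproduced here, but for a finite irreducible chain the quantity being approximated has a closed form that serves as a reference point: with stationary projector Π (every row equal to the stationary distribution π), the deviation matrix is D = (Π − Q)⁻¹ − Π, characterized by QD = Π − I and ΠD = DΠ = 0. A sketch on a hypothetical truncated birth-death chain (illustrative rates, not the paper's examples):

```python
import numpy as np

# Generator of a small birth-death chain (truncated M/M/1, states 0..3),
# with arrival rate lam and service rate mu -- illustrative values.
lam, mu = 1.0, 2.0
n = 4
Q = np.zeros((n, n))
for i in range(n):
    if i + 1 < n:
        Q[i, i + 1] = lam  # birth
    if i - 1 >= 0:
        Q[i, i - 1] = mu   # death
    Q[i, i] = -Q[i].sum()  # diagonal makes rows sum to zero

# Stationary distribution: solve pi Q = 0 with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

Pi = np.outer(np.ones(n), pi)   # projector onto the stationary distribution
D = np.linalg.inv(Pi - Q) - Pi  # deviation matrix (closed form, finite case)
```

The identities QD = Π − I, ΠD = 0, and D·1 = 0 can be checked numerically and pin down D uniquely; an approximation scheme such as the paper's can be validated against them on truncated state spaces.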
Perturbation and stability theory for Markov control problems
A unified approach to the asymptotic analysis of a Markov decision process disturbed by an ε-additive perturbation is proposed. Irrespective of whether the perturbation is regular or singular, the underlying control problem that needs to be understood is the limit Markov control problem. The properties of this problem are the subject of this study.