    An approximation approach for the deviation matrix of continuous-time Markov processes with application to Markov decision theory

    We present an update formula that allows the deviation matrix of a continuous-time Markov process with denumerable state space and generator matrix Q* to be expressed through that of a continuous-time Markov process with generator matrix Q. We show that under suitable stability conditions the algorithm converges at a geometric rate. By applying the concept to three different examples, namely the M/M/1 queue with vacations, the M/G/1 queue, and a tandem network, we illustrate the broad applicability of our approach. For a problem in admission control, we apply our approximation algorithm to Markov decision theory for computing the optimal control policy. Numerical examples are presented to highlight the efficiency of the proposed algorithm. © 2010 INFORMS
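    As a concrete illustration of the object being approximated (this is not the paper's update formula), the sketch below computes the deviation matrix of a finite-state ergodic CTMC from the standard closed form D = (Pi - Q)^{-1} - Pi, where Pi is the matrix whose rows all equal the stationary distribution. The function names and the two-state example are ours.

```python
import numpy as np

def deviation_matrix(Q: np.ndarray) -> np.ndarray:
    """Deviation matrix D = int_0^inf (exp(Qt) - Pi) dt of a finite ergodic CTMC."""
    n = Q.shape[0]
    # Stationary distribution: solve pi Q = 0 together with sum(pi) = 1.
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    Pi = np.tile(pi, (n, 1))                  # every row equals pi
    return np.linalg.inv(Pi - Q) - Pi

# Two-state example; the sanity check follows from the defining identities
# Q D = D Q = Pi - I and D Pi = Pi D = 0.
Q = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])
D = deviation_matrix(Q)
assert np.allclose(Q @ D, np.tile([2/3, 1/3], (2, 1)) - np.eye(2))
```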

    Discrete-time controlled Markov processes with average cost criterion: a survey

    This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors have included a brief historical perspective of the research efforts in this area and have compiled a substantial yet not exhaustive bibliography. They have also identified several important questions that are still open to investigation.
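    As a small, hedged illustration of one of the surveyed methodologies (our own finite-state rendering, not anything specific from the survey), relative value iteration computes the optimal average cost g and a bias function h for a finite unichain, aperiodic MDP:

```python
import numpy as np

def relative_value_iteration(P, c, tol=1e-9, max_iter=100_000):
    """P: (A, S, S) transition kernels, c: (A, S) one-step costs.
    Returns (average cost g, bias h, greedy policy); assumes unichain,
    aperiodic dynamics so that the iteration converges."""
    h = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Th = (c + P @ h).min(axis=0)   # Bellman update over actions
        g = Th[0]                      # normalize at reference state 0
        h_new = Th - g
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return g, h, (c + P @ h).argmin(axis=0)
```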

    Differentiability of product measures

    Some contributions to Markov decision processes

    In a nutshell, this thesis studies discrete-time Markov decision processes (MDPs) on Borel spaces, with possibly unbounded costs, under both the expected (discounted) total cost and the long-run expected average cost criteria. In Chapter 2, we systematically investigate a constrained absorbing MDP with the expected total cost criterion and cost functions possibly unbounded from both above and below. We apply the convex analytic approach to derive optimality and duality results, along with the existence of an optimal finite mixing policy. We also provide mild conditions under which a general constrained MDP model with state-action-dependent discount factors can be equivalently transformed into an absorbing MDP model. Chapter 3 treats a more constrained absorbing MDP than that of Chapter 2. The dynamic programming approach is applied to a reformulated unconstrained MDP model and optimality results are obtained. In addition, the correspondence between policies in the original model and the reformulated one is illustrated. In Chapter 4, we extend the dynamic programming approach for standard MDPs with the expected total cost criterion to the case where the (iterated) coherent risk measure of the cost is the performance measure to be minimized. The cost function under consideration is allowed to be unbounded from below, and possibly arbitrarily unbounded from above. Under a fairly weak version of the continuity-compactness conditions, we derive optimality results for both the finite and infinite horizon cases, and establish value iteration as well as policy iteration algorithms. The standard MDP and the iterated conditional value-at-risk of the cost function are illustrated as two examples. Chapters 5 and 6 tackle MDPs with the long-run expected average cost criterion. In Chapter 5, we consider a constrained MDP with cost functions possibly unbounded from both above and below. Under Lyapunov-like conditions, we show the sufficiency of stable policies for the constrained problem. Furthermore, we introduce the corresponding space of performance vectors and characterize each of its extreme points with a deterministic stationary policy. Finally, the existence of an optimal finite mixing policy is justified. Chapter 6 concerns an unconstrained MDP with cost functions unbounded from below and possibly arbitrarily unbounded from above. We provide a detailed discussion of the issue of sufficient policies in the denumerable case, establish the average cost optimality inequality (ACOI), and show the existence of an optimal deterministic stationary policy. In Chapter 7, an inventory-production system is taken as a real-world application illustrating the main results of Chapters 2 and 5.
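    A minimal sketch of the convex analytic approach on a finite, discounted constrained MDP (far simpler than the Borel-space absorbing models above, and with an illustrative random instance of our own): the control problem becomes a linear program over discounted occupation measures mu(x, a), and an optimal stationary policy randomizes according to mu(x, .)/sum_a mu(x, a).

```python
import numpy as np
from scipy.optimize import linprog

S, A, gamma = 2, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[x, a, y]
c = np.array([[1.0, 0.2], [0.5, 2.0]])       # cost to be minimized
d = np.array([[0.0, 1.0], [1.0, 0.0]])       # constraint cost
alpha = np.array([0.5, 0.5])                 # initial distribution
budget = 2.0                                 # constraint level

# Balance equations defining occupation measures:
# sum_a mu(y, a) - gamma * sum_{x, a} P[x, a, y] mu(x, a) = alpha(y).
A_eq = np.zeros((S, S * A))
for y in range(S):
    for x in range(S):
        for a in range(A):
            A_eq[y, x * A + a] = float(x == y) - gamma * P[x, a, y]

res = linprog(c.ravel(), A_ub=d.ravel()[None, :], b_ub=[budget],
              A_eq=A_eq, b_eq=alpha, bounds=(0, None))
mu = res.x.reshape(S, A)   # optimal occupation measure
```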

    Weak differentiability of product measures

    In this paper, we study cost functions over a finite collection of random variables. For these types of models, a calculus of differentiation is developed that allows us to obtain a closed-form expression for derivatives, where "differentiation" has to be understood in the weak sense. The technique for proving the results is new and establishes an interesting link between functional analysis and gradient estimation. The key contribution of this paper is a product rule of weak differentiation. In addition, a product rule of weak analyticity is presented that allows for Taylor series approximations of finite product measures. In particular, from characteristics of the individual probability measures, a lower bound (i.e., a domain of convergence) can be established for the set of parameter values for which the Taylor series converges to the true value. Applications of our theory to the ruin problem from insurance mathematics and to stochastic activity networks arising in project evaluation review techniques are provided. © 2010 INFORMS
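    A toy Monte Carlo check makes the product rule concrete (our construction, far simpler than the paper's general setting): the weak derivative of the Bernoulli(theta) measure is the signed measure delta_1 - delta_0, so for independent X, Y ~ Bernoulli(theta) the product rule gives d/dtheta E[f(X, Y)] = E[f(1, Y) - f(0, Y)] + E[f(X, 1) - f(X, 0)].

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 0.3, 200_000
f = lambda x, y: np.maximum(x + y - 1, 0)   # f(x, y) = (x + y - 1)^+

X = (rng.random(n) < theta).astype(float)
Y = (rng.random(n) < theta).astype(float)

# Product rule of weak differentiation: one re-sampled difference per factor.
grad = np.mean(f(1.0, Y) - f(0.0, Y)) + np.mean(f(X, 1.0) - f(X, 0.0))

# Analytic check: E[f(X, Y)] = theta^2, so the derivative is 2 * theta = 0.6.
print(grad)
```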

    Continuous-Time Markov Decision Processes with Exponential Utility

    In this paper, we consider a continuous-time Markov decision process (CTMDP) in Borel spaces, where the certainty equivalent with respect to the exponential utility of the total undiscounted cost is to be minimized. The cost rate is nonnegative. We establish the optimality equation. Under a compactness-continuity condition, we show the existence of a deterministic stationary optimal policy. We reduce the risk-sensitive CTMDP problem to an equivalent risk-sensitive discrete-time Markov decision process with the same state and action spaces as the original CTMDP. In particular, the value iteration algorithm for the CTMDP problem follows from this reduction. We impose essentially no conditions on the growth of the transition and cost rates in the state, and the controlled process may be explosive.
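    The discrete-time model that the reduction produces can be illustrated with a finite-state sketch (ours; the reduction itself and the Borel-space generality are not reproduced): value iteration applies the multiplicative Bellman operator to W(x) = inf_pi E_x[exp(lambda * total cost)], and the certainty equivalent is recovered as log(W)/lambda.

```python
import numpy as np

def risk_sensitive_vi(P, c, lam, n_iter=1000):
    """P: (A, S, S) transitions, c: (A, S) nonnegative one-step costs,
    lam > 0 the risk-sensitivity parameter. Assumes a finite model with an
    absorbing, cost-free state so that the total cost is well behaved."""
    W = np.ones(P.shape[1])                      # W_0 = exp(0)
    for _ in range(n_iter):
        W = (np.exp(lam * c) * (P @ W)).min(axis=0)
    value = np.log(W) / lam                      # certainty equivalent
    policy = (np.exp(lam * c) * (P @ W)).argmin(axis=0)
    return value, policy
```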

    Some topics in web performance analysis

    This thesis consists of four papers on web performance analysis. In the first paper we investigate the performance of overload control through queue length for two different web server architectures. The simulation results suggest that the benefit of request prioritization is noticeable only when the capacities of the sub-systems match each other. In the second paper we present an M/G/1/K*PS queueing model of a web server. We obtain closed-form expressions for web server performance metrics such as average response time, throughput, and blocking probability. The model is validated through real measurements. The third paper studies a queueing system with a load balancer and a pool of identical FCFS queues in parallel. By taking the number of servers to infinity, we show that the average waiting time for the system is not always minimized by routing each customer to the expected shortest queue when the information used for the decision is stale. In the last paper we consider the problem of admission control to an M/M/1 queue under periodic observations with the average cost criterion. The problem is formulated as a discrete-time Markov decision process whose states are fully observable. A proof of the existence of the average optimal policy via the vanishing discount approach is provided. We also show that the optimal policy is nonincreasing with respect to the observed number of customers in the system.
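    For the second paper's model, the flavor of the closed forms is easy to convey: processor sharing is insensitive to the service-time distribution beyond its mean, so the M/G/1/K*PS queue-length distribution is geometric, as in M/M/1/K. The sketch below reproduces these textbook formulas (not necessarily the paper's exact expressions).

```python
def mg1k_ps_metrics(lam: float, mu: float, K: int):
    """Blocking probability, throughput, and mean response time of an
    M/G/1/K*PS queue: arrival rate lam, service rate mu = 1/E[S],
    at most K jobs in the system."""
    rho = lam / mu
    if abs(rho - 1.0) < 1e-12:
        p = [1.0 / (K + 1)] * (K + 1)            # uniform when rho = 1
    else:
        norm = (1.0 - rho ** (K + 1)) / (1.0 - rho)
        p = [rho ** n / norm for n in range(K + 1)]
    blocking = p[K]
    throughput = lam * (1.0 - blocking)
    mean_jobs = sum(n * p[n] for n in range(K + 1))
    return blocking, throughput, mean_jobs / throughput   # Little's law
```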

    Markov Games: Receding Horizon Approach

    We consider a receding horizon approach as an approximate solution to two-person zero-sum Markov games with infinite horizon discounted cost and average cost criteria. We first present error bounds from the optimal equilibrium value of the game when both players take correlated equilibrium receding horizon policies that are based on exact or approximate solutions of receding finite horizon subgames. Motivated by the worst-case optimal control of queueing systems by Altman, we then analyze error bounds when the minimizer plays the (approximate) receding horizon control and the maximizer plays the worst-case policy. We give three heuristic examples of the approximate receding horizon control. We extend "rollout" by Bertsekas and Castanon and "parallel rollout" and "hindsight optimization" by Chang et al. into the Markov game setting within the framework of the approximate receding horizon approach and analyze their performances. In the rollout/parallel rollout approaches, the minimizing player seeks to improve the performance of a single heuristic policy it rolls out, or to combine dynamically multiple heuristic policies in a set so as to improve the performances of all of the heuristic policies simultaneously, under the guess that the maximizing player has chosen a fixed worst-case policy. Given epsilon > 0, we give the value of the receding horizon which guarantees that the parallel rollout policy with that horizon played by the minimizer dominates any heuristic policy in the set by epsilon. In the hindsight optimization approach, the minimizing player makes a decision based on his expected optimal hindsight performance over a finite horizon. We finally discuss practical implementations of the receding horizon approaches via simulation.
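    A minimal sketch of the receding horizon idea for a finite zero-sum Markov game (our rendering; terminal value zero stands in for the exact or approximate subgame solutions discussed above): at the current state, solve an H-step subgame by backward induction, with each stage matrix game solved as a linear program, and play the minimizer's first-stage equilibrium strategy.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game(M):
    """Value and minimizer's optimal mixed strategy of the zero-sum game
    min_x max_j x^T M[:, j], where M[i, j] is the cost when the minimizer
    plays row i and the maximizer plays column j."""
    m, n = M.shape
    # Variables (x_1, ..., x_m, v): minimize v s.t. M^T x <= v 1, sum x = 1.
    res = linprog(np.r_[np.zeros(m), 1.0],
                  A_ub=np.c_[M.T, -np.ones(n)], b_ub=np.zeros(n),
                  A_eq=np.r_[np.ones(m), 0.0][None, :], b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1], res.x[:m]

def receding_horizon_action(P, cost, gamma, x, H):
    """P: (S, m, n, S) transitions, cost: (S, m, n) stage costs. Returns the
    minimizer's mixed action at state x from an H-step lookahead."""
    S = cost.shape[0]
    V = np.zeros(S)                      # terminal value of the subgame
    for _ in range(H):
        stage = [matrix_game(cost[s] + gamma * P[s] @ V) for s in range(S)]
        V = np.array([v for v, _ in stage])
    return stage[x][1]                   # first-stage equilibrium strategy
```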