
    The Value Iteration Algorithm is Not Strongly Polynomial for Discounted Dynamic Programming

    This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike the policy iteration algorithm, the value iteration algorithm is not strongly polynomial for discounted dynamic programming.
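
    As background, the standard value iteration algorithm for a finite discounted MDP can be sketched as follows. This is a minimal illustration; the transition and cost data in the usage example are hypothetical and not taken from the paper.

```python
import numpy as np

def value_iteration(P, c, gamma, tol=1e-10, max_iter=100_000):
    """Value iteration for a finite discounted MDP (cost minimization).

    P: array of shape (A, S, S), P[a, s, t] = transition probability.
    c: array of shape (S, A), one-step costs.
    gamma: discount factor in [0, 1).
    Returns the optimal value vector and a greedy policy.
    """
    v = np.zeros(c.shape[0])
    for _ in range(max_iter):
        Q = c + gamma * (P @ v).T      # Q[s, a] = c(s, a) + gamma * E[v(next)]
        v_new = Q.min(axis=1)          # Bellman optimality operator
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    Q = c + gamma * (P @ v).T          # recompute Q at the final values
    return v, Q.argmin(axis=1)
```

    For example, in a two-state MDP where staying in state 1 is free, the computed greedy policy switches to state 1 and stays there.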

    On the Reduction of Total-Cost and Average-Cost MDPs to Discounted MDPs

    This paper provides conditions under which total-cost and average-cost Markov decision processes (MDPs) can be reduced to discounted ones. Results are given for transient total-cost MDPs with transition rates whose values may be greater than one, as well as for average-cost MDPs with transition probabilities satisfying the condition that there is a state such that the expected time to reach it is uniformly bounded for all initial states and stationary policies. In particular, these reductions imply sufficient conditions for the validity of optimality equations and the existence of stationary optimal policies for MDPs with undiscounted total-cost and average-cost criteria. When the state and action sets are finite, these reductions lead to linear programming formulations and complexity estimates for MDPs under the aforementioned criteria.
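
    One classical reduction of this kind applies to transient total-cost MDPs whose transition matrices have row sums uniformly bounded by some β < 1. The following sketch illustrates the idea; the padding construction with an absorbing cost-free state is an illustrative device, not the paper's exact formulation.

```python
import numpy as np

def transient_to_discounted(P, beta):
    """Sketch: reduce a transient total-cost MDP to a discounted one.

    Assumes every row of every transition matrix P[a] sums to at most
    beta < 1.  Then sum_t P^t c = sum_t beta^t (P/beta)^t c, so the
    undiscounted total cost equals the beta-discounted cost for the
    rescaled transitions P/beta.  The remaining probability mass is sent
    to an absorbing, cost-free state so the new matrices are stochastic.
    """
    A, S, _ = P.shape
    assert 0 < beta < 1 and np.all(P.sum(axis=2) <= beta + 1e-12)
    P_tilde = P / beta
    P_full = np.zeros((A, S + 1, S + 1))
    P_full[:, :S, :S] = P_tilde
    P_full[:, :S, S] = 1.0 - P_tilde.sum(axis=2)   # mass to absorbing state
    P_full[:, S, S] = 1.0                          # absorbing, cost-free
    return P_full, beta                            # data of a discounted MDP
```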

    Reduction of total-cost and average-cost MDPs with weakly continuous transition probabilities to discounted MDPs

    This note describes sufficient conditions under which total-cost and average-cost Markov decision processes (MDPs) with general state and action spaces, and with weakly continuous transition probabilities, can be reduced to discounted MDPs. For undiscounted problems, these reductions imply the validity of optimality equations and the existence of stationary optimal policies. The reductions also provide methods for computing optimal policies. The results are applied to a capacitated inventory control problem with fixed costs and lost sales.

    Stochastic Setup-Cost Inventory Model with Backorders and Quasiconvex Cost Functions

    In this paper we study a periodic-review single-commodity setup-cost inventory model with backorders and holding/backlog costs satisfying quasiconvexity assumptions. We show that the Markov decision process for this inventory model satisfies the assumptions that lead to the validity of optimality equations for discounted and average-cost problems and to the existence of optimal (s,S) policies. In particular, we prove the equicontinuity of the family of discounted value functions and the convergence of optimal discounted lower thresholds to the optimal average-cost one for some sequences of discount factors converging to 1. If an arbitrary nonnegative amount of inventory can be ordered, we establish stronger convergence properties: (i) the optimal discounted lower thresholds s_α converge to the optimal average-cost lower threshold s; and (ii) the discounted relative value functions converge to the average-cost relative value function. These convergence results were previously known only for subsequences of discount factors, even for problems with convex holding/backlog costs. The results of this paper hold for problems with deterministic positive lead times.
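
    An (s,S) policy of the kind whose optimality is established here orders the inventory position up to level S whenever it falls below the lower threshold s. A minimal simulation sketch follows; the cost parameters and the uniform demand distribution are assumptions chosen for illustration, not taken from the paper.

```python
import random

def order_quantity(x, s, S):
    """(s,S) policy: order up to S when inventory position x is below s."""
    return S - x if x < s else 0

def average_cost(s, S, horizon, K=10.0, h=1.0, b=4.0, seed=0):
    """Average cost per period under an (s,S) policy with backorders.

    K: setup cost per order, h: per-unit holding cost, b: per-unit
    backlog cost.  Demand is Uniform{0,...,4} -- an illustrative choice.
    """
    rng = random.Random(seed)
    x, total = 0, 0.0
    for _ in range(horizon):
        q = order_quantity(x, s, S)
        if q > 0:
            total += K                 # setup cost charged once per order
        x += q - rng.randrange(5)      # receive order, subtract demand
        total += h * max(x, 0) + b * max(-x, 0)
    return total / horizon
```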

    On Maximal Ranges of Vector Measures for Subsets and Purification of Transition Probabilities

    Consider a measurable space with an atomless finite vector measure. This measure defines a mapping of the σ-field into a Euclidean space. According to the Lyapunov convexity theorem, the range of this mapping is a convex compactum. Similar ranges are also defined for measurable subsets of the space. Two subsets with the same vector measure may have different ranges. We investigate the question of whether, among all the subsets having the same given vector measure, there always exists a set with the maximal range of the vector measure. The answer to this question is positive for two-dimensional vector measures and negative for higher dimensions. We use the existence of maximal ranges to strengthen the Dvoretzky-Wald-Wolfowitz purification theorem for the case of two measures.
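
    For reference, the Lyapunov convexity theorem invoked above can be stated as follows (a standard finite-dimensional formulation):

```latex
% Lyapunov convexity theorem (standard formulation)
\textbf{Theorem (Lyapunov).} Let $(\Omega, \mathcal{F})$ be a measurable space
and let $\mu = (\mu_1, \dots, \mu_n)$ be an atomless finite vector measure
on $\mathcal{F}$. Then the range
\[
  R(\mu) = \{\, \mu(A) : A \in \mathcal{F} \,\} \subset \mathbb{R}^n
\]
is convex and compact.
```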

    Continuity of Minima: Local Results

    This paper compares and generalizes Berge's maximum theorem for noncompact image sets established in Feinberg, Kasyanov and Voorneveld (2014) and the local maximum theorem established in Bonnans and Shapiro (2000).

    On the Convergence of Optimal Actions for Markov Decision Processes and the Optimality of (s,S) Inventory Policies

    This paper studies convergence properties of optimal values and actions for discounted and average-cost Markov Decision Processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic periodic-review inventory control problem with backorders, positive setup costs, and convex holding/backordering costs. The following results are established for MDPs with possibly noncompact action sets and unbounded cost functions: (i) convergence of value iterations to optimal values for discounted problems with possibly non-zero terminal costs, (ii) convergence of optimal finite-horizon actions to optimal infinite-horizon actions for total discounted costs, as the time horizon tends to infinity, and (iii) convergence of optimal discount-cost actions to optimal average-cost actions for infinite-horizon problems, as the discount factor tends to 1. Applied to the setup-cost inventory control problem, the general results on MDPs imply the optimality of (s,S) policies and convergence properties of optimal thresholds. In particular, this paper analyzes the setup-cost inventory control problem without two assumptions often used in the literature: (a) the demand is either discrete or continuous, or (b) the backordering cost is higher than the cost of backordered inventory if the amount of backordered inventory is large.

    Buffer Insertion for Bridges and Optimal Buffer Sizing for Communication Sub-System of Systems-on-Chip

    We present an optimal buffer sizing and buffer insertion methodology which uses stochastic models of the architecture and continuous-time Markov decision processes (CTMDPs). Such a methodology is useful in managing the scarce buffer resources available on chip, as compared to network-based data communication, which can have large buffer space. The modeling of this problem in a CTMDP framework led to a nonlinear formulation due to the usage of bridges in the bus architecture. We present a methodology to split the problem into several smaller, linear subsystems, which we then solve.

    Fatou's Lemma for Weakly Converging Measures under the Uniform Integrability Condition

    This note describes Fatou's lemma and Lebesgue's dominated convergence theorem for a sequence of measures converging weakly to a finite measure and for a sequence of functions whose negative parts are uniformly integrable with respect to these measures. The note also provides new formulations of the uniform Fatou lemma, the uniform Lebesgue convergence theorem, the Dunford-Pettis theorem, and the fundamental theorem for Young measures based on the equivalence of uniform integrability and the apparently weaker property of asymptotic uniform integrability for sequences of functions and finite measures.

    On solutions of Kolmogorov's equations for jump Markov processes

    This paper studies three ways to construct a nonhomogeneous jump Markov process: (i) via a compensator of the random measure of a multivariate point process, (ii) as a minimal solution of the backward Kolmogorov equation, and (iii) as a minimal solution of the forward Kolmogorov equation. The main conclusion of this paper is that, for a given measurable transition intensity, commonly called a Q-function, all these constructions define the same transition function. If this transition function is regular, that is, the probability of accumulation of jumps is zero, then this transition function is the unique solution of the backward and forward Kolmogorov equations. For continuous Q-functions, Kolmogorov equations were studied in Feller's seminal paper. In particular, this paper extends Feller's results for continuous Q-functions to measurable Q-functions and provides additional results.
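
    For a finite state space and a conservative Q-matrix, the forward Kolmogorov equation reduces to the matrix ODE P'(t) = P(t)Q with P(0) = I, and its solution can be approximated numerically. The following is a toy explicit-Euler sketch for that special case; the paper itself treats general measurable Q-functions.

```python
import numpy as np

def transition_function(Q, t, steps=10_000):
    """Approximate the transition function P(t) of a finite-state jump
    Markov process by explicit Euler steps on the forward Kolmogorov
    equation dP/dt = P Q, P(0) = I.  Q must be a conservative Q-matrix:
    nonnegative off-diagonal entries and zero row sums.
    """
    n = Q.shape[0]
    assert np.allclose(Q.sum(axis=1), 0.0)
    P = np.eye(n)
    dt = t / steps
    for _ in range(steps):
        P = P + dt * (P @ Q)           # one explicit Euler step
    return P
```

    For a two-state chain with jump rates λ and μ, the result can be checked against the closed-form transition probabilities, e.g. P(t)[0,0] = μ/(λ+μ) + (λ/(λ+μ)) e^{-(λ+μ)t}.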