The Value Iteration Algorithm is Not Strongly Polynomial for Discounted Dynamic Programming
This note provides a simple example demonstrating that, if exact computations
are allowed, the number of iterations required for the value iteration
algorithm to find an optimal policy for discounted dynamic programming problems
may grow arbitrarily quickly with the size of the problem. In particular, the
number of iterations can be exponential in the number of actions. Thus, unlike
policy iteration, the value iteration algorithm is not strongly polynomial for
discounted dynamic programming.
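As a concrete illustration of the algorithm under discussion, here is a minimal value iteration sketch for a finite discounted MDP; the two-state example data (transition matrix, costs, and discount factor) are invented for illustration and are not from the paper:

```python
# Minimal value iteration for a finite discounted MDP (illustrative sketch).
# P[a][x][y] is the transition probability from state x to state y under
# action a; c[a][x] is the one-step cost of action a in state x.

def value_iteration(P, c, beta, tol=1e-10, max_iter=100_000):
    n = len(c[0])            # number of states
    v = [0.0] * n            # initial value estimate
    for _ in range(max_iter):
        v_new = [
            min(c[a][x] + beta * sum(P[a][x][y] * v[y] for y in range(n))
                for a in range(len(c)))
            for x in range(n)
        ]
        done = max(abs(v_new[x] - v[x]) for x in range(n)) < tol
        v = v_new
        if done:
            break
    # greedy policy with respect to the (near-)optimal value function
    policy = [
        min(range(len(c)),
            key=lambda a: c[a][x]
            + beta * sum(P[a][x][y] * v[y] for y in range(n)))
        for x in range(n)
    ]
    return v, policy

# Two states, two actions: action 0 stays put (cost 1 everywhere);
# action 1 jumps to state 1 (cost 2 from state 0, cost 0 from state 1).
P = [[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
     [[0.0, 1.0], [0.0, 1.0]]]   # action 1: go to state 1
c = [[1.0, 1.0], [2.0, 0.0]]
v, pi = value_iteration(P, c, beta=0.9)
```

Here the fixed point is v = [2, 0] with the jump action optimal in both states; the point of the abstract is that on carefully constructed instances the number of iterations before the greedy policy becomes optimal can grow exponentially in the number of actions.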
On the Reduction of Total-Cost and Average-Cost MDPs to Discounted MDPs
This paper provides conditions under which total-cost and average-cost Markov
decision processes (MDPs) can be reduced to discounted ones. Results are given
for transient total-cost MDPs with transition rates whose values may be
greater than one, as well as for average-cost MDPs with transition
probabilities satisfying the condition that there is a state such that the
expected time to reach it is uniformly bounded for all initial states and
stationary policies. In particular, these reductions imply sufficient
conditions for the validity of optimality equations and the existence of
stationary optimal policies for MDPs with undiscounted total cost and
average-cost criteria. When the state and action sets are finite, these
reductions lead to linear programming formulations and complexity estimates for
MDPs under the aforementioned criteria.
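For a finite discounted MDP with state set $X$, action sets $A(x)$, one-step costs $c(x,a)$, transition probabilities $p(y \mid x,a)$, and discount factor $\beta \in (0,1)$, the linear programming formulation that such reductions lead to takes the standard form below (with arbitrary positive weights $\alpha(x)$; this is the textbook discounted-cost LP, stated here for orientation rather than quoted from the paper):

```latex
\max_{v} \;\sum_{x \in X} \alpha(x)\, v(x)
\quad \text{subject to} \quad
v(x) \;\le\; c(x,a) + \beta \sum_{y \in X} p(y \mid x,a)\, v(y)
\qquad \forall\, x \in X,\ a \in A(x).
```

Its optimal solution is the optimal value function, and an optimal stationary policy can be read off from the constraints that hold with equality (equivalently, from the dual variables).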
Reduction of total-cost and average-cost MDPs with weakly continuous transition probabilities to discounted MDPs
This note describes sufficient conditions under which total-cost and
average-cost Markov decision processes (MDPs) with general state and action
spaces, and with weakly continuous transition probabilities, can be reduced to
discounted MDPs. For undiscounted problems, these reductions imply the validity
of optimality equations and the existence of stationary optimal policies. The
reductions also provide methods for computing optimal policies. The results are
applied to a capacitated inventory control problem with fixed costs and lost
sales.
Stochastic Setup-Cost Inventory Model with Backorders and Quasiconvex Cost Functions
In this paper we study a periodic-review single-commodity setup-cost
inventory model with backorders and holding/backlog costs satisfying
quasiconvexity assumptions. We show that the Markov decision process for this
inventory model satisfies the assumptions that lead to the validity of
optimality equations for discounted and average-cost problems and to the
existence of optimal policies. In particular, we prove the
equicontinuity of the family of discounted value functions and the convergence
of optimal discounted lower thresholds to the optimal average-cost one for some
sequences of discount factors converging to 1. If an arbitrary nonnegative
amount of inventory can be ordered, we establish stronger convergence
properties: (i) the optimal discounted lower thresholds converge to the
optimal average-cost lower threshold, and (ii) the discounted relative
value functions converge to the average-cost relative value function. These
convergence results were previously known only for subsequences of discount
factors, even for problems with convex holding/backlog costs. The results of
this paper hold for problems with deterministic positive lead times.
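As a toy illustration of the threshold structure discussed above, the following sketch simulates an order-up-to (lower-threshold) policy for a periodic-review model with backorders; the demand distribution and the cost parameters h and b are invented for the example and are not taken from the paper:

```python
import random

# Illustrative simulation of a lower-threshold (order-up-to) policy for a
# periodic-review single-commodity inventory model with backorders.
# Negative inventory encodes backordered demand.

def simulate(S, horizon, h=1.0, b=5.0, seed=0):
    """Average per-period holding/backlog cost under an order-up-to-S policy.

    h: holding cost per unit of on-hand inventory (hypothetical value)
    b: backlog cost per unit of backordered demand (hypothetical value)
    """
    rng = random.Random(seed)
    x = 0.0          # inventory position
    total = 0.0
    for _ in range(horizon):
        if x < S:    # below the lower threshold: order up to S
            x = S
        d = rng.randint(0, 10)   # random demand this period (illustrative)
        x -= d
        total += h * max(x, 0.0) + b * max(-x, 0.0)
    return total / horizon

cost_at_5 = simulate(S=5, horizon=10_000)
```

Sweeping the threshold S in such a simulation and repeating for discount factors approaching 1 is the computational counterpart of the convergence of optimal discounted thresholds established in the abstract.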
On Maximal Ranges of Vector Measures for Subsets and Purification of Transition Probabilities
Consider a measurable space with an atomless finite vector measure. This
measure defines a mapping of the σ-field into a Euclidean space.
According to the Lyapunov convexity theorem, the range of this mapping is a
convex compactum. Similar ranges are also defined for measurable subsets of the
space. Two subsets with the same vector measure may have different ranges. We
investigate the question whether, among all the subsets having the same given
vector measure, there always exists a set with the maximal range of the vector
measure. The answer to this question is positive for two-dimensional vector
measures and negative for higher dimensions. We use the existence of maximal
ranges to strengthen the Dvoretzky-Wald-Wolfowitz purification theorem for the
case of two measures.
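The Lyapunov convexity theorem invoked above can be stated compactly: for an atomless finite vector measure $\mu = (\mu_1,\dots,\mu_n)$ on a measurable space $(S,\Sigma)$, the range

```latex
R(\mu) \;=\; \{\, \mu(A) : A \in \Sigma \,\} \;\subset\; \mathbb{R}^n
```

is a convex compactum. The question studied in the abstract is whether, among all measurable subsets carrying the same vector measure, one attains a maximal such range; the answer is positive for $n = 2$ and negative for higher dimensions.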
Continuity of Minima: Local Results
This paper compares and generalizes Berge's maximum theorem for noncompact
image sets established in Feinberg, Kasyanov and Voorneveld (2014) and the
local maximum theorem established in Bonnans and Shapiro (2000).
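For reference, the classical minimum form of Berge's maximum theorem, which both generalizations above extend, can be stated as follows (the notation is generic and not taken from the cited papers):

```latex
\text{If } u\colon X \times Y \to \mathbb{R} \text{ is continuous and }
\Phi\colon X \rightrightarrows Y \text{ is a continuous correspondence with}
\text{nonempty compact values, then } v(x) = \min_{y \in \Phi(x)} u(x,y)
\text{ is continuous on } X, \text{ and}
\Phi^{*}(x) = \{\, y \in \Phi(x) : u(x,y) = v(x) \,\}
\text{ is upper semicontinuous with nonempty compact values.}
```

The generalizations compared in the paper relax, in different ways, the compactness of the image sets required by this classical statement.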
On the Convergence of Optimal Actions for Markov Decision Processes and the Optimality of (s, S) Inventory Policies
This paper studies convergence properties of optimal values and actions for
discounted and average-cost Markov Decision Processes (MDPs) with weakly
continuous transition probabilities and applies these properties to the
stochastic periodic-review inventory control problem with backorders, positive
setup costs, and convex holding/backordering costs. The following results are
established for MDPs with possibly noncompact action sets and unbounded cost
functions: (i) convergence of value iterations to optimal values for discounted
problems with possibly non-zero terminal costs, (ii) convergence of optimal
finite-horizon actions to optimal infinite-horizon actions for total discounted
costs, as the time horizon tends to infinity, and (iii) convergence of optimal
discount-cost actions to optimal average-cost actions for infinite-horizon
problems, as the discount factor tends to 1.
Being applied to the setup-cost inventory control problem, these general
results on MDPs imply the optimality of (s, S) policies and convergence
properties of optimal thresholds. In particular, this paper analyzes the
setup-cost inventory control problem without two assumptions often used in the
literature: (a) the demand is either discrete or continuous, or (b) the
backordering cost is higher than the cost of backordered inventory when the
amount of backordered inventory is large.
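A minimal sketch of how such an (s, S) policy operates: whenever the inventory position falls below the reorder point s, order up to S and pay a fixed setup cost. The parameters K, h, b and the demand distribution below are invented for illustration and are not from the paper:

```python
import random

# Illustrative simulation of an (s, S) policy for a setup-cost inventory
# model with backorders.  Negative inventory encodes backordered demand.

def average_cost_sS(s, S, horizon, K=10.0, h=1.0, b=5.0, seed=0):
    """Average per-period cost under an (s, S) policy.

    K: fixed setup cost per order      (hypothetical value)
    h: holding cost per unit on hand   (hypothetical value)
    b: backlog cost per unit short     (hypothetical value)
    """
    rng = random.Random(seed)
    x = float(S)
    total = 0.0
    for _ in range(horizon):
        if x < s:                 # reorder point triggered
            total += K            # pay the fixed setup cost
            x = float(S)          # order up to S
        d = rng.randint(0, 10)    # random demand (illustrative)
        x -= d
        total += h * max(x, 0.0) + b * max(-x, 0.0)
    return total / horizon

cost = average_cost_sS(s=0, S=10, horizon=20_000)
```

The optimality result in the abstract says that, under its assumptions, no policy outside this two-parameter family can do better, so searching over (s, S) pairs in such a model suffices.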
Buffer Insertion for Bridges and Optimal Buffer Sizing for Communication Sub-System of Systems-on-Chip
We present an optimal buffer sizing and buffer insertion methodology
that uses stochastic models of the architecture and continuous-time Markov
decision processes (CTMDPs). Such a methodology is useful in managing the
scarce buffer resources available on chip, as compared with network-based data
communication, which can have large buffer space. Modeling this problem in a
CTMDP framework leads to a nonlinear formulation due to the usage of bridges
in the bus architecture. We present a methodology to split the problem into
several smaller, linear subsystems, which we then solve.
Fatou's Lemma for Weakly Converging Measures under the Uniform Integrability Condition
This note describes Fatou's lemma and Lebesgue's dominated convergence
theorem for a sequence of measures converging weakly to a finite measure and
for a sequence of functions whose negative parts are uniformly integrable with
respect to these measures. The note also provides new formulations of uniform
Fatou's lemma, uniform Lebesgue convergence theorem, the Dunford-Pettis
theorem, and the fundamental theorem for Young measures based on the
equivalence of uniform integrability and the apparently weaker property of
asymptotic uniform integrability for sequences of functions and finite
measures.
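A representative form of the conclusion, for measures $\mu_n$ converging weakly to a finite measure $\mu$ on a metric space $S$ and measurable functions $f_n$ whose negative parts are uniformly integrable with respect to $\{\mu_n\}$, reads as follows (the notation is illustrative and the precise hypotheses are those of the paper):

```latex
\int_S \Bigl(\liminf_{n \to \infty,\ s' \to s} f_n(s')\Bigr)\, \mu(ds)
\;\le\; \liminf_{n \to \infty} \int_S f_n(s)\, \mu_n(ds).
```

When the measures are fixed ($\mu_n \equiv \mu$) the inner lower limit over $s' \to s$ disappears and this reduces to the classical Fatou lemma.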
On solutions of Kolmogorov's equations for jump Markov processes
This paper studies three ways to construct a nonhomogeneous jump Markov
process: (i) via a compensator of the random measure of a multivariate point
process, (ii) as a minimal solution of the backward Kolmogorov equation, and
(iii) as a minimal solution of the forward Kolmogorov equation. The main
conclusion of this paper is that, for a given measurable transition intensity,
commonly called a Q-function, all these constructions define the same
transition function. If this transition function is regular, that is, the
probability of accumulation of jumps is zero, then this transition function is
the unique solution of the backward and forward Kolmogorov equations. For
continuous Q-functions, Kolmogorov equations were studied in Feller's seminal
paper. In particular, this paper extends Feller's results for continuous
Q-functions to measurable Q-functions and provides additional results.
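For orientation, in the special case of a countable state space with a (possibly time-dependent) Q-function $q_{ij}(t)$, where $q_{ij}(t) \ge 0$ for $i \ne j$ and $\sum_j q_{ij}(t) = 0$, the backward and forward Kolmogorov equations for the transition function $P_{ij}(s,t)$ take the familiar matrix form (a standard statement, not quoted from the paper):

```latex
\frac{\partial}{\partial s} P_{ij}(s,t)
  = -\sum_{k} q_{ik}(s)\, P_{kj}(s,t)
  \qquad \text{(backward equation)},
```

```latex
\frac{\partial}{\partial t} P_{ij}(s,t)
  = \sum_{k} P_{ik}(s,t)\, q_{kj}(t)
  \qquad \text{(forward equation)},
```

with the initial condition $P_{ij}(s,s) = \delta_{ij}$. The paper's conclusion is that, for a merely measurable Q-function, the minimal solutions of both equations coincide with the transition function built from the compensator construction, and that regularity (no accumulation of jumps) yields uniqueness.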