A central limit theorem for temporally non-homogenous Markov chains with applications to dynamic programming
We prove a central limit theorem for a class of additive processes that arise
naturally in the theory of finite horizon Markov decision problems. The main
theorem generalizes a classic result of Dobrushin (1956) for temporally
non-homogeneous Markov chains, and the principal innovation is that here the
summands are permitted to depend on both the current state and a bounded number
of future states of the chain. We show through several examples that this added
flexibility gives one a direct path to asymptotic normality of the optimal
total reward of finite horizon Markov decision problems. The same examples also
explain why such results are not easily obtained by alternative Markovian
techniques such as enlargement of the state space. Comment: 27 pages, 1 figure
Essays in Problems of Optimal Sequential Decisions
In this dissertation, we study several Markovian problems of optimal sequential decisions by focusing on research questions that are driven by probabilistic and operations-management considerations. Our probabilistic interest is in understanding the distribution of the total reward that one obtains when implementing a policy that maximizes its expected value. In this respect, we study the sequential selection of unimodal and alternating subsequences from a random sample, and we prove accurate bounds for the expected values and exact asymptotics. In the unimodal problem, we also note that the variance of the optimal total reward can be bounded in terms of its expected value. This fact then motivates a much broader analysis that characterizes a class of Markov decision problems that share this important property. In the alternating subsequence problem, we also outline how one could prove a central limit theorem for the number of alternating selections in a finite random sample, as the size of the sample grows to infinity. Our operations-management interest is in studying the interaction of on-the-job learning and learning-by-doing in a workforce-related problem. Specifically, we study the sequential hiring and retention of heterogeneous workers who learn over time. We model the hiring and retention problem as a Bayesian infinite-armed bandit, and we characterize the optimal policy in detail. Through an extensive set of numerical examples, we gain insights into the managerial nature of the problem, and we demonstrate that the value of active monitoring and screening of employees can be substantial.
Quickest Online Selection of an Increasing Subsequence of Specified Size
Given a sequence of independent random variables with a common continuous
distribution, we consider the online decision problem where one seeks to
minimize the expected value of the time that is needed to complete the
selection of a monotone increasing subsequence of a prespecified length.
This problem is dual to some online decision problems that have been considered
earlier, and this dual problem has some notable advantages. In particular, the
recursions and equations of optimality lead with relative ease to asymptotic
formulas for mean and variance of the minimal selection time. Comment: 17 pages
Optimal Online Selection of a Monotone Subsequence: a Central Limit Theorem
Consider a sequence of independent random variables with a common
continuous distribution, and consider the task of choosing an increasing
subsequence where the observations are revealed sequentially and where an
observation must be accepted or rejected when it is first revealed. There is a
unique selection policy that is optimal in the sense that it
maximizes the expected value of the number of selected
observations. We investigate the distribution of this number; in particular,
we obtain a central limit theorem for it and a detailed
understanding of its mean and variance for long sequences. Our results and methods
are complementary to the work of Bruss and Delbaen (2004) where an analogous
central limit theorem is found for monotone increasing selections from a finite
sequence whose cardinality is a Poisson random variable that is
independent of the sequence. Comment: 26 pages
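The online selection task described in this abstract can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the observations are taken to be uniform on [0, 1], and the fixed-width acceptance window (the hypothetical `width` parameter, set here to sqrt(2/n)) is a simple heuristic chosen for the example. It is not the optimal policy analyzed in the paper.

```python
import math
import random


def online_increasing_selection(values, width):
    """Select an increasing subsequence online with a fixed acceptance window.

    An observation x is accepted when last < x <= last + width, where `last`
    is the most recently selected value (0.0 before any selection, which
    assumes values in [0, 1]). Each decision is made when x is first seen
    and is never revisited. This window rule is an illustrative heuristic,
    not the optimal policy.
    """
    selected = []
    last = 0.0
    for x in values:
        if last < x <= last + width:
            selected.append(x)
            last = x
    return selected


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    n = 10_000
    sample = [rng.random() for _ in range(n)]
    chosen = online_increasing_selection(sample, width=math.sqrt(2.0 / n))
    # Print the number of selections next to sqrt(2n) for comparison.
    print(len(chosen), math.sqrt(2 * n))
```

Repeating the run over many seeds and plotting a histogram of `len(chosen)` gives a rough empirical picture of the distribution of the number of selections for this heuristic.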
Optimal Online Selection of an Alternating Subsequence: A Central Limit Theorem
We analyze the optimal policy for the sequential selection of an alternating subsequence from a sequence of n independent observations from a continuous distribution F, and we prove a central limit theorem for the number of selections made by that policy. The proof exploits the backward recursion of dynamic programming and assembles a detailed understanding of the associated value functions and selection rules.
Beardwood-Halton-Hammersley Theorem for Stationary Ergodic Sequences: A Counterexample
We construct a stationary ergodic process X1, X2, … such that each Xt has the uniform distribution on the unit square and the length Ln of the shortest path through the points X1, X2, …, Xn is not asymptotic to a constant times the square root of n. In other words, we show that the Beardwood, Halton, and Hammersley theorem does not extend from the case of independent uniformly distributed random variables to the case of stationary ergodic sequences with uniform marginal distributions.
Markov Decision Problems Where Means Bound Variances
We identify a rich class of finite-horizon Markov decision problems (MDPs) for which the variance of the optimal total reward can be bounded by a simple linear function of its expected value. The class is characterized by three natural properties: reward nonnegativity and boundedness, existence of a do-nothing action, and optimal action monotonicity. These properties are commonly present and typically easy to check. Implications of the class properties and of the variance bound are illustrated by examples of MDPs from operations research, operations management, financial engineering, and combinatorial optimization.
Optimal Sequential Selection of a Unimodal Subsequence of a Random Sequence
We consider the problem of selecting sequentially a unimodal subsequence from
a sequence of independent identically distributed random variables, and we find
that a person doing optimal sequential selection does within a factor of the
square root of two as well as a prophet who knows all of the random
observations in advance of any selections. Our analysis applies in fact to
selections of subsequences that have d+1 monotone blocks, and, by including the
case d=0, our analysis also covers monotone subsequences.
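For the monotone case d=0, the prophet's benchmark is the length of the longest increasing subsequence of the full sample, since the prophet sees every observation before selecting. A minimal Python sketch of that offline computation follows; it uses the standard patience-sorting method with binary search, and the function name `longest_increasing_length` is an assumption made for illustration rather than anything taken from the paper.

```python
from bisect import bisect_left


def longest_increasing_length(values):
    """Return the length of the longest strictly increasing subsequence.

    tails[k] holds the smallest possible last element of an increasing
    subsequence of length k + 1 seen so far (patience sorting).
    """
    tails = []
    for x in values:
        i = bisect_left(tails, x)  # first position whose tail is >= x
        if i == len(tails):
            tails.append(x)        # x extends the longest subsequence so far
        else:
            tails[i] = x           # x gives a smaller tail for length i + 1
    return len(tails)


if __name__ == "__main__":
    print(longest_increasing_length([3, 1, 2, 5, 4]))
```

Comparing this offline length with the count obtained by any online rule on the same sample gives an empirical view of the gap between the prophet and a sequential selector.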