Monotonically improving limit-optimal strategies in finite state decision processes
In every finite-state leavable gambling problem and in every finite-state Markov decision process with discounted, negative or positive reward criteria there exists a Markov strategy which is monotonically improving and optimal in the limit along every history. An example is given to show that for the positive and gambling cases such strategies cannot be constructed by simply switching to a "better" action or gamble at each successive return to a state. Key words and phrases: gambling problem, Markov decision process, strategy, stationary strategy, monotonically improving strategy, limit-optimal strategy
Monotonically Improving Limit-Optimal Strategies in Finite-State Decision Processes
Suppose you are in a casino with a number of dollars you wish to gamble. You may quit whenever you please, and your objective is to find a strategy which will maximize the probability that you reach some goal, say $1000. In formal gambling-theoretic terminology, since there are only a finite number of dollars in the world, and since you may quit and leave whenever you wish, this is a finite-state leavable gambling problem [4], and the classical result of Dubins and Savage [4, Theorem 3.9.2] says that for each ε > 0 there is always a stationary strategy which is uniformly ε-optimal. That is, there is always a strategy for betting in which the bet you place at each play depends only on your current fortune, and using this strategy your expected fortune at the time you quit gambling is within ε of the most you could expect under any strategy. In general, optimal stationary strategies do not always exist, even in finite-state leavable gambling problems [4, Example 3.9.2], although they do if the number of bets available for each fortune is also finite [4, Theorem 3.9.1], an assumption which certainly does not hold in a casino with an oddsmaker (someone who will let you bet any amount on practically any future event - he simply sets odds he considers favourable to the house). An ε-optimal stationary strategy is by definition quite good, but it does have the disadvantage that it is not getting any better: in general it always remains ε away from optimal at some states.
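The ε-optimal values behind such stationary strategies can be approximated by value iteration. A minimal sketch, not from the paper (the goal, win probability, and iteration count below are invented): a finite-state leavable red-and-black problem in which a player with fortune f may stake any whole amount up to min(f, GOAL - f), or quit at any time.

```python
# Hypothetical red-and-black casino: fortunes 0..GOAL, each stake s is won with
# probability P_WIN (subfair), and the player may quit at any time, collecting
# utility 1 exactly when the goal has been reached.
GOAL, P_WIN, ITERS = 10, 0.4, 200

def utility(f):
    return 1.0 if f >= GOAL else 0.0

# V[f] approximates the optimal probability of reaching the goal from fortune f;
# starting from the utility and iterating, the values increase toward the optimum.
V = [utility(f) for f in range(GOAL + 1)]
for _ in range(ITERS):
    V = [max(utility(f),                       # the problem is leavable: may quit
             max((P_WIN * V[f + s] + (1 - P_WIN) * V[f - s]
                  for s in range(1, min(f, GOAL - f) + 1)),
                 default=0.0))
         for f in range(GOAL + 1)]

def stake(f):
    """A stationary strategy: a (near-)maximizing stake for the current fortune."""
    return max(range(1, min(f, GOAL - f) + 1),
               key=lambda s: P_WIN * V[f + s] + (1 - P_WIN) * V[f - s],
               default=0)
```

Reading a maximizing stake off the iterates, as `stake` does, yields a stationary strategy whose value is within some ε of optimal; as the text notes, that ε does not shrink once the strategy is fixed.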
The purpose of this paper is to introduce the notion of a strategy which is monotonically improving and optimal in the limit, and to prove that such strategies exist in all finite-state leavable gambling problems and in all finite-state Markov decision processes with positive, negative, and discounted pay-offs; in fact, even Markov strategies [6] with these properties are shown to exist. The questions of whether monotonically improving limit-optimal (MILO) strategies exist in nonleavable finite-state gambling problems, in finite-state average-reward Markov decision processes, or in countable-state problems (with various pay-offs) are left open.
Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes
Markov decision processes (MDPs) are standard models for probabilistic
systems with non-deterministic behaviours. Mean payoff (or long-run average
reward) provides a mathematically elegant formalism to express performance
related properties. Strategy iteration is one of the solution techniques
applicable in this context. While in many other contexts it is the technique of
choice due to advantages over e.g. value iteration, such as precision or
possibility of domain-knowledge-aware initialization, it is rarely used for
MDPs, since there it scales worse than value iteration. We provide several
techniques that speed up strategy iteration by orders of magnitude for many
MDPs, eliminating the performance disadvantage while preserving all its
advantages.
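The strategy-iteration scheme itself is easy to sketch. A hedged illustration, in the simpler discounted setting rather than mean payoff, and on a made-up two-state MDP: each round evaluates the current strategy exactly, then switches every state to a greedy action, stopping at a fixed point.

```python
GAMMA = 0.9  # discount factor (illustration only; the paper treats mean payoff)

# mdp[state][action] = (reward, {successor: probability}) -- a made-up example
mdp = {
    "s0": {"stay": (1.0, {"s0": 1.0}), "go":   (0.0, {"s1": 1.0})},
    "s1": {"stay": (2.0, {"s1": 1.0}), "back": (0.0, {"s0": 1.0})},
}
states = list(mdp)

def evaluate(policy, sweeps=2000):
    """Value of a fixed strategy, via Bellman sweeps run to numerical convergence."""
    v = {s: 0.0 for s in states}
    for _ in range(sweeps):
        v = {s: mdp[s][policy[s]][0]
               + GAMMA * sum(p * v[t] for t, p in mdp[s][policy[s]][1].items())
             for s in states}
    return v

def improve(v):
    """Greedy switch: in every state pick the action with the best one-step lookahead."""
    return {s: max(mdp[s], key=lambda a: mdp[s][a][0]
                   + GAMMA * sum(p * v[t] for t, p in mdp[s][a][1].items()))
            for s in states}

policy = {s: next(iter(mdp[s])) for s in states}   # arbitrary initial strategy
while True:
    v = evaluate(policy)
    improved = improve(v)
    if improved == policy:
        break                                       # fixed point: optimal strategy
    policy = improved
```

The mean-payoff variant the abstract addresses replaces the discounted evaluation step with a gain/bias computation, which is where the scaling problems the authors attack arise.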
Termination Criteria for Solving Concurrent Safety and Reachability Games
We consider concurrent games played on graphs. At every round of a game, each
player simultaneously and independently selects a move; the moves jointly
determine the transition to a successor state. Two basic objectives are the
safety objective to stay forever in a given set of states, and its dual, the
reachability objective to reach a given set of states. We present in this paper
a strategy improvement algorithm for computing the value of a concurrent safety
game, that is, the maximal probability with which player~1 can enforce the
safety objective. The algorithm yields a sequence of player-1 strategies which
ensure probabilities of winning that converge monotonically to the value of the
safety game.
Our result is significant because the strategy improvement algorithm
provides, for the first time, a way to approximate the value of a concurrent
safety game from below. Since a value iteration algorithm, or a strategy
improvement algorithm for reachability games, can be used to approximate the
same value from above, the combination of both algorithms yields a method for
computing a converging sequence of upper and lower bounds for the values of
concurrent reachability and safety games. Previous methods could approximate
the values of these games only from one direction, and as no rates of
convergence are known, they did not provide a practical way to solve these
games.
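The two-sided bounding idea can be sketched in a much simpler setting. A hedged illustration on a plain, made-up Markov chain (not a concurrent game): iterate a lower bound from 0 and an upper bound from 1; after zeroing out states that cannot reach the target at all, both sequences converge to the reachability probability, so every step yields a sound interval.

```python
chain = {  # made-up example: successor distributions of a Markov chain
    "a": {"b": 0.5, "c": 0.5},
    "b": {"goal": 0.7, "a": 0.3},
    "c": {"c": 1.0},            # absorbing, never reaches the goal
    "goal": {"goal": 1.0},
}
TARGET = "goal"

# Graph step: states with no path to the target must have value 0; without
# this, the upper iterate can stall at a spurious fixed point above the value.
can_reach = {TARGET}
changed = True
while changed:
    changed = False
    for s, succ in chain.items():
        if s not in can_reach and any(t in can_reach for t in succ):
            can_reach.add(s)
            changed = True

lo = {s: 1.0 if s == TARGET else 0.0 for s in chain}   # lower bound from below
hi = {s: 1.0 if s in can_reach else 0.0 for s in chain}  # upper bound from above

def step(v):
    return {s: v[s] if s == TARGET or s not in can_reach
            else sum(p * v[t] for t, p in chain[s].items())
            for s in chain}

for _ in range(500):
    lo, hi = step(lo), step(hi)
```

In the concurrent games of the paper, each such step additionally requires solving a matrix game at every state, and the contribution is that the lower iterates of the strategy improvement algorithm converge at all.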
Mathematical sciences: optimization and conditioning principles for discrete parameter stochastic processes
Issued as Final report, Project no. G-37-60
Discrete-continuous analysis of optimal equipment replacement
In Operations Research, the equipment replacement process is usually modeled in discrete time. The optimal replacement strategies are found from discrete (or integer) programming problems, well known for their analytic and computational complexity. An alternative approach is represented by continuous-time vintage capital models that explicitly involve the equipment lifetime and are described by nonlinear integral equations. Then the optimal replacement is determined via the optimal control of such equations. These two alternative techniques describe essentially the same controlled dynamic process. We introduce and analyze a model that unites both approaches. The obtained results allow us to explore such important effects in optimal asset replacement as the transition and long-term dynamics, clustering and splitting of replaced assets, and the impact of improving technology and discounting. In particular, we demonstrate that cluster splitting is possible in our replacement model with given demand in the case of an increasing …. Theoretical findings are illustrated with numeric examples. Keywords: vintage capital models, optimization, equipment lifetime, discrete-continuous models.
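The discrete-time side of the comparison is the classical replacement recursion. A minimal sketch (all costs, horizons, and salvage values below are invented, not the paper's model): dynamic programming over machine age, choosing each year between keeping the current machine and replacing it.

```python
from functools import lru_cache

HORIZON = 10           # planning horizon in years (hypothetical)
PRICE = 100.0          # cost of a new machine (hypothetical)

def maintenance(age):  # running cost grows as the equipment ages
    return 10.0 + 15.0 * age

def salvage(age):      # resale value of a machine of the given age
    return max(PRICE - 30.0 * age, 0.0)

@lru_cache(maxsize=None)
def cost(year, age):
    """Minimal total cost from `year` to the horizon, holding a machine of `age`."""
    if year == HORIZON:
        return -salvage(age)                 # sell whatever we hold at the end
    keep = maintenance(age) + cost(year + 1, age + 1)
    replace = PRICE - salvage(age) + maintenance(0) + cost(year + 1, 1)
    return min(keep, replace)
```

The age state and the keep-or-replace decision above are exactly the quantities that the continuous-time vintage capital formulation carries as an equipment lifetime inside an integral equation; the paper's unified model bridges the two descriptions.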