Monotonically improving limit-optimal strategies in finite state decision processes
In every finite-state leavable gambling problem and in every finite-state Markov decision process with discounted, negative or positive reward criteria there exists a Markov strategy which is monotonically improving and optimal in the limit along every history. An example is given to show that for the positive and gambling cases such strategies cannot be constructed by simply switching to a "better" action or gamble at each successive return to a state. Key words and phrases: gambling problem, Markov decision process, strategy, stationary strategy, monotonically improving strategy, limit-optimal strategy
Monotonically Improving Limit-Optimal Strategies in Finite-State Decision Processes
Suppose you are in a casino with a number of dollars you wish to gamble. You may quit whenever you please, and your objective is to find a strategy which will maximize the probability that you reach some goal, say $1000. In formal gambling-theoretic terminology, since there are only a finite number of dollars in the world, and since you may quit and leave whenever you wish, this is a finite-state leavable gambling problem [4], and the classical result of Dubins and Savage [4, Theorem 3.9.2] says that for each ε > 0 there is always a stationary strategy which is uniformly ε-optimal. That is, there is always a strategy for betting in which the bet you place at each play depends only on your current fortune, and using this strategy your expected fortune at the time you quit gambling is within ε of the most you could expect under any strategy. In general, optimal stationary strategies do not always exist, even in finite-state leavable gambling problems [4, Example 3.9.2], although they do if the number of bets available for each fortune is also finite [4, Theorem 3.9.1], an assumption which certainly does not hold in a casino with an oddsmaker (someone who will let you bet any amount on practically any future event - he simply sets odds he considers favourable to the house). An ε-optimal stationary strategy is by definition quite good, but it does have the disadvantage that it is not getting any better: in general it always remains ε away from optimal at some states.
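The ε-optimal values behind such stationary strategies can be approximated by value iteration. A minimal sketch, not from the paper (the goal, win probability, and iteration count below are invented): a finite-state leavable red-and-black problem in which a player with fortune f may stake any whole amount up to min(f, GOAL - f), or quit at any time.

```python
# Hypothetical red-and-black casino: fortunes 0..GOAL, each stake s is won with
# probability P_WIN (subfair), and the player may quit at any time, collecting
# utility 1 exactly when the goal has been reached.
GOAL, P_WIN, ITERS = 10, 0.4, 200

def utility(f):
    return 1.0 if f >= GOAL else 0.0

# V[f] approximates the optimal probability of reaching the goal from fortune f;
# starting from the utility and iterating, the values increase toward the optimum.
V = [utility(f) for f in range(GOAL + 1)]
for _ in range(ITERS):
    V = [max(utility(f),                       # the problem is leavable: may quit
             max((P_WIN * V[f + s] + (1 - P_WIN) * V[f - s]
                  for s in range(1, min(f, GOAL - f) + 1)),
                 default=0.0))
         for f in range(GOAL + 1)]

def stake(f):
    """A stationary strategy: a (near-)maximizing stake for the current fortune."""
    return max(range(1, min(f, GOAL - f) + 1),
               key=lambda s: P_WIN * V[f + s] + (1 - P_WIN) * V[f - s],
               default=0)
```

Reading a maximizing stake off the iterates, as `stake` does, yields a stationary strategy whose value is within some ε of optimal; as the text notes, that ε does not shrink once the strategy is fixed.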
The purpose of this paper is to introduce the notion of a strategy which is monotonically improving and optimal in the limit, and to prove that such strategies exist in all finite-state leavable gambling problems and in all finite-state Markov decision processes with positive, negative, and discounted pay-offs; in fact, even Markov strategies [6] with these properties are shown to exist. The questions of whether monotonically improving limit-optimal (MILO) strategies exist in nonleavable finite-state gambling problems, in finite-state average-reward Markov decision processes, or in countable-state problems (with various pay-offs) are left open.
Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes
Markov decision processes (MDPs) are standard models for probabilistic
systems with non-deterministic behaviours. Mean payoff (or long-run average
reward) provides a mathematically elegant formalism to express performance
related properties. Strategy iteration is one of the solution techniques
applicable in this context. While in many other contexts it is the technique of
choice due to advantages over e.g. value iteration, such as precision or
possibility of domain-knowledge-aware initialization, it is rarely used for
MDPs, since there it scales worse than value iteration. We provide several
techniques that speed up strategy iteration by orders of magnitude for many
MDPs, eliminating the performance disadvantage while preserving all its
advantages.
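The strategy-iteration scheme itself is easy to sketch. A hedged illustration, in the simpler discounted setting rather than mean payoff, and on a made-up two-state MDP: each round evaluates the current strategy exactly, then switches every state to a greedy action, stopping at a fixed point.

```python
GAMMA = 0.9  # discount factor (illustration only; the paper treats mean payoff)

# mdp[state][action] = (reward, {successor: probability}) -- a made-up example
mdp = {
    "s0": {"stay": (1.0, {"s0": 1.0}), "go":   (0.0, {"s1": 1.0})},
    "s1": {"stay": (2.0, {"s1": 1.0}), "back": (0.0, {"s0": 1.0})},
}
states = list(mdp)

def evaluate(policy, sweeps=2000):
    """Value of a fixed strategy, via Bellman sweeps run to numerical convergence."""
    v = {s: 0.0 for s in states}
    for _ in range(sweeps):
        v = {s: mdp[s][policy[s]][0]
               + GAMMA * sum(p * v[t] for t, p in mdp[s][policy[s]][1].items())
             for s in states}
    return v

def improve(v):
    """Greedy switch: in every state pick the action with the best one-step lookahead."""
    return {s: max(mdp[s], key=lambda a: mdp[s][a][0]
                   + GAMMA * sum(p * v[t] for t, p in mdp[s][a][1].items()))
            for s in states}

policy = {s: next(iter(mdp[s])) for s in states}   # arbitrary initial strategy
while True:
    v = evaluate(policy)
    improved = improve(v)
    if improved == policy:
        break                                       # fixed point: optimal strategy
    policy = improved
```

The mean-payoff variant the abstract addresses replaces the discounted evaluation step with a gain/bias computation, which is where the scaling problems the authors attack arise.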
Termination Criteria for Solving Concurrent Safety and Reachability Games
We consider concurrent games played on graphs. At every round of a game, each
player simultaneously and independently selects a move; the moves jointly
determine the transition to a successor state. Two basic objectives are the
safety objective to stay forever in a given set of states, and its dual, the
reachability objective to reach a given set of states. We present in this paper
a strategy improvement algorithm for computing the value of a concurrent safety
game, that is, the maximal probability with which player~1 can enforce the
safety objective. The algorithm yields a sequence of player-1 strategies which
ensure probabilities of winning that converge monotonically to the value of the
safety game.
Our result is significant because the strategy improvement algorithm
provides, for the first time, a way to approximate the value of a concurrent
safety game from below. Since a value iteration algorithm, or a strategy
improvement algorithm for reachability games, can be used to approximate the
same value from above, the combination of both algorithms yields a method for
computing a converging sequence of upper and lower bounds for the values of
concurrent reachability and safety games. Previous methods could approximate
the values of these games only from one direction, and as no rates of
convergence are known, they did not provide a practical way to solve these
games.
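The two-sided bounding idea can be sketched in a much simpler setting. A hedged illustration on a plain, made-up Markov chain (not a concurrent game): iterate a lower bound from 0 and an upper bound from 1; after zeroing out states that cannot reach the target at all, both sequences converge to the reachability probability, so every step yields a sound interval.

```python
chain = {  # made-up example: successor distributions of a Markov chain
    "a": {"b": 0.5, "c": 0.5},
    "b": {"goal": 0.7, "a": 0.3},
    "c": {"c": 1.0},            # absorbing, never reaches the goal
    "goal": {"goal": 1.0},
}
TARGET = "goal"

# Graph step: states with no path to the target must have value 0; without
# this, the upper iterate can stall at a spurious fixed point above the value.
can_reach = {TARGET}
changed = True
while changed:
    changed = False
    for s, succ in chain.items():
        if s not in can_reach and any(t in can_reach for t in succ):
            can_reach.add(s)
            changed = True

lo = {s: 1.0 if s == TARGET else 0.0 for s in chain}   # lower bound from below
hi = {s: 1.0 if s in can_reach else 0.0 for s in chain}  # upper bound from above

def step(v):
    return {s: v[s] if s == TARGET or s not in can_reach
            else sum(p * v[t] for t, p in chain[s].items())
            for s in chain}

for _ in range(500):
    lo, hi = step(lo), step(hi)
```

In the concurrent games of the paper, each such step additionally requires solving a matrix game at every state, and the contribution is that the lower iterates of the strategy improvement algorithm converge at all.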
Mathematical sciences: optimization and conditioning principles for discrete parameter stochastic processes
Issued as Final report, Project no. G-37-60
Discrete-continuous analysis of optimal equipment replacement
In Operations Research, the equipment replacement process is usually modeled in discrete time. The optimal replacement strategies are found from discrete (or integer) programming problems, well known for their analytic and computational complexity. An alternative approach is represented by continuous-time vintage capital models that explicitly involve the equipment lifetime and are described by nonlinear integral equations. Then the optimal replacement is determined via the optimal control of such equations. These two alternative techniques describe essentially the same controlled dynamic process. We introduce and analyze a model that unites both approaches. The obtained results allow us to explore such important effects in optimal asset replacement as the transition and long-term dynamics, clustering and splitting of replaced assets, and the impact of improving technology and discounting. In particular, we demonstrate that cluster splitting is possible in our replacement model with given demand in the case of an increasing …. Theoretical findings are illustrated with numeric examples. Keywords: vintage capital models, optimization, equipment lifetime, discrete-continuous models.
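The discrete-time side of the comparison is the classical replacement recursion. A minimal sketch (all costs, horizons, and salvage values below are invented, not the paper's model): dynamic programming over machine age, choosing each year between keeping the current machine and replacing it.

```python
from functools import lru_cache

HORIZON = 10           # planning horizon in years (hypothetical)
PRICE = 100.0          # cost of a new machine (hypothetical)

def maintenance(age):  # running cost grows as the equipment ages
    return 10.0 + 15.0 * age

def salvage(age):      # resale value of a machine of the given age
    return max(PRICE - 30.0 * age, 0.0)

@lru_cache(maxsize=None)
def cost(year, age):
    """Minimal total cost from `year` to the horizon, holding a machine of `age`."""
    if year == HORIZON:
        return -salvage(age)                 # sell whatever we hold at the end
    keep = maintenance(age) + cost(year + 1, age + 1)
    replace = PRICE - salvage(age) + maintenance(0) + cost(year + 1, 1)
    return min(keep, replace)
```

The age state and the keep-or-replace decision above are exactly the quantities that the continuous-time vintage capital formulation carries as an equipment lifetime inside an integral equation; the paper's unified model bridges the two descriptions.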