8,167 research outputs found

    Monotonically improving limit-optimal strategies in finite state decision processes

    In every finite-state leavable gambling problem and in every finite-state Markov decision process with discounted, negative, or positive reward criteria, there exists a Markov strategy which is monotonically improving and optimal in the limit along every history. An example is given to show that for the positive and gambling cases such strategies cannot be constructed by simply switching to a "better" action or gamble at each successive return to a state. Key words and phrases: gambling problem, Markov decision process, strategy, stationary strategy, monotonically improving strategy, limit-optimal strategy.

    Monotonically Improving Limit-Optimal Strategies in Finite-State Decision Processes

    Suppose you are in a casino with a number of dollars you wish to gamble. You may quit whenever you please, and your objective is to find a strategy which will maximize the probability that you reach some goal, say $1000. In formal gambling-theoretic terminology, since there are only a finite number of dollars in the world, and since you may quit and leave whenever you wish, this is a finite-state leavable gambling problem [4], and the classical result of Dubins and Savage [4, Theorem 3.9.2.] says that for each ε > 0 there is always a stationary strategy which is uniformly ε-optimal. That is, there is always a strategy for betting in which the bet you place at each play depends only on your current fortune, and using this strategy your expected fortune at the time you quit gambling is within ε of the most you could expect under any strategy. In general, optimal stationary strategies do not always exist, even in finite-state leavable gambling problems [4, Example 3.9.2.], although they do if the number of bets available for each fortune is also finite [4, Theorem 3.9.1.], an assumption which certainly does not hold in a casino with an oddsmaker (someone who will let you bet any amount on practically any future event; he simply sets odds he considers favourable to the house). An ε-optimal stationary strategy is by definition quite good, but it does have the disadvantage that it is not getting any better, and in general always remains ε away from optimal at some states. The purpose of this paper is to introduce the notion of a strategy which is monotonically improving and optimal in the limit, and to prove that such strategies exist in all finite-state leavable gambling problems and in all finite-state Markov decision processes with positive, negative, and discounted pay-offs; in fact, even Markov strategies [6] with these properties are shown to exist. The questions of whether monotonically improving limit-optimal (MILO) strategies exist in nonleavable finite-state gambling problems, in finite-state average reward Markov decision processes, or in countable-state problems (with various pay-offs) are left open.
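    To make the casino story concrete, here is a small simulation sketch (ours, not the paper's): it estimates the success probability of one familiar stationary strategy, bold play, where the bet placed at each play depends only on the current fortune. The win probability p and all dollar figures are illustrative assumptions.

    ```python
    import random

    def bold_play_success(start, goal, p, trials=100_000):
        """Estimate the probability that bold play -- the stationary strategy
        betting min(fortune, goal - fortune) at each play -- reaches `goal`
        before going broke, when each even-money bet wins with probability p."""
        wins = 0
        for _ in range(trials):
            fortune = start
            while 0 < fortune < goal:
                stake = min(fortune, goal - fortune)   # bet depends only on current fortune
                fortune += stake if random.random() < p else -stake
            wins += fortune >= goal
        return wins / trials

    # Example: $250 in hand, aiming for the $1000 goal at subfair odds.
    print(bold_play_success(250, 1000, p=0.47))
    ```

    For subfair red-and-black, bold play is in fact optimal among all strategies (a classical Dubins and Savage result), which makes it a natural stationary benchmark here.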

    Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes

    Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism for expressing performance-related properties. Strategy iteration is one of the solution techniques applicable in this context. While in many other contexts it is the technique of choice, due to advantages over, e.g., value iteration, such as precision or the possibility of domain-knowledge-aware initialization, it is rarely used for MDPs, since there it scales worse than value iteration. We provide several techniques that speed up strategy iteration by orders of magnitude for many MDPs, eliminating the performance disadvantage while preserving all its advantages.
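    For orientation, the sketch below shows textbook Howard-style strategy iteration for mean payoff on a unichain MDP, i.e. the baseline this paper speeds up, not its optimized variants; the data layout (per-action transition matrices P[a] and reward vectors r[a]) is our assumption for illustration.

    ```python
    import numpy as np

    def mean_payoff_strategy_iteration(P, r):
        """Howard-style strategy iteration for the long-run average reward
        (mean payoff) of a unichain MDP.  P[a] is the |S| x |S| transition
        matrix of action a and r[a] its reward vector.  A plain textbook
        sketch; the paper's speed-up techniques are not implemented here."""
        n = P[0].shape[0]
        policy = np.zeros(n, dtype=int)
        while True:
            # Policy evaluation: solve  g + h(s) = r(s, pi(s)) + sum_s' P(s'|s) h(s')
            # with the normalization h(0) = 0, for the gain g and bias vector h.
            A = np.eye(n) - np.array([P[policy[s]][s] for s in range(n)])
            b = np.array([r[policy[s]][s] for s in range(n)])
            A[:, 0] = 1.0                     # this column now carries the gain g
            x = np.linalg.solve(A, b)
            g, h = x[0], x.copy()
            h[0] = 0.0
            # Policy improvement: act greedily on the bias values, keeping the
            # current action unless another is strictly better (avoids cycling).
            improved = False
            for s in range(n):
                q = [r[a][s] + P[a][s] @ h for a in range(len(P))]
                best = int(np.argmax(q))
                if q[best] > q[policy[s]] + 1e-9:
                    policy[s] = best
                    improved = True
            if not improved:
                return g, policy
    ```

    The exact linear solve in the evaluation step is the source of the precision advantage mentioned above, and the initial policy could be seeded with domain knowledge.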

    Termination Criteria for Solving Concurrent Safety and Reachability Games

    We consider concurrent games played on graphs. At every round of a game, each player simultaneously and independently selects a move; the moves jointly determine the transition to a successor state. Two basic objectives are the safety objective to stay forever in a given set of states, and its dual, the reachability objective to reach a given set of states. We present in this paper a strategy improvement algorithm for computing the value of a concurrent safety game, that is, the maximal probability with which player 1 can enforce the safety objective. The algorithm yields a sequence of player-1 strategies which ensure probabilities of winning that converge monotonically to the value of the safety game. Our result is significant because the strategy improvement algorithm provides, for the first time, a way to approximate the value of a concurrent safety game from below. Since a value iteration algorithm, or a strategy improvement algorithm for reachability games, can be used to approximate the same value from above, the combination of both algorithms yields a method for computing a converging sequence of upper and lower bounds for the values of concurrent reachability and safety games. Previous methods could approximate the values of these games only from one direction, and as no rates of convergence are known, they did not provide a practical way to solve these games.
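    To illustrate the from-above half of this bounding scheme, here is a rough sketch of value iteration for a concurrent safety game, where each step solves a zero-sum matrix game by linear programming. The game encoding (delta[s][a][b] as a successor distribution) is our assumption, and the paper's strategy improvement algorithm, which produces the matching lower bounds, is not reproduced here.

    ```python
    import numpy as np
    from scipy.optimize import linprog

    def matrix_game_value(M):
        """Value of the zero-sum matrix game M (row player maximizes), via LP:
        maximize v subject to x^T M[:, j] >= v for every column j, with x a
        probability vector over the rows."""
        rows, cols = M.shape
        c = np.zeros(rows + 1)
        c[-1] = -1.0                                         # linprog minimizes, so minimize -v
        A_ub = np.hstack([-M.T, np.ones((cols, 1))])         # v - x^T M[:, j] <= 0
        A_eq = np.append(np.ones(rows), 0.0).reshape(1, -1)  # x sums to one
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(cols), A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * rows + [(None, None)])
        return res.x[-1]

    def safety_value_iteration(states, safe, moves1, moves2, delta, iters=100):
        """Value iteration for a concurrent safety game.  Starting from the
        optimistic guess v = 1, the iterates decrease monotonically towards the
        safety value (a greatest fixed point), i.e. they approximate it from
        above.  delta[s][a][b] is a dict mapping successor states to
        probabilities -- a toy encoding assumed for this sketch."""
        v = {s: 1.0 for s in states}
        for _ in range(iters):
            new_v = {}
            for s in states:
                if s not in safe:
                    new_v[s] = 0.0
                    continue
                M = np.array([[sum(p * v[t] for t, p in delta[s][a][b].items())
                               for b in moves2[s]]
                              for a in moves1[s]])
                new_v[s] = matrix_game_value(M)
            v = new_v
        return v
    ```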

    Mathematical sciences : optimization and conditioning principles for discrete parameter stochastic processes

    Issued as Final report, Project no. G-37-60

    Discrete-continuous analysis of optimal equipment replacement

    In Operations Research, the equipment replacement process is usually modeled in discrete time. The optimal replacement strategies are found from discrete (or integer) programming problems, well known for their analytic and computational complexity. An alternative approach is represented by continuous-time vintage capital models that explicitly involve the equipment lifetime and are described by nonlinear integral equations. Then the optimal replacement is determined via the optimal control of such equations. These two alternative techniques describe essentially the same controlled dynamic process. We introduce and analyze a model that unites both approaches. The obtained results allow us to explore such important effects in optimal asset replacement as the transition and long-term dynamics, clustering and splitting of replaced assets, and the impact of improving technology and discounting. In particular, we demonstrate that the cluster splitting is possible in our replacement model with given demand in the case of increasing demand. Theoretical findings are illustrated with numerical examples. Keywords: vintage capital models, optimization, equipment lifetime, discrete-continuous models.
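    For readers who want to see what the discrete-time formulation looks like in its simplest form, here is a finite-horizon dynamic-programming sketch of the classical keep-or-replace problem; all cost functions and the discount factor are invented placeholders, and none of the paper's continuous-time vintage-capital machinery appears here.

    ```python
    def replacement_plan(horizon, max_age, maintain, price, salvage, beta=0.95):
        """Finite-horizon DP for discrete-time equipment replacement: each
        period, keep the current machine (paying an age-dependent maintenance
        cost) or replace it (paying the purchase price net of salvage).
        All cost data and the discount factor beta are placeholders."""
        INF = float("inf")
        # V[t][age]: minimal discounted cost from period t with a machine of this age.
        V = [[0.0] * (max_age + 1) for _ in range(horizon + 1)]
        plan = [[None] * (max_age + 1) for _ in range(horizon)]
        for t in range(horizon - 1, -1, -1):
            for age in range(max_age + 1):
                keep = (maintain(age) + beta * V[t + 1][age + 1]
                        if age < max_age else INF)        # forced replacement at max_age
                replace = price - salvage(age) + maintain(0) + beta * V[t + 1][1]
                # Tuple comparison breaks ties in favour of keeping the machine.
                V[t][age], plan[t][age] = min((keep, "keep"), (replace, "replace"))
        return V[0], plan

    # Toy data: maintenance cost grows with age, salvage value decays.
    costs, plan = replacement_plan(horizon=20, max_age=10,
                                   maintain=lambda a: 1.0 + 0.6 * a,
                                   price=12.0, salvage=lambda a: max(0.0, 6.0 - a))
    ```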