8,698 research outputs found

    On the Convergence of Techniques that Improve Value Iteration

    Prioritisation of Bellman backups and updating only a small subset of actions are important techniques for speeding up planning in MDPs, and the recent literature has introduced efficient approaches that exploit both: backward value iteration and backing up only the best actions were shown to reduce planning time significantly. This paper conducts a theoretical and empirical analysis of these techniques and contributes several new proofs. In particular, it (1) identifies weaker requirements for the convergence of backups based on best actions only, (2) derives a new way of evaluating the Bellman error for the update that backs up a single best action at a time, (3) proves the convergence of backward value iteration and establishes the required initialisation, and (4) shows that the default state ordering of backups in standard value iteration can significantly influence its performance. Additionally, (5) because the existing literature compared these methods against policy iteration neither empirically nor analytically, the paper provides such a comparison. The rigorous empirical and novel theoretical parts of the paper reveal important relationships and support guidelines on which type of value or policy iteration is suitable for a given domain. Finally, our chief message is that standard value iteration can be made far more efficient by the simple modifications shown in this paper.
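
    As a concrete illustration of the backup strategies discussed in this abstract, the sketch below (not taken from the paper) contrasts standard value iteration, whose sweeps visit states in an arbitrary default order, with a prioritised variant that always backs up the state carrying the largest Bellman error. The toy random MDP, the discount factor, and the tolerance are illustrative assumptions.

```python
import numpy as np

# Standard value iteration with an arbitrary (default) state order versus a
# prioritised variant that always backs up the state with the largest Bellman
# error. Both stop once every residual is below the tolerance.

def bellman_backup(V, P, R, gamma, s):
    """Full backup of state s: best action's expected reward plus discounted value."""
    return max(R[s, a] + gamma * P[s, a] @ V for a in range(R.shape[1]))

def value_iteration(P, R, gamma=0.95, eps=1e-6):
    nS = R.shape[0]
    V = np.zeros(nS)
    while True:
        delta = 0.0
        for s in range(nS):                       # default (arbitrary) state ordering
            v_new = bellman_backup(V, P, R, gamma, s)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new                          # in-place (Gauss-Seidel) update
        if delta < eps:
            return V

def prioritised_value_iteration(P, R, gamma=0.95, eps=1e-6):
    nS = R.shape[0]
    V = np.zeros(nS)

    def residual(s):
        return abs(bellman_backup(V, P, R, gamma, s) - V[s])

    errors = np.array([residual(s) for s in range(nS)])
    while errors.max() >= eps:
        s = int(errors.argmax())                  # back up the worst state first
        V[s] = bellman_backup(V, P, R, gamma, s)
        # Recomputing every residual keeps the sketch short; a practical
        # implementation would only refresh the predecessors of s.
        errors = np.array([residual(s2) for s2 in range(nS)])
    return V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nS, nA = 6, 3
    P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] sums to 1 over next states
    R = rng.uniform(0.0, 1.0, size=(nS, nA))
    print(np.allclose(value_iteration(P, R), prioritised_value_iteration(P, R), atol=1e-4))
```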

    Parallel iterative solution methods for Markov decision processes

    Some notes on iterative optimization of structured Markov decision processes with discounted rewards

    The paper contains a comparison of solution techniques for Markov decision processes with respect to the total reward criterion. It is illustrated by examples that the effect of a number of improvements of the standard iterative method advocated in the literature is limited in some realistic situations. Numerical evidence is provided to show that exploiting the structure of the problem under consideration often yields a more substantial reduction in the required computational effort than some of the existing acceleration procedures. We advocate that this structure be analyzed and used in choosing the appropriate solution procedure, which may be composed by blending several of the acceleration concepts described in the literature. Four test problems are sketched and solved with several successive approximation methods, each composed after analyzing the structure of the problem, and the required computational efforts are compared.
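
    To illustrate the kind of structure exploitation advocated in this abstract, the sketch below (an assumption-laden toy, not one of the paper's four test problems) stores each state-action pair's successors explicitly, so that a successive approximation sweep costs time proportional to the number of nonzero transitions rather than to a full dense backup over all states and actions.

```python
import numpy as np

# Sparse successive approximation: succ[s][a] lists the (next_state, prob)
# pairs that action a can actually reach from state s, so each backup touches
# only the stored successors instead of a full |S|-length row.

def sparse_successive_approximation(succ, R, gamma=0.95, eps=1e-6):
    nS = len(succ)
    V = np.zeros(nS)
    while True:
        delta = 0.0
        for s in range(nS):
            v_new = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in succ[s][a])
                for a in range(len(succ[s]))
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new                          # in-place (Gauss-Seidel) sweep
        if delta < eps:
            return V

if __name__ == "__main__":
    # Toy 20-state cycle: action 0 stays put, action 1 moves one step to the
    # right (wrapping around); only the step from state 18 to 19 is rewarded.
    nS = 20
    succ = [[[(s, 1.0)], [((s + 1) % nS, 1.0)]] for s in range(nS)]
    R = [[0.0, 1.0 if s == nS - 2 else 0.0] for s in range(nS)]
    print(sparse_successive_approximation(succ, R).round(3))
```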

    The action elimination algorithm for Markov decision processes

    An efficient algorithm for solving Markov decision problems is proposed. The value iteration method of dynamic programming is used in conjunction with a test for nonoptimal actions. The algorithm applies to problems with undiscounted or discounted returns and an infinite or finite planning horizon; in the finite-horizon case the discount factor may exceed unity. The nonoptimality test, an extension of Hastings' test for the undiscounted reward case, identifies actions that cannot be optimal at the current stage. As convergence proceeds, the proportion of such actions increases, producing major computational savings. For problems with a discount factor less than one, the test is shown to be tighter than that of MacQueen.
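
    The sketch below illustrates the general idea in the discounted, infinite-horizon case only; it uses MacQueen-style bounds derived from two successive iterates rather than the tighter Hastings-based test of the paper, and the toy MDP, tolerance, and function names are assumptions.

```python
import numpy as np

def value_iteration_with_elimination(P, R, gamma=0.9, eps=1e-6):
    """Value iteration that permanently discards provably nonoptimal actions.

    P has shape (S, A, S) with P[s, a] a distribution over next states,
    R has shape (S, A). Discounted, infinite-horizon case (gamma < 1) only.
    """
    nS, nA = R.shape
    V_prev = np.zeros(nS)
    V = (R + gamma * np.einsum("san,n->sa", P, V_prev)).max(axis=1)  # one full backup
    active = np.ones((nS, nA), dtype=bool)            # actions not yet eliminated
    while True:
        diff = V - V_prev
        lo = gamma * diff.min() / (1.0 - gamma)       # V*(s) >= V(s) + lo
        hi = gamma * diff.max() / (1.0 - gamma)       # V*(s) <= V(s) + hi
        if hi - lo < eps:
            return V, active
        Q = R + gamma * np.einsum("san,n->sa", P, V)
        # Elimination test: if even an optimistic bound on Q*(s, a) falls below
        # the pessimistic bound on V*(s), action a can never be optimal at s.
        active &= (Q + gamma * hi) >= (V + lo)[:, None]
        # A real implementation would skip the backup of eliminated pairs
        # entirely; here they are only masked, to keep the sketch short.
        V_prev, V = V, np.where(active, Q, -np.inf).max(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    nS, nA = 8, 4
    P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # random transition kernel
    R = rng.uniform(0.0, 1.0, size=(nS, nA))          # random rewards
    V, active = value_iteration_with_elimination(P, R)
    print("actions still active per state:", active.sum(axis=1))
```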