92 research outputs found

    New prioritized value iteration for Markov decision processes

    Full text link
    The problem of solving large Markov decision processes accurately and quickly is challenging. Since the computational effort incurred is considerable, current research focuses on finding superior acceleration techniques. For instance, the convergence properties of current solution methods depend, to a great extent, on the order of backup operations. On one hand, algorithms such as topological sorting are able to find good orderings but their overhead is usually high. On the other hand, shortest path methods, such as Dijkstra's algorithm which is based on priority queues, have been applied successfully to the solution of deterministic shortest-path Markov decision processes. Here, we propose an improved value iteration algorithm based on Dijkstra's algorithm for solving shortest path Markov decision processes. The experimental results on a stochastic shortest-path problem show the feasibility of our approach. © Springer Science+Business Media B.V. 2011.García Hernández, MDG.; Ruiz Pinales, J.; Onaindia De La Rivaherrera, E.; Aviña Cervantes, JG.; Ledesma Orozco, S.; Alvarado Mendez, E.; Reyes Ballesteros, A. (2012). New prioritized value iteration for Markov decision processes. Artificial Intelligence Review. 37(2):157-167. doi:10.1007/s10462-011-9224-zS157167372Agrawal S, Roth D (2002) Learning a sparse representation for object detection. In: Proceedings of the 7th European conference on computer vision. Copenhagen, Denmark, pp 1–15Bellman RE (1954) The theory of dynamic programming. Bull Amer Math Soc 60: 503–516Bellman RE (1957) Dynamic programming. Princeton University Press, New JerseyBertsekas DP (1995) Dynamic programming and optimal control. Athena Scientific, MassachusettsBhuma K, Goldsmith J (2003) Bidirectional LAO* algorithm. In: Proceedings of indian international conferences on artificial intelligence. p 980–992Blackwell D (1965) Discounted dynamic programming. Ann Math Stat 36: 226–235Bonet B, Geffner H (2003a) Faster heuristic search algorithms for planning with uncertainty and full feedback. In: Proceedings of the 18th international joint conference on artificial intelligence. Morgan Kaufmann, Acapulco, México, pp 1233–1238Bonet B, Geffner H (2003b) Labeled RTDP: improving the convergence of real-time dynamic programming. In: Proceedings of the international conference on automated planning and scheduling. Trento, Italy, pp 12–21Bonet B, Geffner H (2006) Learning depth-first search: a unified approach to heuristic search in deterministic and non-deterministic settings and its application to MDP. In: Proceedings of the 16th international conference on automated planning and scheduling. Cumbria, UKBoutilier C, Dean T, Hanks S (1999) Decision-theoretic planning: structural assumptions and computational leverage. J Artif Intell Res 11: 1–94Chang I, Soo H (2007) Simulation-based algorithms for Markov decision processes Communications and control engineering. Springer, LondonDai P, Goldsmith J (2007a) Faster dynamic programming for Markov decision processes. Technical report. Doctoral consortium, department of computer science and engineering. University of WashingtonDai P, Goldsmith J (2007b) Topological value iteration algorithm for Markov decision processes. In: Proceedings of the 20th international joint conference on artificial intelligence. Hyderabad, India, pp 1860–1865Dai P, Hansen EA (2007c) Prioritizing bellman backups without a priority queue. In: Proceedings of the 17th international conference on automated planning and scheduling, association for the advancement of artificial intelligence. Rhode Island, USA, pp 113–119Dibangoye JS, Chaib-draa B, Mouaddib A (2008) A Novel prioritization technique for solving Markov decision processes. In: Proceedings of the 21st international FLAIRS (The Florida Artificial Intelligence Research Society) conference, association for the advancement of artificial intelligence. Florida, USAFerguson D, Stentz A (2004) Focused propagation of MDPs for path planning. In: Proceedings of the 16th IEEE international conference on tools with artificial intelligence. pp 310–317Hansen EA, Zilberstein S (2001) LAO: a heuristic search algorithm that finds solutions with loops. Artif Intell 129: 35–62Hinderer K, Waldmann KH (2003) The critical discount factor for finite Markovian decision processes with an absorbing set. Math Methods Oper Res 57: 1–19Li L (2009) A unifying framework for computational reinforcement learning theory. PhD Thesis. The state university of New Jersey, New Brunswick. NJLittman ML, Dean TL, Kaelbling LP (1995) On the complexity of solving Markov decision problems.In: Proceedings of the 11th international conference on uncertainty in artificial intelligence. Montreal, Quebec pp 394–402McMahan HB, Gordon G (2005a) Fast exact planning in Markov decision processes. In: Proceedings of the 15th international conference on automated planning and scheduling. Monterey, CA, USAMcMahan HB, Gordon G (2005b) Generalizing Dijkstra’s algorithm and gaussian elimination for solving MDPs. Technical report, Carnegie Mellon University, PittsburghMeuleau N, Brafman R, Benazera E (2006) Stochastic over-subscription planning using hierarchies of MDPs. In: Proceedings of the 16th international conference on automated planning and scheduling. Cumbria, UK, pp 121–130Moore A, Atkeson C (1993) Prioritized sweeping: reinforcement learning with less data and less real time. Mach Learn 13: 103–130Puterman ML (1994) Markov decision processes. Wiley Editors, New YorkPuterman ML (2005) Markov decision processes. Wiley Inter Science Editors, New YorkRussell S (2005) Artificial intelligence: a modern approach. Making complex decisions (Ch-17), 2nd edn. Pearson Prentice Hill Ed., USAShani G, Brafman R, Shimony S (2008) Prioritizing point-based POMDP solvers. IEEE Trans Syst Man Cybern 38(6): 1592–1605Sniedovich M (2006) Dijkstra’s algorithm revisited: the dynamic programming connexion. Control Cybern 35: 599–620Sniedovich M (2010) Dynamic programming: foundations and principles, 2nd edn. Pure and Applied Mathematics Series, UKTijms HC (2003) A first course in stochastic models. Discrete-time Markov decision processes (Ch-6). Wiley Editors, UKVanderbei RJ (1996) Optimal sailing strategies. Statistics and operations research program, University of Princeton, USA ( http://www.orfe.princeton.edu/~rvdb/sail/sail.html )Vanderbei RJ (2008) Linear programming: foundations and extensions, 3rd edn. Springer, New YorkWingate D, Seppi KD (2005) Prioritization methods for accelerating MDP solvers. J Mach Learn Res 6: 851–88

    On the Convergence of Techniques that Improve Value Iteration

    Get PDF
    Prioritisation of Bellman backups or updating only a small subset of actions represent important techniques for speeding up planning in MDPs. The recent literature showed new efficient approaches which exploit these directions. Backward value iteration and backing up only the best actions were shown to lead to a significant reduction of the planning time. This paper conducts a theoretical and empirical analysis of these techniques and shows new important proofs. In particular, (1) it identifies weaker requirements for the convergence of backups based on best actions only, (2) a new method for evaluation of the Bellman error is shown for the update that updates one best action once, (3) it presents the theoretical proof of backward value iteration and establishes required initialisation, (4) and shows that the default state ordering of backups in standard value iteration can significantly influence its performance. Additionally, (5) the existing literature did not compare these methods, either empirically or analytically, against policy iteration. The rigorous empirical and novel theoretical parts of the paper reveal important associations and allow drawing guidelines on which type of value or policy iteration is suitable for a given domain. Finally, our chief message is that standard value iteration can be made far more efficient by simple modifications shown in the paper

    Inexact GMRES Policy Iteration for Large-Scale Markov Decision Processes

    Full text link
    Policy iteration enjoys a local quadratic rate of contraction, but its iterations are computationally expensive for Markov decision processes (MDPs) with a large number of states. In light of the connection between policy iteration and the semismooth Newton method and taking inspiration from the inexact variants of the latter, we propose \textit{inexact policy iteration}, a new class of methods for large-scale finite MDPs with local contraction guarantees. We then design an instance based on the deployment of GMRES for the approximate policy evaluation step, which we call inexact GMRES policy iteration. Finally, we demonstrate the superior practical performance of inexact GMRES policy iteration on an MDP with 10000 states, where it achieves a Ă—5.8\times 5.8 and Ă—2.2\times 2.2 speedup with respect to policy iteration and optimistic policy iteration, respectively

    Mixed acceleration techniques for solving quickly stochastic shortest-path markov decision processes

    Get PDF
    In this paper we propose the combination of accelerated variants of value iteration mixed with improved prioritized sweeping for the fast solution of stochastic shortest-path Markov decision processes. Value iteration is a classical algorithm for solving Markov decision processes, but this algorithm and its variants are quite slow for solving considerably large problems. In order to improve the solution time, acceleration techniques such as asynchronous updates, prioritization and prioritized sweeping have been explored in this paper. A topological reordering algorithm was also compared with static reordering. Experimental results obtained on finite state and action-space stochastic shortest-path problems show that our approach achieves a considerable reduction in the solution time with respect to the tested variants of value iteration. For instance, the experiments showed in one test a reduction of 5.7 times with respect to value iteration with asynchronous updates.En este documento proponemos la combinación de variantes aceleradas del algoritmo de iteración de valor combinadas con el algoritmo de barrido priorizado mejorado para la rápida solución de los procesos de decisión de Markov de ruta estocástica más corta. Iteración de valor es un algoritmo clásico para resolver a los procesos de decisión de Markov, pero este algoritmo y sus variantes son lentos para resolver problemas considerablemente grandes. Con el objeto de mejorar el tiempo de solución de este algoritmo, en este documento se han explorado técnicas de aceleración tales como actualizaciones asíncronas, priorización y barrido priorizado. Un algoritmo de reordenamiento topológico también fue comparado con uno de reordenamiento estático. Los resultados experimentales obtenidos en un problema de ruta estocástica más corta con espacios de estados-acciones finitos; muestran que nuestro enfoque logra una considerable reducción en el tiempo de solución con respecto a las variantes de iteración de valor probadas. Por ejemplo, los experimentos mostraron en una prueba una reducción de 5.7 veces con respecto a iteración de valor usando actualizaciones asíncronas.García Hernández, MDG.; Ruiz Pinales, J.; Onaindia De La Rivaherrera, E.; Ledesma-Orozco, S.; Aviña-Cervantes, J.; Alvarado-Méndez, E.; Reyes-Ballesteros, A. (2011). Mixed acceleration techniques for solving quickly stochastic shortest-path markov decision processes. Journal of Applied Research and Technology. 9(2):129-144. http://hdl.handle.net/10251/46761S1291449

    Tools and Algorithms for the Construction and Analysis of Systems

    Get PDF
    This open access book constitutes the proceedings of the 28th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2022, which was held during April 2-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 46 full papers and 4 short papers presented in this volume were carefully reviewed and selected from 159 submissions. The proceedings also contain 16 tool papers of the affiliated competition SV-Comp and 1 paper consisting of the competition report. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, exibility, and efficiency of tools and algorithms for building computer-controlled systems
    • …
    corecore