
    New prioritized value iteration for Markov decision processes

    The problem of solving large Markov decision processes both accurately and quickly is challenging. Since the computational effort involved is considerable, current research focuses on finding better acceleration techniques. In particular, the convergence properties of current solution methods depend to a great extent on the order of the backup operations. On one hand, algorithms such as topological sorting can find good orderings, but their overhead is usually high. On the other hand, shortest-path methods such as Dijkstra's algorithm, which is based on a priority queue, have been applied successfully to the solution of deterministic shortest-path Markov decision processes. Here, we propose an improved value iteration algorithm based on Dijkstra's algorithm for solving shortest-path Markov decision processes. Experimental results on a stochastic shortest-path problem show the feasibility of our approach. © Springer Science+Business Media B.V. 2011.
    García Hernández, MDG.; Ruiz Pinales, J.; Onaindia De La Rivaherrera, E.; Aviña Cervantes, JG.; Ledesma Orozco, S.; Alvarado Mendez, E.; Reyes Ballesteros, A. (2012). New prioritized value iteration for Markov decision processes. Artificial Intelligence Review 37(2):157-167. doi:10.1007/s10462-011-9224-z
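
    The ordering idea above lends itself to a compact illustration. Below is a minimal sketch of priority-queue-driven value iteration for a stochastic shortest-path MDP, in the spirit of the Dijkstra-based approach the abstract describes; the toy three-state MDP, state names, and tolerance are illustrative assumptions, not the authors' exact algorithm or benchmark.

```python
import heapq

# Toy stochastic shortest-path MDP (an illustrative assumption):
# transitions[s][a] = list of (next_state, probability); costs[s][a] = step cost.
transitions = {
    0: {"go": [(1, 0.9), (0, 0.1)]},
    1: {"go": [(2, 0.8), (0, 0.2)]},
    2: {},  # absorbing goal state
}
costs = {0: {"go": 1.0}, 1: {"go": 1.0}}

def bellman_backup(s, V):
    """Minimal expected cost-to-go over all actions available in state s."""
    if not transitions[s]:
        return 0.0  # goal state: cost-to-go stays zero
    return min(costs[s][a] + sum(p * V[t] for t, p in succ)
               for a, succ in transitions[s].items())

def prioritized_value_iteration(eps=1e-6):
    V = {s: 0.0 for s in transitions}
    # preds[s] = states with an action that can lead to s
    preds = {s: set() for s in transitions}
    for s, acts in transitions.items():
        for succ in acts.values():
            for t, _ in succ:
                preds[t].add(s)
    # Priority = magnitude of the Bellman residual (max-heap via negation),
    # so the state whose backup changes V the most is processed first.
    heap = [(-abs(bellman_backup(s, V) - V[s]), s) for s in transitions]
    heapq.heapify(heap)
    while heap:
        _, s = heapq.heappop(heap)
        new_v = bellman_backup(s, V)
        if abs(new_v - V[s]) <= eps:
            continue  # stale or already-converged entry
        V[s] = new_v
        for p in preds[s]:  # the update may raise predecessors' residuals
            residual = abs(bellman_backup(p, V) - V[p])
            if residual > eps:
                heapq.heappush(heap, (-residual, p))
    return V

print(prioritized_value_iteration())  # expected cost-to-go from each state
```

    On this toy problem the queue backs up each state only a handful of times; the advantage over blind sweeps grows with the number of states whose values have already converged.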

    Mixed acceleration techniques for solving quickly stochastic shortest-path Markov decision processes

    In this paper we propose combining accelerated variants of value iteration with improved prioritized sweeping for the fast solution of stochastic shortest-path Markov decision processes. Value iteration is a classical algorithm for solving Markov decision processes, but it and its variants are quite slow on very large problems. To improve the solution time, we explore acceleration techniques such as asynchronous updates, prioritization, and prioritized sweeping. A topological reordering algorithm is also compared with static reordering. Experimental results obtained on stochastic shortest-path problems with finite state and action spaces show that our approach achieves a considerable reduction in solution time with respect to the tested variants of value iteration; in one test, for instance, it was 5.7 times faster than value iteration with asynchronous updates.
    García Hernández, MDG.; Ruiz Pinales, J.; Onaindia De La Rivaherrera, E.; Ledesma-Orozco, S.; Aviña-Cervantes, J.; Alvarado-Méndez, E.; Reyes-Ballesteros, A. (2011). Mixed acceleration techniques for solving quickly stochastic shortest-path Markov decision processes. Journal of Applied Research and Technology 9(2):129-144. http://hdl.handle.net/10251/46761
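
    Of the techniques listed, asynchronous updating is the simplest to illustrate. The sketch below is Gauss-Seidel value iteration, which updates V in place so that later backups within a sweep already see the new values; the goal-first sweep order stands in for the kind of static reordering the paper compares against. The toy MDP is an illustrative assumption, not the paper's benchmark.

```python
def async_value_iteration(transitions, costs, order, eps=1e-6, max_sweeps=10_000):
    """Gauss-Seidel value iteration: V is updated in place during each sweep."""
    V = {s: 0.0 for s in transitions}
    for _ in range(max_sweeps):
        max_delta = 0.0
        for s in order:  # a fixed (static) sweep order over the states
            if not transitions[s]:
                continue  # absorbing goal state keeps cost-to-go zero
            new_v = min(costs[s][a] + sum(p * V[t] for t, p in succ)
                        for a, succ in transitions[s].items())
            max_delta = max(max_delta, abs(new_v - V[s]))
            V[s] = new_v  # in place: used immediately by later backups
        if max_delta <= eps:
            break
    return V

# Same toy MDP as in the previous sketch; sweeping from the goal backwards
# propagates cost-to-go information toward the start state within one sweep.
transitions = {
    0: {"go": [(1, 0.9), (0, 0.1)]},
    1: {"go": [(2, 0.8), (0, 0.2)]},
    2: {},
}
costs = {0: {"go": 1.0}, 1: {"go": 1.0}}
print(async_value_iteration(transitions, costs, order=[2, 1, 0]))
```

    Goal-first sweeps move cost-to-go information backwards in fewer iterations, which is the intuition behind both the static and the topological reorderings the paper evaluates.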

    Markov Decision Processes: a tutorial (original title: Processos de Decisão de Markov: um tutorial)

    There are situations in which decisions must be made in sequence, and the outcome of each decision is not clear to the decision maker. Such situations can be formulated mathematically as Markov decision processes, and, given the probabilities of the values resulting from the decisions, it is possible to determine a policy that maximizes the expected value of the sequence of decisions. This tutorial describes Markov decision processes (both the fully observable and the partially observable case) and briefly discusses some methods for their solution. Semi-Markov processes are not discussed.
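
    The abstract does not reproduce the tutorial's equations, but the object at the heart of the fully observable solution methods it surveys is the Bellman optimality equation, which characterizes the policy maximizing expected value:

```latex
% Bellman optimality equation for a discounted, fully observable MDP:
% the optimal value of state s is the best immediate reward plus the
% discounted expected optimal value of the successor state.
\[
  V^{*}(s) \;=\; \max_{a \in A} \Bigl[\, R(s,a)
    \;+\; \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{*}(s') \Bigr],
  \qquad 0 \le \gamma < 1 .
\]
```

    The optimal policy then simply selects, in each state, an action attaining the maximum.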

    Robotics-Assisted Needle Steering for Percutaneous Interventions: Modeling and Experiments

    Needle insertion and guidance play an important role in medical procedures such as brachytherapy and biopsy. Flexible needles have the potential to facilitate precise targeting and avoid collisions during medical interventions while reducing trauma to the patient and post-puncture issues. Nevertheless, error introduced during guidance degrades the effectiveness of the planned therapy or diagnosis. Although steering using flexible bevel-tip needles provides great mobility and dexterity, a major barrier is the complexity of needle-tissue interaction, which does not lend itself to intuitive control. To overcome this problem, a robotic system can be employed to perform trajectory planning and tracking by manipulating the needle base. This research project focuses on a control-theoretic approach and draws on the rich literature from control and systems theory to model needle-tissue interaction and needle flexion, and then to design a robotics-based strategy for needle insertion/steering. The resulting solutions will directly benefit a wide range of needle-based interventions. The outcome of this computer-assisted approach will not only enable us to perform efficient preoperative trajectory planning, but will also provide more insight into needle-tissue interaction, which will be helpful in developing advanced intraoperative algorithms for needle steering. Experimental validation of the proposed methodologies was carried out on a state-of-the-art 5-DOF robotic system designed and constructed in-house primarily for prostate brachytherapy. The system is equipped with a Nano43 6-DOF force/torque sensor (ATI Industrial Automation) to measure forces and torques acting on the needle shaft. In our setup, an Aurora electromagnetic tracker (Northern Digital Inc.) is the sensing device used for measuring needle deflection. A multi-threaded application for control, sensor readings, data logging, and communication over Ethernet was developed using Microsoft Visual C++ 2005, MATLAB 2007, and the QuaRC Toolbox (Quanser Inc.). Various artificial phantoms were developed so as to create a realistic medium in terms of elasticity and insertion-force ranges; however, they simulated a uniform environment without exhibiting the complexities of organic tissues. Experiments were also conducted on beef liver, fresh chicken breast, beef, and ham to investigate the behavior of a variety of biological tissues.
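
    As a point of reference for the modeling problem described above, the sketch below implements the constant-curvature "unicycle" kinematic model of a bevel-tip needle that is standard in the needle-steering literature. It is an illustrative assumption, not necessarily the needle-tissue interaction model developed in this thesis; the curvature, step size, and bevel-flip schedule are made up.

```python
import math

def steer_needle_2d(kappa, steps, ds=1e-3, flips=()):
    """Integrate the 2-D tip pose (x, y, theta) of a bevel-tip needle.

    The tip traces an arc of curvature `kappa` (1/m); a 180-degree rotation
    of the needle base at an insertion step listed in `flips` mirrors the
    bevel and reverses the sign of the curvature.
    """
    x = y = theta = 0.0
    sign = 1.0
    path = [(x, y)]
    for i in range(steps):
        if i in flips:
            sign = -sign  # bevel flip: the needle now curves the other way
        theta += sign * kappa * ds  # heading follows the arc
        x += math.cos(theta) * ds
        y += math.sin(theta) * ds
        path.append((x, y))
    return path

# 10 cm insertion with a 0.2 m radius of curvature, flipping the bevel halfway.
tip_path = steer_needle_2d(kappa=5.0, steps=100, flips={50})
print(tip_path[-1])  # final tip position in meters
```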

    A new solution for Markov Decision Processes and its aerospace applications

    Markov Decision Processes (MDPs) are a powerful technique for modelling sequential decision-making problems, and they have been used over many decades in domains including robotics, finance, and aerospace. However, MDPs are also known to be difficult to solve because of the explosion in the size of the state space, which makes finding a solution intractable for many practical problems. Traditional approaches such as value iteration require that each state in the state space be represented as an element of an array, which eventually exhausts the available memory of any computer. It is not unusual to find practical problems in which the number of states is so large that solving them will never conceivably be tractable on any computer (e.g., the number of states is of the order of the number of atoms in the universe). Historically, this issue has been mitigated by various means, but primarily by approximation (under the umbrella of Approximate Dynamic Programming), where the solution of the MDP (the value function) is modelled via an approximation function. Many linear function approximation methods have been proposed since Markov Decision Processes were introduced nearly 70 years ago. More recently, non-linear (e.g., deep neural network) function approximation methods have also been proposed to obtain a higher-quality estimate of the value function. While these methods help, they come with disadvantages, including loss of accuracy caused by the approximation and a training or fitting phase that may take a long time to converge. This thesis makes two main contributions in the area of Markov Decision Processes: (1) a novel alternative theoretical understanding of the nature of Markov Decision Processes and their solutions, and (2) a new series of algorithms that can solve a subset of MDPs extremely quickly compared to the historical methods described above. We provide both an intuitive and a mathematical description of the method. We describe a progression of algorithms that demonstrate the utility of the approach in aerospace applications including guidance to goals, collision avoidance, and pursuit-evasion. We start in 2D environments with simple aircraft models and end with 3D team-based pursuit-evasion in which the aircraft perform rolls and loops in a highly dynamic environment. We close with a discussion and directions for future research.
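
    The contrast the abstract draws between tabular storage and function approximation is easy to make concrete. Below is a minimal sketch of fitted value iteration with a linear approximation V(s) ≈ w·φ(s), so only a three-component weight vector is stored instead of one array entry per state. The 1-D chain MDP, feature map, and sample sizes are illustrative assumptions, and fitted value iteration can diverge in general, echoing the loss-of-accuracy caveat above.

```python
import numpy as np

n_states = 1000  # tabular value iteration would store an array this long
gamma = 0.95

def phi(s):
    """Three features per state: bias, normalized position, its square."""
    x = s / (n_states - 1)
    return np.array([1.0, x, x * x])

def reward(s):
    return 1.0 if s == n_states - 1 else 0.0  # reward only at the goal end

def next_states(s):
    """Two deterministic actions: step left or right along the chain."""
    return [max(s - 1, 0), min(s + 1, n_states - 1)]

rng = np.random.default_rng(0)
sample = rng.choice(n_states, size=100, replace=False)  # states to fit on
features = np.stack([phi(s) for s in sample])

w = np.zeros(3)
for _ in range(200):
    # Bellman targets computed under the current approximate value function.
    targets = np.array([max(reward(s) + gamma * (phi(t) @ w)
                            for t in next_states(s)) for s in sample])
    # Least-squares projection of the targets back onto the feature space.
    w, *_ = np.linalg.lstsq(features, targets, rcond=None)

print(w, phi(n_states - 1) @ w)  # weights and the approximate goal-state value
```

    The memory footprint is the length of w, independent of the number of states, which is what makes approximation attractive when a tabular array would never fit.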