6 research outputs found

    New prioritized value iteration for Markov decision processes

    The problem of solving large Markov decision processes accurately and quickly is challenging. Since the computational effort incurred is considerable, current research focuses on finding superior acceleration techniques. For instance, the convergence properties of current solution methods depend, to a great extent, on the order of backup operations. On one hand, algorithms such as topological sorting are able to find good orderings but their overhead is usually high. On the other hand, shortest path methods, such as Dijkstra's algorithm which is based on priority queues, have been applied successfully to the solution of deterministic shortest-path Markov decision processes. Here, we propose an improved value iteration algorithm based on Dijkstra's algorithm for solving shortest path Markov decision processes. The experimental results on a stochastic shortest-path problem show the feasibility of our approach. © Springer Science+Business Media B.V. 2011.
    García Hernández, MDG.; Ruiz Pinales, J.; Onaindia De La Rivaherrera, E.; Aviña Cervantes, JG.; Ledesma Orozco, S.; Alvarado Mendez, E.; Reyes Ballesteros, A. (2012). New prioritized value iteration for Markov decision processes. Artificial Intelligence Review. 37(2):157-167. doi:10.1007/s10462-011-9224-z

    A switching planner for combined task and observation planning

    From an automated planning perspective, the problem of practical mobile robot control in realistic environments poses many important and contrary challenges. On the one hand, the planning process must be lightweight, robust, and timely. Over the lifetime of the robot it must always respond quickly with new plans that accommodate exogenous events, changing objectives, and the underlying unpredictability of the environment. On the other hand, in order to promote efficient behaviours, the planning process must perform computationally expensive reasoning about contingencies and possible revisions of subjective beliefs according to quantitatively modelled uncertainty in acting and sensing. Towards addressing these challenges, we develop a continual planning approach that switches between a fast satisficing "classical" planner, which decides on the overall strategy, and decision-theoretic planning, which solves small abstract subproblems where deeper consideration of the sensing model is both practical and can significantly impact overall performance. We evaluate our approach in large problems from a realistic robot exploration domain.

    Optimal policies for Bayesian olfactory search in turbulent flows

    In many practical scenarios, a flying insect must search for the source of an emitted cue which is advected by the atmospheric wind. On the macroscopic scales of interest, turbulence tends to mix the cue into patches of relatively high concentration over a background of very low concentration, so that the insect will only detect the cue intermittently and cannot rely on chemotactic strategies which simply climb the concentration gradient. In this work, we cast this search problem in the language of a partially observable Markov decision process (POMDP) and use the Perseus algorithm to compute strategies that are near-optimal with respect to the arrival time. We test the computed strategies on a large two-dimensional grid, present the resulting trajectories and arrival time statistics, and compare these to the corresponding results for several heuristic strategies, including (space-aware) infotaxis, Thompson sampling, and QMDP. We find that the near-optimal policy found by our implementation of Perseus outperforms all heuristics we test by several measures. We use the near-optimal policy to study how the search difficulty depends on the starting location. We discuss additionally the choice of initial belief and the robustness of the policies to changes in the environment. Finally, we present a detailed and pedagogical discussion about the implementation of the Perseus algorithm, including the benefits (and pitfalls) of employing a reward shaping function.
    Comment: 35 pages, 19 figures
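As a rough illustration of the POMDP framing in this abstract, the sketch below shows a Bayesian belief update over candidate source locations on a grid. It is not the paper's model: the function names, the `length_scale` parameter, and the exponential detection probability are all hypothetical placeholders standing in for a turbulent-plume likelihood.

```python
import math


def detection_probability(source, agent, length_scale=3.0):
    """Hypothetical detection model: the chance of a detection decays
    exponentially with the distance between source and agent."""
    dist = math.hypot(source[0] - agent[0], source[1] - agent[1])
    return math.exp(-dist / length_scale)


def update_belief(belief, agent, detected):
    """Bayes update of a belief {cell: probability} over the source location,
    given whether the agent at `agent` detected the cue."""
    posterior = {}
    for cell, prior in belief.items():
        p_hit = detection_probability(cell, agent)
        likelihood = p_hit if detected else 1.0 - p_hit
        posterior[cell] = prior * likelihood
    total = sum(posterior.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalize so the posterior sums to one.
    return {cell: p / total for cell, p in posterior.items()}
```

In this toy model a non-detection at the agent's own cell rules that cell out entirely, because the placeholder detection probability is 1 at distance zero.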

    Simplification of Markov decision processes through action regulation and state prioritization (Simplificación de los procesos de decisión de Markov mediante reglamentación de acciones y priorización de estados)

    To construct courses of action in real environments, one must consider that actions can have different effects on the world (non-determinism) and weigh the potential of alternative plans for reaching the problem's goals, taking into account their costs and rewards (extended goals). In this regard, decision-theoretic planning has made it possible to solve stochastic problems, establishing courses of action that involve amounts of information difficult for a human to process, and evaluating their strengths and weaknesses on the basis of probability theory and utility theory. Research on this methodology has grown recently owing to the success of Markov decision processes (MDPs) in operations research, control theory, economics, and artificial intelligence, among other fields. However, solving large MDPs accurately and quickly remains a computational challenge. Since the computational effort is significant, current research focuses on finding superior acceleration techniques. For example, the convergence properties of current solution methods depend, to a great extent, on the order of the backup operations. On the one hand, algorithms such as topological sorting have been able to find good orderings, but their startup costs have usually been high. On the other hand, shortest-path methods such as the classical Dijkstra's algorithm, which is based on priority queues, have been applied successfully to deterministic shortest-path Markov decision processes. This thesis proposes a new value iteration algorithm based on Dijkstra's algorithm for solving stochastic shortest-path MDPs. Unlike other prioritized approaches such as improved prioritized sweeping, the proposed approach can handle multiple goal and start states and, since it successively updates each state using the Bellman equation, it guarantees convergence to the optimal solution. In addition, the algorithm uses the current value function as the priority metric, since Dijkstra's algorithm suggests that a more suitable update order is given by the value of the dynamic programming functional. The experimental results obtained on a sailing strategies task show the feasibility of the proposed approach. The proposed algorithm was found to considerably reduce the solution time required by the value iteration algorithm, from quadratic growth in the number of states to nearly linear growth.
    García Hernández, MDG. (2013). Simplificación de los procesos de decisión de Markov mediante reglamentación de acciones y priorización de estados [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18467
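The idea of ordering Bellman backups with a priority queue keyed on the current value function can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the function name, the `actions` layout, the tolerance `eps`, and the large finite initial value `v_init` are all hypothetical choices made for the example.

```python
import heapq


def prioritized_value_iteration(states, goals, actions, eps=1e-6, v_init=1e6):
    """Dijkstra-style value iteration sketch for a stochastic shortest-path MDP.

    actions[s] is a list of (cost, [(prob, next_state), ...]) for each
    non-goal state s. States must be mutually comparable (e.g. ints) so that
    heap ties can be broken.
    """
    # Predecessor map: which states can reach s in one step.
    preds = {s: set() for s in states}
    for s in states:
        if s in goals:
            continue
        for _cost, outcomes in actions[s]:
            for prob, nxt in outcomes:
                if prob > 0:
                    preds[nxt].add(s)

    # Goals cost nothing; all other states start at a large finite value.
    V = {s: (0.0 if s in goals else v_init) for s in states}
    queue = [(0.0, g) for g in goals]
    heapq.heapify(queue)

    while queue:
        # Pop the state with the smallest value (stale entries are harmless).
        _, s = heapq.heappop(queue)
        for p in preds[s]:
            # Bellman backup: cheapest expected cost-to-go over all actions.
            new_v = min(
                cost + sum(prob * V[nxt] for prob, nxt in outcomes)
                for cost, outcomes in actions[p]
            )
            if abs(new_v - V[p]) > eps:
                V[p] = new_v
                heapq.heappush(queue, (new_v, p))
    return V
```

For instance, with `states = [0, 1, 2]`, `goals = {0}`, and a state 1 whose only action costs 1 and reaches the goal with probability 0.5 (otherwise staying in place), the value of state 1 should settle near 2, since it satisfies V = 1 + 0.5 V.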

    Prioritizing Point-Based POMDP Solvers

    Recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods such as PBVI, Perseus, and HSVI, which quickly converge to an approximate solution for medium-sized problems. These algorithms improve a value function by using backup operations over a single belief point. In the simpler domain of MDP solvers, prioritizing the order of equivalent backup operations on states is well known to speed up convergence. We generalize the notion of prioritized backups to the POMDP framework, and show that the ordering of backup operations on belief points is important. We also present a new algorithm, Prioritized Value Iteration (PVI), and show empirically that it outperforms current point-based algorithms. Finally, a new empirical evaluation measure, based on the number of backups and the number of belief points, is proposed, in order to provide more accurate benchmark comparisons.
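The "backup operation over a single belief point" mentioned in this abstract can be sketched in its standard textbook form. This is an assumed, generic point-based backup, not the PVI algorithm itself; the function names and the nested-list layouts `T[a][s][s2]`, `O[a][s2][o]`, and `R[a][s]` are hypothetical conventions chosen for the example.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(b, alphas, T, O, R, gamma):
    """Return the alpha vector produced by one backup at belief point b.

    b       : belief, list of probabilities over states
    alphas  : current value function, list of alpha vectors
    T[a][s][s2] : transition probability from s to s2 under action a
    O[a][s2][o] : probability of observing o after reaching s2 with action a
    R[a][s]     : immediate reward for action a in state s
    """
    n_states = len(b)
    n_obs = len(O[0][0])
    best_value, best_alpha = float("-inf"), None
    for a in range(len(T)):
        g_a = list(R[a])
        for o in range(n_obs):
            # Project every alpha vector back through action a and observation o.
            candidates = [
                [
                    sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n_states))
                    for s in range(n_states)
                ]
                for alpha in alphas
            ]
            # Keep the projection that is best at this particular belief.
            best = max(candidates, key=lambda g: dot(b, g))
            for s in range(n_states):
                g_a[s] += gamma * best[s]
        value = dot(b, g_a)
        if value > best_value:
            best_value, best_alpha = value, g_a
    return best_alpha
```

A prioritized solver would differ from this sketch mainly in how it chooses which belief point to back up next, which is the question the abstract addresses.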