Towards effective planning strategies for robots in recycling
This work presents several ideas for planning under uncertainty. We seek to recycle electromechanical devices with a robotic arm, and we formulate the problem as a Markov Decision Process. To avoid scalability issues, we employ determinization techniques and hierarchical planning.
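Determinization, as used in this line of work, replaces each stochastic action with a deterministic one, so that a classical planner can be applied. The following is a minimal sketch of the common most-likely-outcome variant, on an invented disassembly toy problem; states, actions, and probabilities are illustrative only, not the authors' actual model.

```python
# Most-likely-outcome determinization: a minimal sketch.
# Each state-action pair maps to a list of (probability, next_state)
# outcomes; the determinized model keeps only the most probable one.
# All states, actions, and probabilities are invented for illustration.

mdp = {
    ("s0", "grasp"):   [(0.8, "s1"), (0.2, "s0")],  # grasp may slip
    ("s1", "unscrew"): [(0.6, "s2"), (0.4, "s1")],
    ("s2", "remove"):  [(1.0, "goal")],
}

def determinize(transitions):
    """Keep only the most likely outcome of every state-action pair."""
    return {
        sa: max(outcomes)[1]  # tuples compare by probability first
        for sa, outcomes in transitions.items()
    }

det = determinize(mdp)
```

The resulting `det` is a deterministic transition function, e.g. `det[("s0", "grasp")] == "s1"`, which a classical planner can search directly; replanning handles the outcomes that were dropped.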
Efficient Probabilistic Reasoning Using Partial State-Space Exploration
Planning, namely the ability of an autonomous agent to make decisions leading towards a certain goal, is one of the fundamental components of intelligent behavior. In the face of uncertainty, this problem is typically modeled as a Markov Decision Process (MDP). The MDP framework is highly expressive, and has been used in a variety of applications, such as mobile robots, flow assignment in heterogeneous networks, optimizing software in mobile phones, and aircraft collision avoidance. However, its wide adoption in real-world scenarios is still impaired by the complexity of solving large MDPs. Developing effective ways to tackle this complexity barrier is a challenging research problem.
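The complexity barrier described above can be made concrete with textbook value iteration, whose every sweep updates every state; this is exactly what becomes intractable as the state space grows. A minimal sketch on an invented three-state stochastic shortest path problem follows (all names and numbers are illustrative, not from the thesis).

```python
# Value iteration on a tiny cost-based MDP (stochastic shortest path
# view): repeatedly apply the Bellman backup to every non-goal state
# until the largest change falls below eps. Toy problem, invented numbers.

def value_iteration(states, actions, P, cost, goal, eps=1e-6):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == goal:
                continue  # goal has value 0 by definition
            q = [cost[s, a] + sum(p * V[s2] for p, s2 in P[s, a])
                 for a in actions if (s, a) in P]
            new_v = min(q)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < eps:
            return V

# From s0, action "a" reaches the goal with probability 0.5, else s1.
P = {("s0", "a"): [(0.5, "s1"), (0.5, "g")],
     ("s1", "a"): [(1.0, "g")]}
cost = {("s0", "a"): 1.0, ("s1", "a"): 1.0}
V = value_iteration(["s0", "s1", "g"], ["a"], P, cost, goal="g")
```

Here V["s0"] converges to 1.5 (one unit of cost plus half the cost of the detour through s1); the point is that the sweep over `states` is unavoidable in this naive form.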
This thesis focuses on the development of scalable and robust MDP solution approaches for partially exploring the state space of an MDP. The main contribution is a series of mathematical and algorithmic techniques for selecting the parts of the state space that are the most critical for effective planning, with the ultimate goal of maximizing performance in the presence of bounded resources. The proposed approaches work on two distinct axes: i) constructing reduced MDP models that are computationally easier to solve, but whose policies still result in near-optimal performance when applied to the original model, and ii) using sampling-based exploration that is biased towards states for which additional computation can be more productive, in a well-defined sense.
The first part of the thesis addresses the model reduction component, introducing an MDP reduction framework that generalizes popular solution approaches based on determinization. In particular, the framework encompasses a spectrum of MDP reductions differing along two dimensions: i) the number of outcomes per state-action pair that are fully accounted for, and ii) the number of occurrences of the remaining, exceptional, outcomes that are planned for in advance. An important insight resulting from this work is that the choice of reduction is crucial for achieving good performance, an issue under-explored by the planning community, even for determinization-based planners.
The second part of the thesis presents a sampling-based approach that does not require modification of the MDP model. The key idea is to avoid computation in states whose estimated optimal values are more likely to be correct, and rather direct it towards states whose values (which are closely related to policy quality) can be improved the most. The proposed approach represents a novel algorithmic framework that generalizes MDP algorithms based on labeling, a widely used technique in state-of-the-art planners. The framework can be leveraged to create a variety of MDP solvers with different trade-offs between computational complexity and policy quality, and its application to a variety of standard MDP benchmarks results in state-of-the-art performance.
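The labeling idea referenced above can be sketched as follows: a state is marked solved once its Bellman residual, and those of all states reachable from it under the greedy policy, fall below a threshold, and solved states are never updated again. This is a simplified, LRTDP-flavored check, not the thesis's actual framework; the callable names are invented.

```python
# Minimal illustration of labeling: walk the greedy-policy envelope of
# state s and, if every residual is below eps, label all visited states
# solved so future trials skip them. `greedy_next(u, V)` returns the
# successors of u under the greedy action; `residual(u, V)` returns u's
# Bellman residual. Both are caller-supplied (hypothetical) helpers.

def check_solved(s, V, greedy_next, residual, solved, eps=1e-3):
    open_, closed = [s], set()
    consistent = True
    while open_:
        u = open_.pop()
        if u in closed or u in solved:
            continue
        closed.add(u)
        if residual(u, V) > eps:
            consistent = False
            continue          # do not expand past an inconsistent state
        open_.extend(greedy_next(u, V))
    if consistent:
        solved.update(closed)  # every checked state is labeled solved
    return consistent

# Trivial demo: a state with no successors and zero residual is solved.
solved = set()
ok = check_solved("s0", {}, lambda u, V: [], lambda u, V: 0.0, solved)
```

The payoff is that computation concentrates on the unsolved fringe, which is the "biased exploration" axis the abstract describes.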
Reliable Decision-Making with Imprecise Models
The rapid growth in the deployment of autonomous systems across various sectors has generated considerable interest in how these systems can operate reliably in large, stochastic, and unstructured environments. Despite recent advances in artificial intelligence and machine learning, it is challenging to assure that autonomous systems will operate reliably in the open world. One of the causes of unreliable behavior is the impreciseness of the model used for decision-making. Due to the practical challenges in data collection and precise model specification, autonomous systems often operate based on models that do not represent all the details in the environment. Even if the system has access to a comprehensive decision-making model that accounts for all the details in the environment and all possible scenarios the agent may encounter, it may be intractable to solve this complex model optimally. Consequently, this complex, high-fidelity model may be simplified to accelerate planning, introducing imprecision. Reasoning with such imprecise models affects the reliability of autonomous systems. A system's actions may sometimes produce unexpected, undesirable consequences, which are often identified after deployment. How can we design autonomous systems that can operate reliably in the presence of uncertainty and model imprecision?
This dissertation presents solutions to address three classes of model imprecision in a Markov decision process, along with an analysis of the conditions under which bounded performance can be guaranteed. First, an adaptive outcome selection approach is introduced to devise risk-aware reduced models of the environment that efficiently balance the trade-off between model simplicity and fidelity, to accelerate planning in resource-constrained settings. Second, an extension of the stochastic shortest path framework to problems with imperfect information about the goal state during planning is introduced, along with two solution approaches to solve this problem. Finally, two complementary solution approaches are presented to minimize the negative side effects of agent actions. The techniques presented in this dissertation enable an autonomous system to detect and mitigate undesirable behavior, without redesigning the model entirely.
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
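At the heart of the most common MCTS variant, UCT, is the UCB1 rule for descending the tree: pick the child maximizing its mean reward plus an exploration bonus that shrinks with visits. A minimal sketch (the per-child statistics below are invented for illustration):

```python
import math

# UCB1 selection, the core of the UCT variant of MCTS: choose the child
# maximizing mean reward plus an exploration term. Unvisited children
# score infinity, so each child is tried at least once.

def ucb1_select(children, c=1.4):
    """children: list of (visits, total_reward); parent visits = their sum."""
    n_parent = sum(n for n, _ in children)
    def score(child):
        n, w = child
        if n == 0:
            return float("inf")   # always try unvisited children first
        return w / n + c * math.sqrt(math.log(n_parent) / n)
    return max(range(len(children)), key=lambda i: score(children[i]))

# An unvisited child wins outright; otherwise the bonus favors the
# less-explored child even though its mean reward is higher here too.
i1 = ucb1_select([(10, 7.0), (2, 1.8), (0, 0.0)])
i2 = ucb1_select([(10, 7.0), (2, 1.8)])
```

In the second call the under-visited child (2 visits) is selected despite the near-tie in mean reward, which is the exploration behavior the survey analyzes.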
Domain-Independent Planning for Markov Decision Processes with Factored State and Action Spaces
Markov Decision Processes (MDPs) are the de facto formalism for studying sequential decision-making problems with uncertainty, ranging from classical problems such as inventory control and path planning, to more complex problems such as reservoir control under rainfall uncertainty and emergency response optimization for fire and medical emergencies. Most prior research has focused on exact and approximate solutions to MDPs with factored states, assuming a small number of actions. In contrast, many applications are most naturally modeled as having factored actions described in terms of multiple action variables. In this thesis we study domain-independent algorithms that leverage the factored action structure in the MDP dynamics and reward, and scale better than treating each of the exponentially many joint actions as atomic. Our contributions are three-fold, based on three fundamental approaches to MDP planning: exact solution using symbolic dynamic programming (DP), anytime online planning using heuristic search, and online action selection using hindsight optimization.
The first part is focused on deriving optimal policies over all states for MDPs whose state and action spaces are described in terms of multiple discrete random variables. In order to capture the factored action structure, we introduce new symbolic operators for computing DP updates over all states efficiently by leveraging the abstract and symbolic representation of Decision Diagrams. Addressing the potential bottleneck of diagrammatic blowup in these operators, we present a novel and optimal policy iteration algorithm that emphasizes the diagrammatic compactness of the intermediate value functions and policies. The impact is seen in experiments on the well-studied problems of inventory control and system administration, where our algorithm is able to exploit the increasing compactness of the optimal policy with increasing complexity of the action space.
Under the framework of anytime planning, the second part expands the scalability of our approach to factored actions by restricting its attention to the reachable part of the state space. Our contribution is the introduction of new symbolic generalization operators that guarantee a more moderate use of space and time while providing non-trivial generalization. These operators yield anytime algorithms that guarantee convergence to the optimal value and action for the current world state, while guaranteeing bounded growth in the size of the symbolic representation. We empirically show that our online algorithm is successfully able to combine forward search from an initial state with backwards generalized DP updates on symbolic states.
The third part considers a general class of hybrid (mixed discrete and continuous) state and action (HSA) MDPs. Whereas the insights from the above approaches are valid for hybrid MDPs as well, there are significant limitations inherent to the DP approach. Existing solvers for hybrid state and action MDPs are either limited to very restricted transition distributions, require knowledge of domain-specific basis functions to achieve good approximations, or do not scale. We explore a domain-independent approach based on the framework of hindsight optimization (HOP) for HSA-MDPs, which uses an upper bound on the finite-horizon action values for action selection. Our main contribution is a linear time reduction to a Mixed Integer Linear Program (MILP) that encodes the HOP objective, when the dynamics are specified as location-scale probability distributions parametrized by Piecewise Linear (PWL) functions of states and actions. In addition, we show how to use the same machinery to select actions based on a lower bound generated by straight-line plans. Our empirical results show that the HSA-HOP approach effectively scales to high-dimensional problems and outperforms baselines that are capable of scaling to such large hybrid MDPs. In a concluding case study, we cast the real-time dispatch optimization problem faced by the Corvallis Fire Department as an HSA-MDP with factored actions. We show that our domain-independent planner significantly improves upon the responsiveness of the baseline that dispatches the nearest responders.
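The hindsight optimization idea underlying this approach can be sketched in miniature: estimate each action's value by sampling determinized futures, solving each one as a deterministic problem, and averaging, which yields the upper bound used for action selection. The deterministic solver below is a caller-supplied placeholder; the thesis's actual HSA-HOP compiles this objective into a MILP, which this sketch omits.

```python
import random

# Hindsight optimization in miniature: for each candidate action, sample
# n determinized futures, solve each one optimally in hindsight, and
# average the results. `sample_future` and `solve_deterministic` are
# hypothetical, caller-supplied callables standing in for the real
# scenario sampler and the MILP-based deterministic solver.

def hop_action_values(actions, sample_future, solve_deterministic,
                      n=50, rng=None):
    rng = rng or random.Random(0)  # seeded for reproducibility
    values = {}
    for a in actions:
        total = 0.0
        for _ in range(n):
            future = sample_future(rng)          # one determinized scenario
            total += solve_deterministic(a, future)
        values[a] = total / n
    return values

# Trivial demo with a value function that ignores the sampled future.
vals = hop_action_values(
    ["a", "b"],
    sample_future=lambda r: r.random(),
    solve_deterministic=lambda a, f: 1.0 if a == "b" else 0.0,
    n=10,
)
```

The agent then executes the argmax (or argmin, for costs) of the returned values and replans at the next step.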
Improving cost and probability estimates using interaction
Automated planning consists of producing a collection of actions, or plan, that takes an agent from an initial state to a goal. This collection may be a simple sequence or a more complex, partially ordered set of actions. When complete information about the problem is available and the actions are deterministic, this is called classical planning. In recent years, automated planning has made significant advances and is now capable of solving problems of considerable size and complexity. However, this approach is not effective for solving real-world problems, since in such problems unexpected events may occur, the outcome of an action cannot be predicted, and consequently the current state of the world is not known with certainty. The execution of a plan generated by a classical planner for a real-life problem could therefore fail by not taking such contingencies into account. When information about the problem is incomplete and/or the actions are not deterministic, this is called planning under uncertainty. Traditionally, these approaches use Markov processes to generate robust plans, but at high computational cost. Other approaches to planning under uncertainty rely on contingency planning, on translating the uncertain problem into a deterministic one (determinization), and on replanning. In recent years, automated planning has also entered the study of goal recognition, which can be interpreted as the inverse operation of planning, since its objective is to infer an agent's goal(s) after partially or completely observing the actions carried out by that agent.
Recently, planning techniques have been applied to solve goal recognition problems, but this approach is still in its early stages. Classical planning, planning under uncertainty, and goal recognition problems can all be solved by heuristic search, one of the most successful techniques for these problems. The most common heuristic functions in automated planning compute distance estimates, in the form of the cost or the probability of reaching the goal state from a particular current state. Heuristics that guide the search toward optimal solutions, that is, solutions that minimize cost or maximize probability, are called admissible. Despite producing an optimal solution, these heuristics may not be sufficiently informative or may be computationally expensive. Heuristics that can produce suboptimal solutions are called inadmissible; they may or may not yield the optimal solution, but they are more informative than admissible heuristics and have shown good performance in terms of both time and solution quality. This thesis investigates state-of-the-art heuristics that consider actions with costs or probabilistic actions in order to compute more precise cost and probability estimates. To improve the precision of cost estimates, a heuristic function is developed that performs cost propagation in a planning graph. These estimates are more accurate thanks to the use of Interaction, a term that captures the relationship of independence, synergy, or mutual exclusion between pairs of elements.
These cost estimates are used to (1) guide a classical planner toward solutions that minimize cost, (2) guide a probabilistic planner toward solutions that maximize probability, and (3) efficiently solve goal recognition problems. To improve the precision of probability estimates, a novel approach is developed that performs probability propagation in a planning graph. This probability propagation is more advanced than previous work because it considers (1) the overall probability of each proposition across the possible effects of each probabilistic action and (2) the dependence between pairs of propositions across the possible effects of a probabilistic action. Combining both techniques makes it possible to compute more accurate probability estimates and thereby generate solutions with a high probability of success. The result of this study is a family of heuristics that compute cost and probability estimates that are more accurate and more stable than other state-of-the-art heuristics.
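The cost-propagation idea described above, minus the Interaction refinement, can be sketched with the classical additive heuristic: the cost of a proposition is the cheapest way to achieve it, and the cost of an action is its own cost plus the sum of its precondition costs. The Interaction term refines the independence assumption behind that sum; this minimal sketch omits it, and the actions below are invented.

```python
# Additive cost propagation over a delete-relaxed planning problem
# (the h_add heuristic): iterate to a fixed point, relaxing the cost of
# each proposition whenever some action achieves it more cheaply.
# The sum over preconditions assumes they are achieved independently;
# the Interaction term discussed in the text refines exactly this
# assumption and is omitted here. Toy actions, invented for illustration.

def h_add(actions, init, goal):
    """actions: list of (name, preconds, effects, cost) tuples."""
    cost = {p: 0.0 for p in init}
    changed = True
    while changed:
        changed = False
        for _, pre, eff, c in actions:
            if all(p in cost for p in pre):
                total = c + sum(cost[p] for p in pre)
                for e in eff:
                    if total < cost.get(e, float("inf")):
                        cost[e] = total
                        changed = True
    return sum(cost.get(g, float("inf")) for g in goal)

# op1 achieves b from a for cost 2; op2 needs both a and b to reach g.
estimate = h_add(
    [("op1", ["a"], ["b"], 2.0), ("op2", ["a", "b"], ["g"], 1.0)],
    init={"a"}, goal=["g"],
)
```

Here the estimate for g is 3.0 (cost of op2 plus the propagated cost of its precondition b); replacing the plain sum with an Interaction-aware combination is what sharpens these estimates in the thesis.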