19 research outputs found
Learning Domain-Independent Planning Heuristics with Hypergraph Networks
We present the first approach capable of learning domain-independent planning
heuristics entirely from scratch. The heuristics we learn map the hypergraph
representation of the delete-relaxation of the planning problem at hand, to a
cost estimate that approximates that of the least-cost path from the current
state to the goal through the hypergraph. We generalise Graph Networks to
obtain a new framework for learning over hypergraphs, which we specialise to
learn planning heuristics by training over state/value pairs obtained from
optimal cost plans. Our experiments show that the resulting architecture,
STRIPS-HGNs, is capable of learning heuristics that are competitive with
existing delete-relaxation heuristics including LM-cut. We show that the
heuristics we learn are able to generalise across different problems and
domains, including to domains that were not seen during training.
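For readers unfamiliar with delete-relaxation heuristics, the following is a minimal sketch of the classic h_max estimate. It is not the STRIPS-HGN architecture and is not taken from the paper. It only illustrates the hypergraph view the abstract refers to: each action acts as a hyperedge from its set of preconditions to its set of add effects, and delete effects are ignored. The `Action` type, the `h_max` function and the fact names are illustrative assumptions.

```python
import math
from typing import FrozenSet, Iterable, NamedTuple


class Action(NamedTuple):
    """A delete-relaxed STRIPS action (hypothetical type for this sketch)."""
    name: str
    pre: FrozenSet[str]   # preconditions: tail of the hyperedge
    add: FrozenSet[str]   # add effects: head of the hyperedge
    cost: float = 1.0     # assumed non-negative


def h_max(state: Iterable[str], goal: Iterable[str],
          actions: Iterable[Action]) -> float:
    """Return the h_max estimate of the cost to reach `goal` from `state`."""
    actions = list(actions)
    # Facts true in the current state cost nothing to achieve.
    cost = {fact: 0.0 for fact in state}
    changed = True
    # Relax fact costs until a fixed point is reached.
    while changed:
        changed = False
        for action in actions:
            if all(p in cost for p in action.pre):
                # h_max charges only the most expensive precondition.
                reach = action.cost + max(
                    (cost[p] for p in action.pre), default=0.0)
                for fact in action.add:
                    if reach < cost.get(fact, math.inf):
                        cost[fact] = reach
                        changed = True
    goal = list(goal)
    if any(g not in cost for g in goal):
        return math.inf  # goal unreachable even in the relaxation
    return max((cost[g] for g in goal), default=0.0)


example_actions = [
    Action("a1", frozenset({"a"}), frozenset({"b"}), 1.0),
    Action("a2", frozenset({"b"}), frozenset({"c"}), 1.0),
    Action("a3", frozenset({"a"}), frozenset({"d"}), 5.0),
    Action("a4", frozenset({"c", "d"}), frozenset({"g"}), 1.0),
]
estimate = h_max({"a"}, {"g"}, example_actions)
```

In this toy problem the goal fact `g` needs both `c` (cost 2) and `d` (cost 5), so the estimate is 1 + max(2, 5) = 6. A learned heuristic of the kind described in the abstract would replace this hand-written propagation with a trained model over the same hypergraph structure.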
Efficient Probabilistic Reasoning Using Partial State-Space Exploration
Planning, namely the ability of an autonomous agent to make decisions leading towards a certain goal, is one of the fundamental components of intelligent behavior. In the face of uncertainty, this problem is typically modeled as a Markov Decision Process (MDP). The MDP framework is highly expressive, and has been used in a variety of applications, such as mobile robots, flow assignment in heterogeneous networks, optimizing software in mobile phones, and aircraft collision avoidance. However, its wide adoption in real-world scenarios is still impaired by the complexity of solving large MDPs. Developing effective ways to tackle this complexity barrier is a challenging research problem.
This thesis focuses on the development of scalable and robust MDP solution approaches for partially exploring the state space of an MDP. The main contribution is a series of mathematical and algorithmic techniques for selecting the parts of the state space that are the most critical for effective planning, with the ultimate goal of maximizing performance in the presence of bounded resources. The proposed approaches work on two distinct axes: i) constructing reduced MDP models that are computationally easier to solve, but whose policies still result in near-optimal performance when applied to the original model, and ii) using sampling-based exploration that is biased towards states for which additional computation can be more productive, in a well-defined sense.
The first part of the thesis addresses the model reduction component, introducing an MDP reduction framework that generalizes popular solution approaches based on determinization. In particular, the framework encompasses a spectrum of MDP reductions differing along two dimensions: i) the number of outcomes per state-action pair that are fully accounted for, and ii) the number of occurrences of the remaining, exceptional, outcomes that are planned for in advance. An important insight resulting from this work is that the choice of reduction is crucial for achieving good performance, an issue under-explored by the planning community, even for determinization-based planners.
The second part of the thesis presents a sampling-based approach that does not require modification of the MDP model. The key idea is to avoid computation in states whose estimated optimal values are more likely to be correct, and instead direct it towards states whose values (which are closely related to policy quality) can be improved the most. The proposed approach represents a novel algorithmic framework that generalizes MDP algorithms based on labeling, a widely used technique in state-of-the-art planners. The framework can be leveraged to create a variety of MDP solvers with different trade-offs between computational complexity and policy quality, and its application to a variety of standard MDP benchmarks results in state-of-the-art performance.
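The determinization-based reductions mentioned in the first part of this abstract can be illustrated with a small sketch. The code below keeps only the k most probable outcomes of each state-action pair and renormalises their probabilities, so k = 1 gives the familiar most-likely-outcome determinization. This is a simplification under stated assumptions, not the thesis's framework: the dictionary layout, the function name `reduce_outcomes` and the renormalisation step are all hypothetical, and the bounded number of exceptional outcomes that the thesis also plans for is not modelled here.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical layout: (state, action) -> list of (next_state, probability).
Transitions = Dict[Tuple[Hashable, Hashable], List[Tuple[Hashable, float]]]


def reduce_outcomes(transitions: Transitions, k: int = 1) -> Transitions:
    """Keep the k most probable outcomes per state-action pair."""
    if k < 1:
        raise ValueError("k must be at least 1")
    reduced: Transitions = {}
    for key, outcomes in transitions.items():
        # Sort by probability, highest first, and keep the top k.
        kept = sorted(outcomes, key=lambda o: o[1], reverse=True)[:k]
        total = sum(p for _, p in kept)
        # Renormalise so the kept outcomes form a distribution
        # (an assumption of this sketch).
        reduced[key] = [(s, p / total) for s, p in kept]
    return reduced


mdp = {("s0", "go"): [("s1", 0.7), ("s2", 0.2), ("s3", 0.1)]}
most_likely = reduce_outcomes(mdp, k=1)
two_outcomes = reduce_outcomes(mdp, k=2)
```

With k = 1 the pair `("s0", "go")` leads deterministically to `s1`. With k = 2 the outcomes `s1` and `s2` are kept with probabilities 0.7/0.9 and 0.2/0.9. Varying k corresponds loosely to the first of the two dimensions the abstract describes.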
Learning and Improving Policies for Probabilistic Planning Problems
In this work, we study the problem of learning and improving policies for probabilistic planning problems. In the first part, we train neural network policies for probabilistic planning problems modeled as factored Markov decision problems. The objective is to train problem-specific neural networks via supervised learning to imitate the action choices of expert planners. In the second part, we focus on the problem of online policy improvement, where we try to improve on a given base policy via online search. Since search trees for these problems tend to be huge, in practice action branches need to be pruned, which can adversely affect policy improvement. We formalize this notion by introducing the choice function framework and establish sufficient conditions on the actions expanded in search trees for guaranteed policy improvement. In the next part, we draw attention to the fact that theoretical guarantees of policy improvement can fail when the ideal conditions assumed in theory do not hold in practice. We propose benchmark problems, baselines and metrics to assess the empirical performance of online policy improvement algorithms. In the final part, we focus on approximation via state aggregation in MDPs and study the theoretical guarantees of several aggregation schemes.
Planification d'actions concurrentes sous contraintes et incertitude (Planning Concurrent Actions under Constraints and Uncertainty)
This thesis presents contributions to the field of planning in artificial intelligence, specifically for a class of problems that combine concurrent (simultaneous) actions and uncertainty. Two forms of uncertainty are handled: uncertainty on the duration of actions and uncertainty on their effects. This class of problems is motivated by several real applications, including mobile robotics, games, and decision support systems. It has notably been identified by NASA for planning the activities of rovers deployed on Mars. The planning algorithms presented in this thesis exploit a new compact state representation to significantly reduce the search space. Continuous random variables are used to model uncertainty on time. A dynamically generated Bayesian network models the dependencies between the random variables and estimates the quality and probability of success of plans. A first planner, ACTUPLAN^nc, based on a forward-chaining search algorithm, handles actions with probabilistic durations. It generates nonconditional plans that satisfy a constraint on the desired probability of success. A second planner, ACTUPLAN, merges nonconditional plans to build more efficient conditional plans. A third planner, named QUANPLAN, also handles uncertainty on the effects of actions. To model the simultaneous execution of actions with undetermined effects, QUANPLAN draws on quantum mechanics, where quantum states are superpositions of classical states. A Markov decision process (MDP) is used to generate plans in a space of quantum states. The optimality, completeness, and limitations of these planners are discussed.
Comparisons with other planners targeting similar classes of problems demonstrate the effectiveness of the presented methods. Finally, complementary contributions to the fields of games and path planning are also presented.
Planning and learning under uncertainty
Automated Planning is the component of Artificial Intelligence that studies the computational process of synthesizing sets of actions whose execution achieves some given objectives. Research on Automated Planning has traditionally focused on solving theoretical problems in controlled environments. In such environments, both the current state of the environment and the outcome of actions are completely known. The development of real planning applications during the last decade (planning fire extinction operations (Castillo et al., 2006), planning spacecraft activities (Nayak et al., 1999), planning emergency evacuation actions (Muñoz-Avila et al., 1999)) has shown that these two assumptions do not hold in many real-world problems. The planning research community is aware of this issue and in recent years has multiplied its efforts to find new planning systems able to address these kinds of problems. All these efforts have created a new field in Automated Planning called planning under uncertainty. Nevertheless, the new systems suffer from two limitations. (1) They require accurate action models, though defining accurate action models by hand is frequently very complex. (2) They present scalability problems due to the combinatorial explosion implied by the expressiveness of their action models. This thesis defines a new planning paradigm for building, in an efficient and scalable way, robust plans in domains with uncertainty even when the action model is incomplete. The thesis is that integrating relational machine learning techniques with the planning and execution processes makes it possible to develop planning systems that automatically enrich their initial knowledge about the environment and therefore find more robust plans. An empirical evaluation illustrates these benefits in comparison with state-of-the-art probabilistic planners that use static action models.
Automated Planning is the branch of Artificial Intelligence that studies the computational processes for synthesizing sets of actions whose execution achieves given objectives. Historically, research in this branch has tried to solve theoretical problems in controlled environments in which both the current state of the environment and the result of executing actions in it were known. In the last decade, the development of planning applications (management of forest fire extinction tasks (Castillo et al., 2006), control of the activities of the Deep Space 1 spacecraft (Nayak et al., 1999), planning of emergency evacuations (Muñoz-Avila et al., 1999)) has shown that these assumptions do not hold in many real problems. Aware of this, the research community has multiplied its efforts to find new planning paradigms that better fit this kind of problem. These efforts have given rise to a new area within Automated Planning, called planning under uncertainty. However, the new planners for domains with uncertainty still have two important limitations: (1) They need detailed action models that account for the possible outcomes of executing each action. In most real problems it is difficult to obtain models of this kind. (2) They present severe scalability problems due to the combinatorial explosion caused by the complexity of the action models they handle. This thesis defines a planning paradigm capable of generating, efficiently and scalably, robust plans in domains with uncertainty even when fully detailed action models are not available.
The thesis defended is that integrating relational machine learning techniques with the decision and execution processes makes it possible to develop planning systems capable of automatically enriching their action model with additional information that helps them find more robust plans. The benefits of this integration are evaluated experimentally through a comparison with state-of-the-art probabilistic planners, which do not modify their action model.