On Solving the Rubik's Cube with Domain-Independent Planners Using Standard Representations
Rubik's Cube (RC) is a well-known and computationally challenging puzzle that
has motivated AI researchers to explore efficient alternative representations
and problem-solving methods. Ideally, a planning problem would be represented
in a standard notation and solved optimally and efficiently by a
general-purpose solver with heuristics. The fastest solver today for RC
is DeepCubeA, which uses a custom representation; another approach uses the
Scorpion planner with a State-Action-Space+ (SAS+) representation. In this paper,
we present the first RC representation in the popular PDDL language so that the
domain becomes more accessible to PDDL planners, competitions, and knowledge
engineering tools, and is more human-readable. We then bridge across existing
approaches and compare performance. We find that in one comparable experiment,
DeepCubeA (trained with 12 RC actions) solves all problems with varying
complexities, albeit only 78.5% are optimal plans. For the same problem set,
Scorpion with SAS+ representation and pattern database heuristics solves 61.50%
of the problems optimally, while FastDownward with PDDL representation and the FF
heuristic solves 56.50% of the problems; 79.64% of the plans it generated
were optimal. Our study provides valuable insights into the trade-offs between
representational choice and plan optimality that can help researchers design
future strategies for challenging domains combining general-purpose solving
methods (planning, reinforcement learning), heuristics, and representations
(standard or custom).
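The optimality figures in this abstract are reported on different bases: DeepCubeA and Scorpion as a share of all problems, FastDownward as a share of the problems it solved. The short calculation below is a sketch (variable names are ours, not the paper's) that puts FastDownward's figure on the same basis, assuming all percentages refer to the same problem set, as the abstract states.

```python
# Percentages quoted in the abstract (same problem set for all three solvers).
deepcubea_optimal = 78.5      # % of all problems solved optimally
scorpion_optimal = 61.50      # % of all problems solved optimally
fd_solved = 56.50             # % of all problems solved by FastDownward + FF
fd_optimal_of_solved = 79.64  # % of FastDownward's plans that were optimal

# Convert FastDownward's conditional figure into a share of all problems.
fd_optimal = fd_solved * fd_optimal_of_solved / 100

print(f"FastDownward optimal share of all problems: {fd_optimal:.1f}%")
```

Under that reading, roughly 45% of all problems received an optimal plan from FastDownward, compared with 61.50% for Scorpion and 78.5% for DeepCubeA.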
Egocentric Planning for Scalable Embodied Task Achievement
Embodied agents face significant challenges when tasked with performing
actions in diverse environments, particularly in generalizing across object
types and executing suitable actions to accomplish tasks. Furthermore, agents
should exhibit robustness, minimizing the execution of illegal actions. In this
work, we present Egocentric Planning, an innovative approach that combines
symbolic planning and Object-oriented POMDPs to solve tasks in complex
environments, harnessing existing models for visual perception and natural
language processing. We evaluated our approach in ALFRED, a simulated
environment designed for domestic tasks, and demonstrated its high scalability,
achieving an impressive 36.07% unseen success rate in the ALFRED benchmark and
winning the ALFRED challenge at CVPR Embodied AI workshop. Our method requires
reliable perception and the specification or learning of a symbolic description
of the preconditions and effects of the agent's actions, as well as what object
types reveal information about others. It is capable of naturally scaling to
solve new tasks beyond ALFRED, as long as they can be solved using the
available skills. This work offers a solid baseline for studying end-to-end and
hybrid methods that aim to generalize to new tasks, including recent approaches
that rely on LLMs but often struggle to scale to long sequences of actions or
to produce robust plans for novel tasks.
Planning as Theorem Proving with Heuristics
Planning as theorem proving in situation calculus was abandoned 50 years ago
as an impossible project. But we have developed a Theorem Proving Lifted
Heuristic (TPLH) planner that searches for a plan in a tree of situations using
the A* search algorithm. It is controlled by a delete relaxation-based domain
independent heuristic. We compare TPLH with Fast Downward (FD) and Best First
Width Search (BFWS) planners over several standard benchmarks. Since our
implementation of the heuristic function is not optimized, TPLH is slower than
FD and BFWS. But it computes shorter plans, and it explores fewer states. We
discuss previous research on planning within KR&R and identify related
directions. Thus, we show that deductive lifted heuristic planning in situation
calculus is actually doable.
Comment: Submitted for review. Copyright (C) 2023 by Mikhail Soutchanski and Ryan Youn
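For readers unfamiliar with heuristic-guided A* search, the following is a generic sketch, not the TPLH implementation. It searches a toy integer domain of our own invention rather than a tree of situations, and the function `h` is a trivial stand-in for the delete-relaxation heuristic mentioned in the abstract; all names here are hypothetical.

```python
import heapq
from itertools import count

def astar(start, is_goal, successors, h):
    """Return a cheapest list of actions reaching a goal, or None.

    successors(state) yields (action, next_state, cost) triples;
    h(state) estimates the remaining cost and should never overestimate it.
    """
    tie = count()  # unique tie-breaker so the heap never compares states
    frontier = [(h(start), next(tie), 0, start, [])]
    best_g = {start: 0}
    while frontier:
        _, _, g, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        if g > best_g.get(state, float("inf")):
            continue  # stale queue entry; a cheaper path was found later
        for action, nxt, cost in successors(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(
                    frontier, (g2 + h(nxt), next(tie), g2, nxt, plan + [action])
                )
    return None

# Toy domain (hypothetical): reach 10 from 1 using "inc" (+1) or "dbl" (*2).
GOAL = 10

def successors(n):
    for action, nxt in (("inc", n + 1), ("dbl", n * 2)):
        if nxt <= GOAL:
            yield action, nxt, 1

def h(n):
    # Trivial admissible estimate: any non-goal state needs at least one action.
    return 0 if n == GOAL else 1

plan = astar(1, lambda n: n == GOAL, successors, h)
print(plan)
```

A more informed heuristic would reduce the number of expanded states without changing the plan length, which is the trade-off the abstract reports between heuristic computation time and states explored.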
ICAPS 2012. Proceedings of the third Workshop on the International Planning Competition
22nd International Conference on Automated Planning and Scheduling. June 25-29, 2012, Atibaia, Sao Paulo (Brazil).
Proceedings of the 3rd Workshop on the International Planning Competition
The Academic Advising Planning Domain / Joshua T. Guerin, Josiah P. Hanna, Libby Ferland, Nicholas Mattei, and Judy Goldsmith. -- Leveraging Classical Planners through Translations / Ronen I. Brafman, Guy Shani, and Ran Taig. -- Advances in BDD Search: Filtering, Partitioning, and Bidirectionally Blind / Stefan Edelkamp, Peter Kissmann, and Álvaro Torralba. -- A Multi-Agent Extension of PDDL3.1 / Daniel L. Kovacs. -- Mining IPC-2011 Results / Isabel Cenamor, Tomás de la Rosa, and Fernando Fernández. -- How Good is the Performance of the Best Portfolio in IPC-2011? / Sergio Nuñez, Daniel Borrajo, and Carlos Linares López. -- "Type Problem in Domain Description!" or, Outsiders' Suggestions for PDDL Improvement / Robert P. Goldman and Peter Keller
In press
Machine Learning for Classical Planning: Neural Network Heuristics, Online Portfolios, and State Space Topologies
State space search solves navigation tasks and many other real world problems. Heuristic search, especially greedy best-first search, is one of the most successful algorithms for state space search. We improve the state of the art in heuristic search in three directions.
In Part I, we present methods to train neural networks as powerful heuristics for a given state space. We present a universal approach to generate training data using random walks from a (partial) state. We demonstrate that our heuristics trained for a specific task are often better than heuristics trained for a whole domain. We show that the performance of all trained heuristics is highly complementary. There is no clear pattern as to which trained heuristic to prefer for a specific task. In general, model-based planners still outperform planners with trained heuristics, but our approaches exceed the model-based algorithms in the Storage domain. To our knowledge, a learning-based planner has exceeded the state-of-the-art model-based planners only once before, in the Spanner domain. A priori, it is unknown whether a heuristic, or in the more general case a planner, performs well on a task. Hence, we trained online portfolios to select the best planner for a task. Today, all online portfolios are based on handcrafted features. In Part II, we present new online portfolios based on neural networks, which receive the complete task as input, and not just a few handcrafted features. Additionally, our portfolios can reconsider their choices. Both extensions greatly improve the state of the art of online portfolios. Finally, we show that explainable machine learning techniques, as an alternative to neural networks, also make good online portfolios. Additionally, we present methods to improve our trust in their predictions.
Even if we select the best search algorithm, we cannot solve some tasks in reasonable time. We can speed up the search if we know how it will behave in the future. In Part III, we inspect the behavior of greedy best-first search with a fixed heuristic on simple tasks of a domain to learn its behavior for any task of the same domain. Once greedy best-first search has expanded a progress state, it expands only states with lower heuristic values. We learn to identify progress states and present two methods to exploit this knowledge. Building upon this, we extract the bench transition system of a task and generalize it in such a way that we can apply it to any task of the same domain. We can use this generalized bench transition system to split a task into a sequence of simpler searches.
In all three research directions, we contribute new approaches and insights to the state of the art, and we indicate interesting topics for future work.
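The idea of generating training data by random walks can be illustrated with a small sketch. It is not the thesis's method: it assumes walks start at a goal state, that actions are reversible, and that the number of steps walked serves as a (possibly loose) upper bound on the goal distance. The function and the toy line-graph domain are hypothetical.

```python
import random

def random_walk_samples(goal, neighbors, walk_length, num_walks, seed=0):
    """Collect state -> steps labels from random walks that start at the goal.

    The label is only an upper bound on the true goal distance, so the
    smallest value seen for each state is kept.
    """
    rng = random.Random(seed)
    samples = {}
    for _ in range(num_walks):
        state = goal
        for steps in range(walk_length + 1):
            samples[state] = min(steps, samples.get(state, steps))
            state = rng.choice(neighbors(state))
    return samples

# Toy domain (hypothetical): positions 0..9 on a line, goal at position 0.
def neighbors(s):
    return [n for n in (s - 1, s + 1) if 0 <= n <= 9]

samples = random_walk_samples(0, neighbors, walk_length=5, num_walks=20)
for state in sorted(samples):
    print(state, samples[state])
```

The resulting pairs could then serve as regression targets for a learned heuristic; how the labels are refined and which network is trained are design choices this sketch leaves open.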
Reinforcement Learning for Planning Heuristics
Informed heuristics are essential for the success of heuristic search algorithms, but it is difficult to develop a new heuristic that is informed on various tasks. Instead, we propose a framework that trains a neural network as a heuristic for the tasks it is supposed to solve. We present two reinforcement learning approaches to learn heuristics for fixed state spaces and fixed goals. Our first approach uses approximate value iteration; our second approach uses searches to generate training data. We show that in some domains our approaches outperform previous work, and we point out potential for future improvements.
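To make the value-iteration idea concrete, here is a tabular sketch for a fixed state space and fixed goal. It swaps the paper's neural network for a plain lookup table and assumes unit action costs, so it shows the exact update that approximate value iteration approximates rather than the authors' approach. All names and the toy domain are hypothetical.

```python
def value_iteration_heuristic(states, successors, is_goal, sweeps=100):
    """Tabular cost-to-go: h(s) = 0 at goals, else 1 + min over successors."""
    inf = float("inf")
    h = {s: (0 if is_goal(s) else inf) for s in states}
    for _ in range(sweeps):
        changed = False
        for s in states:
            if is_goal(s):
                continue
            # Bellman update with unit action costs.
            new = 1 + min((h[t] for t in successors(s)), default=inf)
            if new != h[s]:
                h[s] = new
                changed = True
        if not changed:
            break  # values have converged
    return h

# Toy domain (hypothetical): six states on a ring; each action moves
# one or two steps forward, and state 0 is the goal.
states = list(range(6))

def successors(s):
    return [(s + 1) % 6, (s + 2) % 6]

h_table = value_iteration_heuristic(states, successors, lambda s: s == 0)
print(h_table)
```

In the approximate setting, the table would be replaced by a function approximator trained toward the same update targets.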
Machine learning for classical planning : neural network heuristics, online portfolios, and state space topologies
Many everyday problems can be solved with state space search. Heuristic search, especially greedy best-first search, is one of the most successful algorithms for state space search. We improve the state of the art in heuristic search in three ways. One of the most important components of heuristic search is the heuristic. With a good heuristic, the search finds a solution quickly. Modelling a good heuristic for a problem is laborious. In Part I, we present methods to automatically learn good heuristics for a problem. To this end, we generate the training data by random walks starting from (partial) states of the problem. We show that heuristics trained for a single state space are often better than heuristics trained for a whole domain. We further show that the quality of all trained heuristics varies strongly across domains, that no heuristic dominates another, and that it cannot be told in advance whether a trained heuristic will work well.
We find that in almost all tested domains the model-based search algorithms are superior to the trained heuristics. Only in the Storage domain are our heuristics superior. It is often unclear which heuristic or search algorithm to use for a problem. We therefore train online portfolios that predict the best algorithm for a given problem. So far, the input to an online portfolio has always been human-selected features of the problem. In Part II, we present new online portfolios that receive the entire problem as input. In addition, our online portfolios can revise their decision once. Both changes considerably improve the quality of online portfolios. We further show that good online portfolios can also be trained with explainable machine learning techniques. Even if we select the best algorithm for a problem, the problem may be too hard to solve in acceptable time. In Part III, we show how to predict the behavior of greedy best-first search on harder problems of a domain from its behavior on simple problems of the same domain. We use this knowledge to improve the search. First, we show how to identify progress states. Whenever greedy best-first search expands a progress state, we know that it will never again expand a state with an equal or higher heuristic value. We present two methods that use this knowledge. Building on this work, we learn from one problem how to split any problem of the same domain into a sequence of simpler searches.
Extending classical planning with state constraints: Heuristics and search for optimal planning
We present a principled way of extending a classical AI planning formalism with systems of state constraints, which relate - sometimes determine - the values of variables in each state traversed by the plan. This extension occupies an attractive middle ground between expressivity and complexity. It enables modelling a new range of problems, as well as formulating more efficient models of classical planning problems. An example of the former is planning-based control of networked physical systems - power networks, for example - in which a local, discrete control action can have global effects on continuous quantities, such as altering flows across the entire network. At the same time, our extension remains decidable as long as the satisfiability of sets of state constraints is decidable, including in the presence of numeric state variables, and we demonstrate that effective techniques for cost-optimal planning known in the classical setting - in particular, relaxation-based admissible heuristics - can be adapted to the extended formalism. In this paper, we apply our approach to constraints in the form of linear or non-linear equations over numeric state variables, but the approach is independent of the type of state constraints, as long as there exists a procedure that decides their consistency. The planner and the constraint solver interact through a well-defined, narrow interface, in which the solver requires no specialisation to the planning context.
This work was supported by ARC project DP140104219, "Robust AI Planning for Hybrid Systems", and in part by ARO grant W911NF1210471 and ONR grant N000141210430.
Integrating Planning and Learning for Agents Acting in Unknown Environments
An Artificial Intelligence (AI) agent acting in an environment can perceive the environment through sensors and execute actions through actuators. Symbolic planning provides an agent with decision-making capabilities about the actions
to execute for accomplishing tasks in the environment. For applying symbolic planning, an agent needs to know its symbolic state, and an abstract model of the environment dynamics. However, in the real world, an agent has low-level
perceptions of the environment (e.g. its position given by a GPS sensor), rather than symbolic observations representing its current state. Furthermore, in many real-world scenarios, it is not feasible to provide an agent with a complete and correct model of the environment, e.g., when the environment is unknown a priori. The gap between the high-level representations, suitable for symbolic planning, and the low-level sensors and actuators, available in a real-world agent, can be bridged by integrating learning, planning, and acting. Firstly, an agent has to map its continuous perceptions into its current symbolic state, e.g. by detecting the set of objects and their properties from an RGB image provided by an onboard camera. Afterward, the agent has to build a model of the environment by interacting with the environment and observing the effects of the executed actions. Finally, the agent has to plan on the learned environment model and execute the symbolic actions through its actuators. We propose an architecture that integrates learning, planning, and acting. Our approach combines data-driven learning methods for building an environment model online with symbolic planning techniques for reasoning on the learned model. In particular, we focus on learning the environment model, from either continuous or symbolic observations, assuming the agent's perceptual input is the complete and correct state of the environment, and the agent is able to execute symbolic actions in the environment. Afterward, we assume that a partial model of the environment and the capability of mapping perceptions into noisy and incomplete symbolic states are given, and the agent has to exploit the environment model and its perception capabilities to perform tasks in unknown and partially observable environments.
Then, we tackle the problem of learning online the mapping between continuous perceptions and symbolic states, assuming the agent is given a partial model of the environment and is able to execute symbolic actions in the real world. In our approach, we take advantage of learning methods for overcoming some of the simplifying assumptions of symbolic planning, such as the full observability of the environment, or the need for a correct environment model. Similarly, we take advantage of symbolic planning techniques to enable an agent to autonomously gather relevant information online, which is necessary for data-driven learning methods. We experimentally show the effectiveness of our approach in simulated and complex environments, outperforming state-of-the-art methods. Finally, we empirically demonstrate the applicability of our approach in real environments, by conducting experiments on a real robot.