    On Solving the Rubik's Cube with Domain-Independent Planners Using Standard Representations

    Rubik's Cube (RC) is a well-known and computationally challenging puzzle that has motivated AI researchers to explore efficient alternative representations and problem-solving methods. Ideally, a planning problem would be represented in a standard notation and solved optimally and efficiently by a general-purpose solver with general-purpose heuristics. The fastest solver for RC today is DeepCubeA, which uses a custom representation; another approach uses the Scorpion planner with a State-Action-Space+ (SAS+) representation. In this paper, we present the first RC representation in the popular PDDL language. This makes the domain more accessible to PDDL planners, competitions, and knowledge engineering tools, and more human-readable. We then bridge existing approaches and compare their performance. In one comparable experiment, DeepCubeA (trained with 12 RC actions) solves all problems of varying complexity, although only 78.5% of its plans are optimal. On the same problem set, Scorpion with the SAS+ representation and pattern database heuristics solves 61.50% of the problems optimally. Fast Downward with the PDDL representation and the FF heuristic solves 56.50% of the problems, and 79.64% of the plans it generates are optimal. Our study provides insights into the trade-offs between representational choice and plan optimality. These insights can help researchers design future strategies for challenging domains that combine general-purpose solving methods (planning, reinforcement learning), heuristics, and representations (standard or custom).
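    The three results above use different denominators: all problems in some cases, only the solved ones in others. A small worked calculation puts them on a common scale, the share of the whole problem set that each configuration solves with an optimal plan. It uses only the percentages reported in the abstract. It assumes that Scorpion's reported 61.50% already counts only optimal solutions, as the abstract states.

```python
# Share of the whole problem set solved with an optimal plan, computed only from
# the percentages reported in the abstract.
deepcubea_solved = 1.0          # DeepCubeA solves every problem...
deepcubea_optimal_rate = 0.785  # ...and 78.5% of its plans are optimal
scorpion_optimal = 0.615        # Scorpion (SAS+, PDBs): 61.50% solved optimally
fd_solved = 0.565               # Fast Downward (PDDL, FF): 56.50% solved...
fd_optimal_rate = 0.7964        # ...and 79.64% of those plans are optimal


def overall_optimal(solved: float, optimal_rate: float) -> float:
    """Fraction of the full problem set that is solved with an optimal plan."""
    return solved * optimal_rate


summary = {
    "DeepCubeA": overall_optimal(deepcubea_solved, deepcubea_optimal_rate),
    "Scorpion": scorpion_optimal,
    "Fast Downward + FF": overall_optimal(fd_solved, fd_optimal_rate),
}
for name, share in summary.items():
    print(f"{name}: {share:.1%} of problems solved optimally")
```

    On this reading, Fast Downward with FF yields an optimal plan for about 45.0% of the problem set (0.565 × 0.7964). Scorpion reaches 61.5% and DeepCubeA 78.5%.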

    Egocentric Planning for Scalable Embodied Task Achievement

    Embodied agents face significant challenges when performing actions in diverse environments, particularly in generalizing across object types and executing suitable actions to accomplish tasks. Furthermore, agents should be robust and minimize the execution of illegal actions. In this work, we present Egocentric Planning, an approach that combines symbolic planning and Object-oriented POMDPs to solve tasks in complex environments, leveraging existing models for visual perception and natural language processing. We evaluated our approach in ALFRED, a simulated environment designed for domestic tasks, and demonstrated its high scalability. It achieves a 36.07% success rate on the unseen split of the ALFRED benchmark and won the ALFRED challenge at the CVPR Embodied AI workshop. Our method requires reliable perception. It also requires a symbolic description, specified or learned, of the preconditions and effects of the agent's actions and of which object types reveal information about others. It naturally scales to new tasks beyond ALFRED, as long as they can be solved with the available skills. This work offers a solid baseline for studying end-to-end and hybrid methods that aim to generalize to new tasks. These include recent LLM-based approaches, which often struggle to scale to long action sequences or to produce robust plans for novel tasks.
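    The abstract states that the method relies on a symbolic description of action preconditions and effects, and of which object types reveal information about others. The minimal sketch below shows why such a description helps avoid illegal actions. A precondition check refuses an action before it is sent to the environment. A reveal table suggests which visible objects to inspect when a target has not been seen yet. The `Action` class, the atom names, and the `REVEALS` table are hypothetical illustrations, not the authors' representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical symbolic action; a state is a frozenset of ground atoms."""
    name: str
    pre: frozenset     # atoms that must hold before execution
    add: frozenset     # atoms the action makes true
    delete: frozenset  # atoms the action makes false

    def applicable(self, state: frozenset) -> bool:
        return self.pre <= state

    def apply(self, state: frozenset) -> frozenset:
        if not self.applicable(state):
            raise ValueError(f"illegal action {self.name} refused before execution")
        return (state - self.delete) | self.add


# Hypothetical knowledge of which object types can reveal which other types.
REVEALS = {"fridge": {"apple", "egg"}, "drawer": {"knife", "spoon"}}


def explore_candidates(target: str, visible_types: set) -> list:
    """Visible object types worth inspecting when the target type is not yet seen."""
    return sorted(t for t in visible_types if target in REVEALS.get(t, set()))


pickup_apple = Action(
    name="pickup(apple)",
    pre=frozenset({("visible", "apple"), ("hand_empty",)}),
    add=frozenset({("holding", "apple")}),
    delete=frozenset({("hand_empty",)}),
)
s0 = frozenset({("visible", "apple"), ("hand_empty",)})
s1 = pickup_apple.apply(s0)
print(pickup_apple.applicable(s1))                                 # False: hand no longer empty
print(explore_candidates("apple", {"fridge", "drawer", "table"}))  # ['fridge']
```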

    Planning as Theorem Proving with Heuristics

    Planning as theorem proving in the situation calculus was abandoned 50 years ago as an impossible project. However, we have developed a Theorem Proving Lifted Heuristic (TPLH) planner that searches for a plan in a tree of situations using the A* search algorithm. The search is guided by a domain-independent heuristic based on delete relaxation. We compare TPLH with the Fast Downward (FD) and Best First Width Search (BFWS) planners on several standard benchmarks. Because our implementation of the heuristic function is not optimized, TPLH is slower than FD and BFWS, but it computes shorter plans and explores fewer states. We discuss previous research on planning within KR&R and identify related directions. Thus, we show that deductive lifted heuristic planning in the situation calculus is actually feasible.
    Comment: Submitted for review. Copyright (C) 2023 by Mikhail Soutchanski and Ryan Youn
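    The sketch below illustrates the search side only, running A* with a delete-relaxation heuristic on a tiny task. It makes assumptions that differ from TPLH. States are grounded STRIPS atom sets rather than situations in the situation calculus. The heuristic is h_max, chosen because it is admissible, so A* returns optimal plans; the abstract does not name the exact relaxation heuristic. The truck-and-package task is hypothetical.

```python
import heapq
from itertools import count

INF = float("inf")

# Hypothetical grounded STRIPS task: (name, preconditions, add, delete, cost).
ACTIONS = [
    ("drive_A_B", frozenset({"truck_at_A"}), frozenset({"truck_at_B"}), frozenset({"truck_at_A"}), 1),
    ("drive_B_A", frozenset({"truck_at_B"}), frozenset({"truck_at_A"}), frozenset({"truck_at_B"}), 1),
    ("load_A", frozenset({"truck_at_A", "pkg_at_A"}), frozenset({"pkg_in_truck"}), frozenset({"pkg_at_A"}), 1),
    ("unload_B", frozenset({"truck_at_B", "pkg_in_truck"}), frozenset({"pkg_at_B"}), frozenset({"pkg_in_truck"}), 1),
]
INIT = frozenset({"truck_at_A", "pkg_at_A"})
GOAL = frozenset({"pkg_at_B"})


def h_max(state, goal, actions):
    """Delete-relaxation h_max: ignore delete effects, estimate the cost of each
    atom, and charge the goal the cost of its most expensive atom."""
    cost = {atom: 0 for atom in state}
    changed = True
    while changed:
        changed = False
        for _, pre, add, _, c in actions:
            if all(p in cost for p in pre):
                reach = max((cost[p] for p in pre), default=0) + c
                for atom in add:
                    if reach < cost.get(atom, INF):
                        cost[atom] = reach
                        changed = True
    return max((cost.get(g, INF) for g in goal), default=0)


def astar(init, goal, actions):
    """A* ordered by f = g + h_max; returns an optimal plan or None."""
    tie = count()  # tie-breaker so the heap never compares states
    frontier = [(h_max(init, goal, actions), next(tie), 0, init, [])]
    best_g = {init: 0}
    while frontier:
        _, _, g, state, plan = heapq.heappop(frontier)
        if g > best_g[state]:
            continue  # stale queue entry
        if goal <= state:
            return plan
        for name, pre, add, delete, c in actions:
            if pre <= state:
                succ = (state - delete) | add
                g2 = g + c
                if g2 < best_g.get(succ, INF):
                    h = h_max(succ, goal, actions)
                    if h < INF:  # relaxed-unreachable states are dead ends
                        best_g[succ] = g2
                        heapq.heappush(frontier, (g2 + h, next(tie), g2, succ, plan + [name]))
    return None


print(astar(INIT, GOAL, ACTIONS))
```

    Here `astar(INIT, GOAL, ACTIONS)` returns `['load_A', 'drive_A_B', 'unload_B']`, the unique three-step plan. `h_max(INIT, GOAL, ACTIONS)` is 2, an underestimate of the true cost 3.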

    ICAPS 2012. Proceedings of the third Workshop on the International Planning Competition

    22nd International Conference on Automated Planning and Scheduling. June 25-29, 2012, Atibaia, Sao Paulo (Brazil). Proceedings of the 3rd Workshop on the International Planning Competition.
    The Academic Advising Planning Domain / Joshua T. Guerin, Josiah P. Hanna, Libby Ferland, Nicholas Mattei, and Judy Goldsmith.
    Leveraging Classical Planners through Translations / Ronen I. Brafman, Guy Shani, and Ran Taig.
    Advances in BDD Search: Filtering, Partitioning, and Bidirectionally Blind / Stefan Edelkamp, Peter Kissmann, and Álvaro Torralba.
    A Multi-Agent Extension of PDDL3.1 / Daniel L. Kovacs.
    Mining IPC-2011 Results / Isabel Cenamor, Tomás de la Rosa, and Fernando Fernández.
    How Good is the Performance of the Best Portfolio in IPC-2011? / Sergio Nuñez, Daniel Borrajo, and Carlos Linares López.
    “Type Problem in Domain Description!” or, Outsiders’ Suggestions for PDDL Improvement / Robert P. Goldman and Peter Keller.
    In press.

    Machine Learning for Classical Planning: Neural Network Heuristics, Online Portfolios, and State Space Topologies

    State space search solves navigation tasks and many other real-world problems. Heuristic search, especially greedy best-first search, is one of the most successful algorithms for state space search. We improve the state of the art in heuristic search in three directions. In Part I, we present methods to train neural networks as powerful heuristics for a given state space. We present a universal approach to generate training data using random walks from a (partial) state. We demonstrate that our heuristics trained for a specific task are often better than heuristics trained for a whole domain. We show that the performance of all trained heuristics is highly complementary: there is no clear pattern as to which trained heuristic to prefer for a specific task. In general, model-based planners still outperform planners with trained heuristics, but our approaches exceed the model-based algorithms in the Storage domain. To our knowledge, a learning-based planner has exceeded the state-of-the-art model-based planners only once before, in the Spanner domain. A priori, it is unknown whether a heuristic, or more generally a planner, performs well on a task. Hence, we train online portfolios to select the best planner for a task. Until now, all online portfolios have been based on handcrafted features. In Part II, we present new online portfolios based on neural networks, which receive the complete task as input rather than just a few handcrafted features. Additionally, our portfolios can reconsider their choices. Both extensions greatly improve the state of the art of online portfolios. Finally, we show that explainable machine learning techniques, as an alternative to neural networks, also yield good online portfolios, and we present methods to improve our trust in their predictions. Even if we select the best search algorithm, we cannot solve some tasks in reasonable time. We can speed up the search if we know how it will behave in the future.
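    Part I generates training data from random walks. The minimal sketch below assumes a toy state space with reversible unit-cost moves (positions on a line, via the hypothetical `line_neighbors`) and walks that start at the goal. Each state reached after k steps gets the label k, which is an upper bound on its goal distance, and the smallest label per state is kept. The thesis's own procedure for planning tasks and partial states may differ. Training the neural network on these pairs is not shown.

```python
import random


def line_neighbors(n: int, size: int = 10) -> list:
    """Hypothetical toy state space: positions 0..size-1 on a line, unit moves."""
    return [m for m in (n - 1, n + 1) if 0 <= m < size]


def random_walk_samples(start, neighbors, num_walks: int, walk_length: int, seed: int = 0) -> dict:
    """Label each state reached by a random walk from `start` with the number of
    steps taken. With reversible unit-cost moves, a state reached after k steps
    is at most k steps from `start`, so each label upper-bounds the distance;
    the smallest label seen per state is kept."""
    rng = random.Random(seed)
    labels = {start: 0}
    for _ in range(num_walks):
        state = start
        for step in range(1, walk_length + 1):
            state = rng.choice(neighbors(state))
            labels[state] = min(step, labels.get(state, step))
    return labels


# Walks start at the goal (position 0), so labels estimate goal distance.
samples = random_walk_samples(0, line_neighbors, num_walks=200, walk_length=12)
for state in sorted(samples):
    print(state, samples[state])  # (state, label) pairs would become training data
```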
In Part III, we inspect the behavior of greedy best-first search with a fixed heuristic on simple tasks of a domain to learn its behavior on any task of the same domain. Once greedy best-first search has expanded a progress state, it expands only states with lower heuristic values. We learn to identify progress states and present two methods to exploit this knowledge. Building on this, we extract the bench transition system of a task and generalize it so that it can be applied to any task of the same domain. We can use this generalized bench transition system to split a task into a sequence of simpler searches. In all three research directions, we contribute new approaches and insights to the state of the art, and we indicate interesting topics for future work.
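    The progress-state property can be checked after the fact on a recorded expansion trace. The sketch below assumes only a list of heuristic values in expansion order; the trace itself is made up. It marks every position after which all expanded states have strictly lower heuristic values. It does not implement the thesis's learned identification of progress states. It only shows what the property means and how labels for such a classifier could be obtained.

```python
def progress_states(h_trace: list) -> list:
    """Return the positions i in a greedy best-first search expansion trace such
    that every state expanded after position i has a strictly lower heuristic
    value than h_trace[i] (the last expansion qualifies vacuously)."""
    result = []
    later_max = float("-inf")  # largest h among expansions after position i
    for i in range(len(h_trace) - 1, -1, -1):
        if later_max < h_trace[i]:
            result.append(i)
        later_max = max(later_max, h_trace[i])
    return result[::-1]


# Heuristic values of states in the order GBFS expanded them (made-up trace).
trace = [5, 4, 4, 3, 4, 2, 1, 0]
print(progress_states(trace))  # [0, 4, 5, 6, 7]
```

    Position 3 (h = 3) is not a progress state, because the search later expands a state with h = 4 at position 4.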

    Reinforcement Learning for Planning Heuristics

    Informed heuristics are essential for the success of heuristic search algorithms. However, it is difficult to develop a new heuristic that is informed on various tasks. Instead, we propose a framework that trains a neural network as a heuristic for the tasks it is supposed to solve. We present two reinforcement learning approaches to learn heuristics for fixed state spaces and fixed goals. Our first approach uses approximate value iteration; our second approach uses searches to generate training data. We show that in some domains our approaches outperform previous work, and we point out potential for future improvements.
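    The first approach uses approximate value iteration. For a fixed state space and a fixed goal, the underlying update is the Bellman backup sketched below. In the approximate version, a neural network would be trained to regress onto the backed-up targets; this sketch keeps the estimates in a table instead. The graph and the function name are hypothetical.

```python
def value_iteration(states, successors, goal, max_sweeps: int = 100) -> dict:
    """Tabular cost-to-go estimates for a fixed state space and a fixed goal via
    Bellman backups h(s) <- min over successors t of [c(s, t) + h(t)]."""
    inf = float("inf")
    h = {s: (0.0 if s == goal else inf) for s in states}
    for _ in range(max_sweeps):
        changed = False
        for s in states:
            if s == goal:
                continue
            target = min((c + h[t] for t, c in successors(s)), default=inf)
            if target < h[s]:
                h[s] = target
                changed = True
        if not changed:
            break  # converged: no estimate improved in this sweep
    return h


# Hypothetical weighted graph: state -> [(successor, action cost), ...].
GRAPH = {"a": [("b", 1), ("c", 4)], "b": [("c", 1), ("d", 5)], "c": [("d", 1)], "d": []}
print(value_iteration(list(GRAPH), lambda s: GRAPH[s], goal="d"))
# {'a': 3.0, 'b': 2.0, 'c': 1.0, 'd': 0.0}
```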

    Machine learning for classical planning: neural network heuristics, online portfolios, and state space topologies

    Many everyday problems can be solved with the help of state space search. Heuristic search, in particular greedy best-first search, is one of the most successful algorithms for state space search. We advance the current state of research on heuristic search in three ways. One of the most important components of heuristic search is the heuristic: with a good heuristic, the search finds a solution quickly, but modelling a good heuristic for a problem is laborious. In Part I, we present methods to learn good heuristics for a problem automatically. To this end, we generate the training data by means of random walks starting from (partial) states of the problem. We show that the heuristics we train for a single state space are often better than heuristics trained for a whole problem class. Furthermore, we show three things: the quality of all trained heuristics varies strongly depending on the problem class, no heuristic dominates another, and it cannot be recognized in advance whether a trained heuristic will work well.
We find that in almost all tested problem classes, the model-based search algorithms are superior to the trained heuristics; only in the Storage problem class are our heuristics superior. It is often unclear which heuristic or search algorithm one should use for a problem. Therefore, we train online portfolios that predict the best algorithm for a given problem. Until now, the input to online portfolios has always consisted of problem properties selected by humans. In Part II, we present new online portfolios that receive the entire problem as input. Moreover, our online portfolios can correct their decision once. Both changes considerably improve the quality of online portfolios. Furthermore, we show that we can also train good online portfolios with explainable machine learning techniques. Even if we select the best algorithm for a problem, the problem may be too difficult to be solved in acceptable time. In Part III, we show how the behavior of greedy best-first search on simple problems lets us predict its behavior on harder problems of the same problem class, and we use this knowledge to improve the search. First, we show how to recognize progress states: whenever greedy best-first search expands a progress state, we know that it will never again expand a state with an equal or higher heuristic value. We present two methods that use this knowledge. Building on this work, we learn from one problem how to split any problem of the same problem class into a sequence of simpler searches.

    Extending classical planning with state constraints: Heuristics and search for optimal planning

    We present a principled way of extending a classical AI planning formalism with systems of state constraints, which relate (and sometimes determine) the values of variables in each state traversed by the plan. This extension occupies an attractive middle ground between expressivity and complexity. It enables modelling a new range of problems, as well as formulating more efficient models of classical planning problems. An example of the former is planning-based control of networked physical systems such as power networks. In these systems, a local, discrete control action can have global effects on continuous quantities, such as altering flows across the entire network. At the same time, our extension remains decidable as long as the satisfiability of sets of state constraints is decidable, including in the presence of numeric state variables. We also demonstrate that effective techniques for cost-optimal planning known in the classical setting, in particular relaxation-based admissible heuristics, can be adapted to the extended formalism. In this paper, we apply our approach to constraints in the form of linear or non-linear equations over numeric state variables. However, the approach is independent of the type of state constraints, as long as there exists a procedure that decides their consistency. The planner and the constraint solver interact through a well-defined, narrow interface, in which the solver requires no specialisation to the planning context.
    This work was supported by ARC project DP140104219, “Robust AI Planning for Hybrid Systems”, and in part by ARO grant W911NF1210471 and ONR grant N000141210430.
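    The sketch below makes the narrow planner/solver interface concrete using a hypothetical toy network in which two identical, switchable parallel lines serve a fixed demand. The discrete planning state is the switch configuration, and the state constraints determine the line flows. The planner only calls `solve`, which returns the flows, or `None` when the constraints are inconsistent, and it discards inconsistent successors. `ConstraintSolver`, `ParallelLinesSolver`, and `toggle_successors` are illustrative names, not the paper's API. The equal-split rule is a simplification of real power-flow equations.

```python
from typing import Optional, Protocol

LINES = ("line1", "line2")


class ConstraintSolver(Protocol):
    """The only interface the planner sees (hypothetical name)."""

    def solve(self, discrete_state: dict) -> Optional[dict]:
        """Values of the constrained numeric variables, or None if inconsistent."""
        ...


class ParallelLinesSolver:
    """Hypothetical toy network: a demand D > 0 served by two identical parallel
    lines. State constraints: flow = 0 on an open line, equal flows on closed
    lines (same impedance), flows sum to D, and each flow <= capacity."""

    def __init__(self, demand: float, capacity: float):
        self.demand = demand
        self.capacity = capacity

    def solve(self, discrete_state: dict) -> Optional[dict]:
        closed = [line for line in LINES if discrete_state[line]]
        if not closed:
            return None  # flows cannot sum to D > 0
        share = self.demand / len(closed)
        if share > self.capacity:
            return None  # a line would be overloaded
        return {f"flow_{line}": (share if line in closed else 0.0) for line in LINES}


def toggle_successors(state: dict, solver: ConstraintSolver):
    """Planner side: toggle one switch; keep successors the solver deems consistent."""
    for line in LINES:
        succ = {**state, line: not state[line]}
        values = solver.solve(succ)
        if values is not None:
            yield f"toggle({line})", succ, values


both_closed = {"line1": True, "line2": True}
print(list(toggle_successors(both_closed, ParallelLinesSolver(10.0, 6.0))))  # []
print([a for a, _, _ in toggle_successors(both_closed, ParallelLinesSolver(10.0, 12.0))])
# ['toggle(line1)', 'toggle(line2)']
```

    With a line capacity of 6 and a demand of 10, opening either line would overload the other, so no toggle is consistent. With a capacity of 12, both toggles are consistent.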

    Integrating Planning and Learning for Agents Acting in Unknown Environments

    An Artificial Intelligence (AI) agent acting in an environment perceives the environment through sensors and executes actions through actuators. Symbolic planning enables an agent to decide which actions to execute to accomplish tasks in the environment. To apply symbolic planning, an agent needs to know its symbolic state and an abstract model of the environment dynamics. However, in the real world, an agent has low-level perceptions of the environment (e.g., its position given by a GPS sensor) rather than symbolic observations representing its current state. Furthermore, in many real-world scenarios, it is not feasible to provide an agent with a complete and correct model of the environment, e.g., when the environment is unknown a priori. Symbolic planning requires high-level representations, whereas a real-world agent has only low-level sensors and actuators. Integrating learning, planning, and acting can bridge this gap. First, an agent has to map its continuous perceptions into its current symbolic state, e.g., by detecting the set of objects and their properties from an RGB image provided by an onboard camera. Next, the agent has to build a model of the environment by interacting with it and observing the effects of the executed actions. Finally, the agent has to plan on the learned environment model and execute the symbolic actions through its actuators. We propose an architecture that integrates learning, planning, and acting. Our approach combines data-driven learning methods for building an environment model online with symbolic planning techniques for reasoning on the learned model. In particular, we first focus on learning the environment model from either continuous or symbolic observations. Here we assume that the agent's perceptual input is the complete and correct state of the environment and that the agent is able to execute symbolic actions in the environment.
Afterward, we assume that a partial model of the environment and the capability of mapping perceptions into noisy and incomplete symbolic states are given. The agent has to exploit the environment model and its perception capabilities to perform tasks in unknown and partially observable environments. Then, we tackle the problem of learning online the mapping between continuous perceptions and symbolic states. Here the agent is given a partial model of the environment and is able to execute symbolic actions in the real world. In our approach, we take advantage of learning methods to overcome some of the simplifying assumptions of symbolic planning, such as full observability of the environment or the need for a correct environment model. Similarly, we take advantage of symbolic planning techniques to enable an agent to autonomously gather relevant information online, which data-driven learning methods require. We experimentally show the effectiveness of our approach in simulated and complex environments, where it outperforms state-of-the-art methods. Finally, we empirically demonstrate the applicability of our approach in real environments by conducting experiments on a real robot.
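    In the first setting (complete and correct symbolic observations, symbolic actions), a common baseline for learning an action model online is to generalize from observed transitions. The sketch below is a textbook-style technique, not the architecture proposed here. It assumes deterministic actions without conditional effects. The preconditions are the atoms true in every observed pre-state, and the add and delete effects are the atoms seen to appear and disappear. All names are hypothetical.

```python
from collections import defaultdict


class ActionModelLearner:
    """Learn a STRIPS-like model of each symbolic action from fully observed
    transitions (state_before, action, state_after). Assumes deterministic
    actions without conditional effects."""

    def __init__(self):
        self.pre = {}                   # action -> atoms true in every observed pre-state
        self.add = defaultdict(set)     # action -> atoms observed to become true
        self.delete = defaultdict(set)  # action -> atoms observed to become false

    def observe(self, before, action, after):
        before, after = set(before), set(after)
        self.pre[action] = self.pre.get(action, before) & before
        self.add[action] |= after - before
        self.delete[action] |= before - after

    def predict(self, state, action):
        """Predicted successor state, or None if the action is unknown or its
        learned preconditions do not hold."""
        state = set(state)
        if action not in self.pre or not self.pre[action] <= state:
            return None
        return (state - self.delete[action]) | self.add[action]


learner = ActionModelLearner()
learner.observe({"at_door", "door_closed"}, "open_door", {"at_door", "door_open"})
learner.observe({"at_door", "door_closed", "light_on"}, "open_door",
                {"at_door", "door_open", "light_on"})
print(sorted(learner.pre["open_door"]))  # ['at_door', 'door_closed']
print(sorted(learner.predict({"at_door", "door_closed", "raining"}, "open_door")))
# ['at_door', 'door_open', 'raining']
```

    `light_on` never enters the learned preconditions, because the first observation shows the action succeeding without it. In general, the learned preconditions can only shrink as more transitions are observed.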