22 research outputs found

    Policy-Guided Heuristic Search with Guarantees

    Full text link
    The use of a policy and a heuristic function for guiding search can be quite effective in adversarial problems, as demonstrated by AlphaGo and its successors, which are based on the PUCT search algorithm. While PUCT can also be used to solve single-agent deterministic problems, it lacks guarantees on its search effort and it can be computationally inefficient in practice. Combining the A* algorithm with a learned heuristic function tends to work better in these domains, but A* and its variants do not use a policy. Moreover, the purpose of using A* is to find solutions of minimum cost, while we seek instead to minimize the search loss (e.g., the number of search steps). LevinTS is guided by a policy and provides guarantees on the number of search steps that relate to the quality of the policy, but it does not make use of a heuristic function. In this work we introduce Policy-guided Heuristic Search (PHS), a novel search algorithm that uses both a heuristic function and a policy and has theoretical guarantees on the search loss that relates to both the quality of the heuristic and of the policy. We show empirically on the sliding-tile puzzle, Sokoban, and a puzzle from the commercial game `The Witness' that PHS enables the rapid learning of both a policy and a heuristic function and compares favorably with A*, Weighted A*, Greedy Best-First Search, LevinTS, and PUCT in terms of number of problems solved and search time in all three domains tested

    最良優先探索のための探索非局在化手法

    Get PDF
    学位の種別: 課程博士審査委員会委員 : (主査)東京大学教授 Fukunaga Alex, 東京大学教授 山口 和紀, 東京大学准教授 田中 哲朗, 東京大学准教授 金子 知適, 東京大学准教授 森畑 明昌University of Tokyo(東京大学

    Optimal Planning with State Constraints

    Get PDF
    In the classical planning model, state variables are assigned values in the initial state and remain unchanged unless explicitly affected by action effects. However, some properties of states are more naturally modelled not as direct effects of actions but instead as derived, in each state, from the primary variables via a set of rules. We refer to those rules as state constraints. The two types of state constraints that will be discussed here are numeric state constraints and logical rules that we will refer to as axioms. When using state constraints we make a distinction between primary variables, whose values are directly affected by action effects, and secondary variables, whose values are determined by state constraints. While primary variables have finite and discrete domains, as in classical planning, there is no such requirement for secondary variables. For example, using numeric state constraints allows us to have secondary variables whose values are real numbers. We show that state constraints are a construct that lets us combine classical planning methods with specialised solvers developed for other types of problems. For example, introducing numeric state constraints enables us to apply planning techniques in domains involving interconnected physical systems, such as power networks. To solve these types of problems optimally, we adapt commonly used methods from optimal classical planning, namely state-space search guided by admissible heuristics. In heuristics based on monotonic relaxation, the idea is that in a relaxed state each variable assumes a set of values instead of just a single value. With state constraints, the challenge becomes to evaluate the conditions, such as goals and action preconditions, that involve secondary variables. We employ consistency checking tools to evaluate whether these conditions are satisfied in the relaxed state. In our work with numerical constraints we use linear programming, while with axioms we use answer set programming and three value semantics. This allows us to build a relaxed planning graph and compute constraint-aware version of heuristics based on monotonic relaxation. We also adapt pattern database heuristics. We notice that an abstract state can be thought of as a state in the monotonic relaxation in which the variables in the pattern hold only one value, while the variables not in the pattern simultaneously hold all the values in their domains. This means that we can apply the same technique for evaluating conditions on secondary variables as we did for the monotonic relaxation and build pattern databases similarly as it is done in classical planning. To make better use of our heuristics, we modify the A* algorithm by combining two techniques that were previously used independently – partial expansion and preferred operators. Our modified algorithm, which we call PrefPEA, is most beneficial in cases where heuristic is expensive to compute, but accurate, and states have many successors

    Solving the Sokoban problem

    Get PDF
    Sokoban is a game with simple rules, but finding solutions is a hard task for both people and computers. Sokoban is a NP-hard problem, which means that we probably cannot find every optimal solution in polynomial time. Instead, we must check all possible states in order to find the optimal solution. Thus, search algorithms are used for solving Sokoban. If we want to find a solution in a reasonable time, we need to find and implement good heuristics. Of course, because Sokoban is NP-hard, even the best current algorithms cannot find solutions to all Sokoban puzzles. The theoritical part of the thesis is analysis of the Sokoban problem and NP-hard problems. Practical part consists of description of our algorithm. We also tested our algorithm and compared it to a well-known algorithm JSoko

    Adaptive search techniques in AI planning and heuristic search

    Get PDF
    State-space search is a common approach to solve problems appearing in artificial intelligence and other subfields of computer science. In such problems, an agent must find a sequence of actions leading from an initial state to a goal state. However, the state spaces of practical applications are often too large to explore exhaustively. Hence, heuristic functions that estimate the distance to a goal state (such as straight-line distance for navigation tasks) are used to guide the search more effectively. Heuristic search is typically viewed as a static process. The heuristic function is assumed to be unchanged throughout the search, and its resulting values are directly used for guidance without applying any further reasoning to them. Yet critical aspects of the task may only be discovered during the search, e.g., regions of the state space where the heuristic does not yield reliable values. Our work here aims to make this process more dynamic, allowing the search to adapt to such observations. One form of adaptation that we consider is online refinement of the heuristic function. We design search algorithms that detect weaknesses in the heuristic, and address them with targeted refinement operations. If the heuristic converges to perfect estimates, this results in a secondary method of progress, causing search algorithms that are otherwise incomplete to eventually find a solution. We also consider settings that inherently require adaptation: In online replanning, a plan that is being executed must be amended for changes in the environment. Similarly, in real-time search, an agent must act under strict time constraints with limited information. The search algorithms we introduce in this work share a common pattern of online adaptation, allowing them to effectively react to challenges encountered during the search. We evaluate our contributions on a wide range of standard benchmarks. Our results show that the flexibility of these algorithms makes them more robust than traditional approaches, and they often yield substantial improvements over current state-of-the-art planners.Die Zustandsraumsuche ist ein oft verwendeter Ansatz um verschiedene Probleme zu lösen, die in der Künstlichen Intelligenz und anderen Bereichen der Informatik auftreten. Dabei muss ein Akteur eine Folge von Aktionen finden, die einen Pfad von einem Startzustand zu einem Zielzustand bilden. Die Zustandsräume von praktischen Anwendungen sind häufig zu groß um sie vollständig zu durchsuchen. Aus diesem Grund leitet man die Suche mit Heuristiken, die die Distanz zu einem Zielzustand abschätzen; zum Beispiel lässt sich die Luftliniendistanz als Heuristik für Navigationsprobleme einsetzen. Heuristische Suche wird typischerweise als statischer Prozess angesehen. Man nimmt an, dass die Heuristik während der Suche eine unveränderte Funktion ist, und die resultierenden Werte werden direkt zur Leitung der Suche benutzt ohne weitere Logik darauf anzuwenden. Jedoch könnten kritische Aspekte des Problems erst im Laufe der Suche erkannt werden, wie zum Beispiel Bereiche des Zustandsraums in denen die Heuristik keine verlässlichen Abschätzungen liefert. In dieser Arbeit wird der Suchprozess dynamischer gestaltet und der Suche ermöglicht sich solchen Beobachtungen anzupassen. Eine Art dieser Anpassung ist die Onlineverbesserung der Heuristik. Es werden Suchalgorithmen entwickelt, die Schwächen in der Heuristik erkennen und mit gezielten Verbesserungsoperationen beheben. Wenn die Heuristik zu perfekten Werten konvergiert ergibt sich daraus eine zusätzliche Form von Fortschritt, wodurch auch Suchalgorithmen, die sonst unvollständig sind, garantiert irgendwann eine Lösung finden werden. Es werden auch Szenarien betrachtet, die schon von sich aus Anpassung erfordern: In der Onlineumplanung muss ein Plan, der gerade ausgeführt wird, auf Änderungen in der Umgebung angepasst werden. Ähnlich dazu muss sich ein Akteur in der Echtzeitsuche unter strengen Zeitauflagen und mit eingeschränkten Informationen bewegen. Die Suchalgorithmen, die in dieser Arbeit eingeführt werden, folgen einem gemeinsamen Muster von Onlineanpassung, was ihnen ermöglicht effektiv auf Herausforderungen zu reagieren die im Verlauf der Suche aufkommen. Diese Ansätze werden auf einer breiten Reihe von Benchmarks ausgewertet. Die Ergebnisse zeigen, dass die Flexibilität dieser Algorithmen zu erhöhter Zuverlässigkeit im Vergleich zu traditionellen Ansätzen führt, und es werden oft deutliche Verbesserungen gegenüber modernen Planungssystemen erzielt.DFG grant 389792660 as part of TRR 248 – CPEC (see https://perspicuous-computing.science), and DFG grant HO 2169/5-1, "Critically Constrained Planning via Partial Delete Relaxation

    Conflict-driven learning in AI planning state-space search

    Get PDF
    Many combinatorial computation problems in computer science can be cast as a reachability problem in an implicitly described, potentially huge, graph: the state space. State-space search is a versatile and widespread method to solve such reachability problems, but it requires some form of guidance to prevent exploring that combinatorial space exhaustively. Conflict-driven learning is an indispensable search ingredient for solving constraint satisfaction problems (most prominently, Boolean satisfiability). It guides search towards solutions by identifying conflicts during the search, i.e., search branches not leading to any solution, learning from them knowledge to avoid similar conflicts in the remainder of the search. This thesis adapts the conflict-driven learning methodology to more general classes of reachability problems. Specifically, our work is placed in AI planning. We consider goal-reachability objectives in classical planning and in planning under uncertainty. The canonical form of "conflicts" in this context are dead-end states, i.e., states from which the desired goal property cannot be reached. We pioneer methods for learning sound and generalizable dead-end knowledge from conflicts encountered during forward state-space search. This embraces the following core contributions: When acting under uncertainty, the presence of dead-end states may make it impossible to satisfy the goal property with absolute certainty. The natural planning objective then is MaxProb, maximizing the probability of reaching the goal. However, algorithms for MaxProb probabilistic planning are severely underexplored. We close this gap by developing a large design space of probabilistic state-space search methods, contributing new search algorithms, admissible state-space reduction techniques, and goal-probability bounds suitable for heuristic state-space search. We systematically explore this design space through an extensive empirical evaluation. The key to our conflict-driven learning algorithm adaptation are unsolvability detectors, i.e., goal-reachability overapproximations. We design three complementary families of such unsolvability detectors, building upon known techniques: critical-path heuristics, linear-programming-based heuristics, and dead-end traps. We develop search methods to identify conflicts in deterministic and probabilistic state spaces, and we develop suitable refinement methods for the different unsolvability detectors so to recognize these states. Arranged in a depth-first search, our techniques approach the elegance of conflict-driven learning in constraint satisfaction, featuring the ability to learn to refute search subtrees, and intelligent backjumping to the root cause of a conflict. We provide a comprehensive experimental evaluation, demonstrating that the proposed techniques yield state-of-the-art performance for finding plans for solvable classical planning tasks, proving classical planning tasks unsolvable, and solving MaxProb in probabilistic planning, on benchmarks where dead-end states abound.Viele kombinatorisch komplexe Berechnungsprobleme in der Informatik lassen sich als Erreichbarkeitsprobleme in einem implizit dargestellten, potenziell riesigen, Graphen - dem Zustandsraum - verstehen. Die Zustandsraumsuche ist eine weit verbreitete Methode, um solche Erreichbarkeitsprobleme zu lösen. Die Effizienz dieser Methode hängt aber maßgeblich von der Verwendung strikter Suchkontrollmechanismen ab. Das konfliktgesteuerte Lernen ist eine essenzielle Suchkomponente für das Lösen von Constraint-Satisfaction-Problemen (wie dem Erfüllbarkeitsproblem der Aussagenlogik), welches von Konflikten, also Fehlern in der Suche, neue Kontrollregeln lernt, die ähnliche Konflikte zukünftig vermeiden. In dieser Arbeit erweitern wir die zugrundeliegende Methodik auf Zielerreichbarkeitsfragen, wie sie im klassischen und probabilistischen Planen, einem Teilbereich der Künstlichen Intelligenz, auftauchen. Die kanonische Form von „Konflikten“ in diesem Kontext sind sog. Sackgassen, Zustände, von denen aus die Zielbedingung nicht erreicht werden kann. Wir präsentieren Methoden, die es ermöglichen, während der Zustandsraumsuche von solchen Konflikten korrektes und verallgemeinerbares Wissen über Sackgassen zu erlernen. Unsere Arbeit umfasst folgende Beiträge: Wenn der Effekt des Handelns mit Unsicherheiten behaftet ist, dann kann die Existenz von Sackgassen dazu führen, dass die Zielbedingung nicht unter allen Umständen erfüllt werden kann. Die naheliegendste Planungsbedingung in diesem Fall ist MaxProb, das Maximieren der Wahrscheinlichkeit, dass die Zielbedingung erreicht wird. Planungsalgorithmen für MaxProb sind jedoch wenig erforscht. Um diese Lücke zu schließen, erstellen wir einen umfangreichen Bausatz für Suchmethoden in probabilistischen Zustandsräumen, und entwickeln dabei neue Suchalgorithmen, Zustandsraumreduktionsmethoden, und Abschätzungen der Zielerreichbarkeitswahrscheinlichkeit, wie sie für heuristische Suchalgorithmen gebraucht werden. Wir explorieren den resultierenden Gestaltungsraum systematisch in einer breit angelegten empirischen Studie. Die Grundlage unserer Adaption des konfliktgesteuerten Lernens bilden Unerreichbarkeitsdetektoren. Wir konzipieren drei Familien solcher Detektoren basierend auf bereits bekannten Techniken: Kritische-Pfad Heuristiken, Heuristiken basierend auf linearer Optimierung, und Sackgassen-Fallen. Wir entwickeln Suchmethoden, um Konflikte in deterministischen und probabilistischen Zustandsräumen zu erkennen, sowie Methoden, um die verschiedenen Unerreichbarkeitsdetektoren basierend auf den erkannten Konflikten zu verfeinern. Instanziiert als Tiefensuche weisen unsere Techniken ähnliche Eigenschaften auf wie das konfliktgesteuerte Lernen für Constraint-Satisfaction-Problemen. Wir evaluieren die entwickelten Methoden empirisch, und zeigen dabei, dass das konfliktgesteuerte Lernen unter gewissen Voraussetzungen zu signifikanten Suchreduktionen beim Finden von Plänen in lösbaren klassischen Planungsproblemen, Beweisen der Unlösbarkeit von klassischen Planungsproblemen, und Lösen von MaxProb im probabilistischen Planen, führen kann

    BNAIC 2008:Proceedings of BNAIC 2008, the twentieth Belgian-Dutch Artificial Intelligence Conference

    Get PDF

    Efficient Automated Planning with New Formulations

    Get PDF
    Problem solving usually strongly relies on how the problem is formulated. This fact also applies to automated planning, a key field in artificial intelligence research. Classical planning used to be dominated by STRIPS formulation, a simple model based on propositional logic. In the recently introduced SAS+ formulation, the multi-valued variables naturally depict certain invariants that are missed in STRIPS, make SAS+ have many favorable features. Because of its rich structural information SAS+ begins to attract lots of research interest. Existing works, however, are mostly limited to one single thing: to improve heuristic functions. This is in sharp contrast with the abundance of planning models and techniques in the field. On the other hand, although heuristic is a key part for search, its effectiveness is limited. Recent investigations have shown that even if we have almost perfect heuristics, the number of states to visit is still exponential. Therefore, there is a barrier between the nice features of SAS+ and its applications in planning algorithms. In this dissertation, we have recasted two major planning paradigms: state space search and planning as Satisfiability: SAT), with three major contributions. First, we have utilized SAS+ for a new hierarchical state space search model by taking advantage of the decomposable structure within SAS+. This algorithm can greatly reduce the time complexity for planning. Second, planning as Satisfiability is a major planning approach, but it is traditionally based on STRIPS. We have developed a new SAS+ based SAT encoding scheme: SASE) for planning. The state space modeled by SASE shows a decomposable structure with certain components independent to others, showing promising structure that STRIPS based encoding does not have. Third, the expressiveness of planning is important for real world scenarios, thus we have also extended the planning as SAT to temporally expressive planning and planning with action costs, two advanced features beyond classical planning. The resulting planner is competitive to state-of-the-art planners, in terms of both quality and performance. Overall, our work strongly suggests a shifting trend of planning from STRIPS to SAS+, and shows the power of formulating planning problems as Satisfiability. Given the important roles of both classical planning and temporal planning, our work will inspire new developments in other advanced planning problem domains

    The 2011 International Planning Competition

    Get PDF
    After a 3 years gap, the 2011 edition of the IPC involved a total of 55 planners, some of them versions of the same planner, distributed among four tracks: the sequential satisficing track (27 planners submitted out of 38 registered), the sequential multicore track (8 planners submitted out of 12 registered), the sequential optimal track (12 planners submitted out of 24 registered) and the temporal satisficing track (8 planners submitted out of 14 registered). Three more tracks were open to participation: temporal optimal, preferences satisficing and preferences optimal. Unfortunately the number of submitted planners did not allow these tracks to be finally included in the competition. A total of 55 people were participating, grouped in 31 teams. Participants came from Australia, Canada, China, France, Germany, India, Israel, Italy, Spain, UK and USA. For the sequential tracks 14 domains, with 20 problems each, were selected, while the temporal one had 12 domains, also with 20 problems each. Both new and past domains were included. As in previous competitions, domains and problems were unknown for participants and all the experimentation was carried out by the organizers. To run the competition a cluster of eleven 64-bits computers (Intel XEON 2.93 Ghz Quad core processor) using Linux was set up. Up to 1800 seconds, 6 GB of RAM memory and 750 GB of hard disk were available for each planner to solve a problem. This resulted in 7540 computing hours (about 315 days), plus a high number of hours devoted to preliminary experimentation with new domains, reruns and bugs fixing. The detailed results of the competition, the software used for automating most tasks, the source code of all the participating planners and the description of domains and problems can be found at the competition’s web page: http://www.plg.inf.uc3m.es/ipc2011-deterministicThis booklet summarizes the participants on the Deterministic Track of the International Planning Competition (IPC) 2011. Papers describing all the participating planners are included

    New perspectives on cost partitioning for optimal classical planning

    Get PDF
    Admissible heuristics are the main ingredient when solving classical planning tasks optimally with heuristic search. There are many such heuristics, and each has its own strengths and weaknesses. As higher admissible heuristic values are more accurate, the maximum over several admissible heuristics dominates each individual one. Operator cost partitioning is a well-known technique to combine admissible heuristics in a way that dominates their maximum and remains admissible. But are there better options to combine the heuristics? We make three main contributions towards this question: Extensions to the cost partitioning framework can produce higher estimates from the same set of heuristics. Cost partitioning traditionally uses non-negative cost functions. We prove that this restriction is not necessary, and that allowing negative values as well makes the framework more powerful: the resulting heuristic values can be exponentially higher, and unsolvability can be detected even if all component heuristics have a finite value. We also generalize operator cost partitioning to transition cost partitioning, which can differentiate between different contexts in which an operator is used. Operator-counting heuristics reason about the number of times each operator is used in a plan. Many existing heuristics can be expressed in this framework, which gives new theoretical insight into their relationship. Different operator-counting heuristics can be easily combined within the framework in a way that dominates their maximum. Potential heuristics compute a heuristic value as a weighted sum over state features and are a fast alternative to operator-counting heuristics. Admissible and consistent potential heuristics for certain feature sets can be described in a compact way which means that the best heuristic from this class can be extracted in polynomial time. Both operator-counting and potential heuristics are closely related to cost partitioning. They offer a new look on cost-partitioned heuristics and already sparked research beyond their use as classical planning heuristics
    corecore