6 research outputs found

    Hierarchical Imitation Learning with Vector Quantized Models

    The ability to plan actions on multiple levels of abstraction enables intelligent agents to solve complex tasks effectively. However, learning the models for both low- and high-level planning from demonstrations has proven challenging, especially with higher-dimensional inputs. To address this issue, we propose to use reinforcement learning to identify subgoals in expert trajectories by associating the magnitude of the rewards with the predictability of low-level actions given the state and the chosen subgoal. We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning. In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming the state of the art. Because of its ability to plan, our algorithm can find better trajectories than the ones in the training set.
    Comment: To appear at ICML 2023.
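    The key mechanism here is the vector-quantized codebook: continuous subgoal embeddings are snapped to their nearest entry in a finite set of learned code vectors, so the high-level planner searches over a discrete subgoal vocabulary. Below is a minimal sketch of that quantization step only; the codebook size, dimensions, and function names are illustrative assumptions, not details from the paper.

        import numpy as np

        # Sketch of VQ-style quantization: map a continuous subgoal embedding
        # to the nearest vector in a codebook. In a trained model the codebook
        # entries are learned; here they are random placeholders.
        rng = np.random.default_rng(0)
        K, D = 16, 8                         # hypothetical codebook size and dim
        codebook = rng.normal(size=(K, D))   # placeholder for learned codes

        def quantize(z):
            """Return index and vector of the codebook entry nearest to z."""
            dists = np.linalg.norm(codebook - z, axis=1)
            k = int(np.argmin(dists))
            return k, codebook[k]

        z = rng.normal(size=D)               # a continuous subgoal embedding
        k, z_q = quantize(z)
        print(f"embedding quantized to discrete subgoal code {k}")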

    Policy-Guided Heuristic Search with Guarantees

    The use of a policy and a heuristic function for guiding search can be quite effective in adversarial problems, as demonstrated by AlphaGo and its successors, which are based on the PUCT search algorithm. While PUCT can also be used to solve single-agent deterministic problems, it lacks guarantees on its search effort and can be computationally inefficient in practice. Combining the A* algorithm with a learned heuristic function tends to work better in these domains, but A* and its variants do not use a policy. Moreover, the purpose of using A* is to find solutions of minimum cost, while we seek instead to minimize the search loss (e.g., the number of search steps). LevinTS is guided by a policy and provides guarantees on the number of search steps that relate to the quality of the policy, but it does not make use of a heuristic function. In this work we introduce Policy-guided Heuristic Search (PHS), a novel search algorithm that uses both a heuristic function and a policy and has theoretical guarantees on the search loss that relate to the quality of both the heuristic and the policy. We show empirically on the sliding-tile puzzle, Sokoban, and a puzzle from the commercial game "The Witness" that PHS enables the rapid learning of both a policy and a heuristic function and compares favorably with A*, Weighted A*, Greedy Best-First Search, LevinTS, and PUCT in terms of number of problems solved and search time in all three domains tested.
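    To make the combination concrete: the published PHS* variant expands nodes in best-first order of (g(n) + h(n)) / π(n), where g(n) is the cost of the path to n, h(n) the heuristic estimate, and π(n) the probability the policy assigns to that path; with h = 0 and unit step costs this reduces to LevinTS's d(n)/π(n) ordering. The sketch below assumes that priority on a toy graph; it is an illustration under those assumptions, not the paper's implementation.

        import heapq

        def phs(start, goal, neighbors, h, policy_prob):
            """Best-first search ordered by (g + h) / pi (assumed PHS* priority)."""
            # Each entry: (priority, g, pi, node, path). pi is the product of
            # policy probabilities along the path and stays strictly positive.
            frontier = [(h(start) / 1.0, 0.0, 1.0, start, [start])]
            seen = set()
            while frontier:
                _, g, pi, node, path = heapq.heappop(frontier)
                if node == goal:
                    return path
                if node in seen:
                    continue
                seen.add(node)
                for nxt, cost in neighbors(node):
                    g2 = g + cost
                    pi2 = pi * policy_prob(node, nxt)
                    heapq.heappush(frontier,
                                   ((g2 + h(nxt)) / pi2, g2, pi2, nxt, path + [nxt]))
            return None

        # Toy usage: a four-node chain, uniform policy, zero heuristic. With
        # h = 0 and unit costs the ordering matches LevinTS on this graph.
        graph = {"s": [("a", 1)], "a": [("b", 1)], "b": [("t", 1)], "t": []}
        print(phs("s", "t", lambda n: graph[n], lambda n: 0, lambda n, m: 0.5))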

    Heuristics for the refinement of assumptions in generalized reactivity formulae

    Reactive synthesis is concerned with automatically generating implementations from formal specifications. These specifications are typically written in the language of generalized reactivity (GR(1)), a subset of linear temporal logic capable of expressing the most common industrial specification patterns, and describe the requirements on the behavior of a system under assumptions about the environment where the system is to be deployed. Oftentimes no implementation exists that guarantees the required behavior under all possible environments, typically due to missing assumptions (this is usually referred to as unrealizability). To address this issue, new assumptions need to be added to complete the specification, a problem known as assumptions refinement.

    Since the space of candidate assumptions is intractably large, searching for the best solutions is inherently hard. In particular, new methods are needed to (i) increase the effectiveness of the search procedures, measured as the ratio between the number of solutions found and the number of refinements explored; and (ii) improve the quality of the results, defined as the weakness of the solutions. In this thesis we propose a set of heuristics to meet these goals, and a methodology to assess and compare assumptions refinement methods based on quantitative metrics. The heuristics take the form of algorithms to generate candidate refinements during the search, and quantitative measures to assess the quality of the candidates.

    We first discuss a heuristic method to generate assumptions that target the cause of unrealizability. This is done by selecting candidate refinement formulas based on Craig's interpolation. We provide a formal underpinning of the technique and evaluate it in terms of our new metric of effectiveness, as defined above, whose value is improved with respect to the state of the art. We demonstrate this on a set of popular benchmarks of embedded software.

    We then provide a formal, quantitative characterization of the permissiveness of environment assumptions in the form of a weakness measure. We prove that the partial order induced by this measure is consistent with the one induced by implication. The key advantage of this measure is that it allows for prioritizing candidate solutions, as we show experimentally.

    Lastly, we propose a notion of minimal refinements with respect to the observed counterstrategies. We demonstrate that exploring minimal refinements produces weaker solutions and reduces the amount of computation needed to explore each refinement. However, this may come at the cost of reducing the effectiveness of the search. To counteract this effect, we propose a hybrid search approach in which both minimal and non-minimal refinements are explored.
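    For reference, a GR(1) specification has an assume-guarantee implication shape, and assumptions refinement strengthens the antecedent until the implication becomes realizable. The formula below shows the conventional form from the GR(1) literature, with generic notation assumed rather than taken from the thesis itself:

        \[
          \Big(\theta_e \,\wedge\, \mathbf{G}\,\rho_e \,\wedge\, \bigwedge_{i=1}^{m} \mathbf{G}\mathbf{F}\, J^e_i\Big)
          \;\rightarrow\;
          \Big(\theta_s \,\wedge\, \mathbf{G}\,\rho_s \,\wedge\, \bigwedge_{j=1}^{n} \mathbf{G}\mathbf{F}\, J^s_j\Big)
        \]

    Here the \(\theta\) terms are initial conditions, the \(\rho\) terms transition relations, and the \(J\) terms fairness (justice) conditions, with subscripts e and s distinguishing environment assumptions from system guarantees. A candidate refinement typically adds a conjunct of one of these three forms to the antecedent.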