    Automatic Goal Discovery in Subgoal Monte Carlo Tree Search

    Monte Carlo Tree Search (MCTS) is a heuristic search algorithm that can play a wide range of games without requiring any domain-specific knowledge. However, MCTS tends to struggle in very complicated games due to an exponentially increasing branching factor. A promising solution to this problem is to focus the search on only a small fraction of states. Subgoal Monte Carlo Tree Search (S-MCTS) achieves this by using a predefined subgoal predicate that detects promising states called subgoals. However, this not only makes S-MCTS domain-dependent; it is also often difficult to define a good predicate. In this paper, we propose using quality diversity (QD) algorithms to detect subgoals in real time. Furthermore, we show how integrating QD algorithms into S-MCTS significantly improves its performance in the Physical Travelling Salesmen Problem without requiring any domain-specific knowledge.
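The idea of restricting search to states flagged by a subgoal predicate can be illustrated with a minimal sketch. It is not the paper's implementation: the toy domain (integer positions on a line), the predicate `is_subgoal`, and the helper `reachable_subgoals` are all hypothetical, and plain breadth-first search stands in for whatever exploration the real algorithm uses. The sketch only shows how a predicate turns many primitive steps into a few macro-actions that end at subgoals.

```python
from collections import deque


def reachable_subgoals(start, actions, step, is_subgoal, max_depth):
    """Breadth-first search from `start` that stops expanding at subgoal states.

    Returns a dict mapping each discovered subgoal state to the shortest
    action sequence (macro-action) that reaches it within `max_depth` steps.
    """
    found = {}
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        state, path = queue.popleft()
        if len(path) == max_depth:
            continue  # depth budget exhausted on this branch
        for action in actions:
            nxt = step(state, action)
            if nxt in seen:
                continue
            seen.add(nxt)
            new_path = path + [action]
            if is_subgoal(nxt):
                found[nxt] = new_path  # do not expand past a subgoal
            else:
                queue.append((nxt, new_path))
    return found


# Hypothetical toy domain: positions on a line, subgoals at multiples of 5.
def step(pos, action):
    return pos + action


def is_subgoal(pos):
    return pos % 5 == 0


macro_actions = reachable_subgoals(2, (-1, +1), step, is_subgoal, max_depth=4)
print(macro_actions)
```

A tree search built on top of this would branch over the entries of `macro_actions` rather than over every primitive action, which is where the reduction in branching factor comes from.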

    Symbol acquisition for task-level planning

    We consider the problem of how to plan efficiently in low-level, continuous state spaces with temporally abstract actions (or skills), by constructing abstract representations of the problem suitable for task-level planning. The central question this effort poses is which abstract representations are required to express and evaluate plans composed of sequences of skills. We show that classifiers can be used as a symbolic representation system, and that the ability to represent the preconditions and effects of an agent's skills is both necessary and sufficient for task-level planning. The resulting representations allow a reinforcement learning agent to acquire a symbolic representation appropriate for planning from experience.

    Model Learning for Look-ahead Exploration in Continuous Control

    We propose an exploration method that incorporates look-ahead search over basic learnt skills and their dynamics, and use it for reinforcement learning (RL) of manipulation policies. Our skills are multi-goal policies learned in isolation in simpler environments using existing multi-goal RL formulations, analogous to options or macro-actions. Coarse skill dynamics, i.e., the state transition caused by a (complete) skill execution, are learnt and are unrolled forward during look-ahead search. Policy search benefits from temporal abstraction during exploration, although it operates over low-level primitive actions itself, and thus the resulting policies do not suffer from the suboptimality and inflexibility caused by coarse skill chaining. We show that the proposed exploration strategy results in effective learning of complex manipulation policies faster than current state-of-the-art RL methods, and converges to better policies than methods that use options or parameterized skills as building blocks of the policy itself, as opposed to guiding exploration. Comment: This is a pre-print of our paper which is accepted in AAAI 201

    Goal-Conditioned Reinforcement Learning with Imagined Subgoals

    Goal-conditioned reinforcement learning endows an agent with a large variety of skills, but it often struggles to solve tasks that require more temporally extended reasoning. In this work, we propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks. Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic. This high-level policy predicts intermediate states halfway to the goal using the value function as a reachability metric. We don't require the policy to reach these subgoals explicitly. Instead, we use them to define a prior policy, and incorporate this prior into a KL-constrained policy iteration scheme to speed up and regularize learning. Imagined subgoals are used during policy learning, but not during test time, where we only apply the learned policy. We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin. Comment: ICML 2021. See the project webpage at https://www.di.ens.fr/willow/research/ris

    Hierarchical Imitation Learning with Vector Quantized Models

    The ability to plan actions on multiple levels of abstraction enables intelligent agents to solve complex tasks effectively. However, learning the models for both low- and high-level planning from demonstrations has proven challenging, especially with higher-dimensional inputs. To address this issue, we propose to use reinforcement learning to identify subgoals in expert trajectories by associating the magnitude of the rewards with the predictability of low-level actions given the state and the chosen subgoal. We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning. In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming the state of the art. Because of its ability to plan, our algorithm can find better trajectories than the ones in the training set. Comment: To appear at ICML 202

    Hybrid Search for Efficient Planning with Completeness Guarantees

    Solving complex planning problems has been a long-standing challenge in computer science. Learning-based subgoal search methods have shown promise in tackling these problems, but they often suffer from a lack of completeness guarantees, meaning that they may fail to find a solution even if one exists. In this paper, we propose an efficient approach to augment a subgoal search method to achieve completeness in discrete action spaces. Specifically, we augment the high-level search with low-level actions to execute a multi-level (hybrid) search, which we call complete subgoal search. This solution achieves the best of both worlds: the practical efficiency of high-level search and the completeness of low-level search. We apply the proposed search method to a recently proposed subgoal search algorithm and evaluate the algorithm trained on offline data on complex planning problems. We demonstrate that our complete subgoal search not only guarantees completeness but can even improve performance in terms of search expansions for instances that the high-level search could solve without low-level augmentations. Our approach makes it possible to apply subgoal-level planning for systems where completeness is a critical requirement. Comment: NeurIPS 2023 Poste
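The core idea of mixing high-level subgoals with low-level actions can be sketched on a toy problem. This is an illustration under stated assumptions, not the algorithm from the paper: `hybrid_search`, the fixed "+5 jump" subgoal proposer (a stand-in for a learned generator), and the distance heuristic are all hypothetical, and the paper's actual expansion priorities are not reproduced. The sketch shows only that adding every primitive successor to the frontier lets a best-first search reach goals that the subgoal proposer alone would miss.

```python
import heapq
from itertools import count


def hybrid_search(start, goal, propose_subgoals, low_level_successors, heuristic):
    """Best-first search whose successors mix proposed subgoals and primitive moves.

    Returns (path, expansions); path is None if the frontier empties first.
    """
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(heuristic(start), next(tie), start)]
    parent = {start: None}
    expansions = 0
    while frontier:
        _, _, state = heapq.heappop(frontier)
        expansions += 1
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1], expansions
        # Subgoals give long jumps; primitive successors keep the search exhaustive.
        successors = list(propose_subgoals(state)) + list(low_level_successors(state))
        for nxt in successors:
            if nxt not in parent:
                parent[nxt] = state
                heapq.heappush(frontier, (heuristic(nxt), next(tie), nxt))
    return None, expansions


# Hypothetical toy domain: integer positions 0..20, goal at 13.
GOAL = 13

def propose_subgoals(s):
    return [s + 5] if s + 5 <= 20 else []  # stand-in for a learned generator

def primitive_moves(s):
    return [s + d for d in (-1, +1) if 0 <= s + d <= 20]

def distance(s):
    return abs(GOAL - s)

path, expansions = hybrid_search(0, GOAL, propose_subgoals, primitive_moves, distance)
subgoal_only_path, _ = hybrid_search(0, GOAL, propose_subgoals, lambda s: [], distance)
print(path, expansions, subgoal_only_path)
```

In this toy setting the subgoal-only search exhausts its frontier without reaching the goal, while the hybrid search uses the long jumps to get close and primitive moves to finish. Because every primitive successor is eventually enqueued and visited states are tracked, the hybrid variant is exhaustive on a finite state space.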