Automatic Goal Discovery in Subgoal Monte Carlo Tree Search
Monte Carlo Tree Search (MCTS) is a heuristic search algorithm that can play a wide range of games without requiring any domain-specific knowledge. However, MCTS tends to struggle in very complicated games due to an exponentially increasing branching factor. A promising solution for this problem is to focus the search only on a small fraction of states. Subgoal Monte Carlo Tree Search (S-MCTS) achieves this by using a predefined subgoal-predicate that detects promising states called subgoals. However, not only does this make S-MCTS domain-dependent, but it is also often difficult to define a good predicate. In this paper, we propose using quality diversity (QD) algorithms to detect subgoals in real-time. Furthermore, we show how integrating QD-algorithms into S-MCTS significantly improves its performance in the Physical Travelling Salesman Problem without requiring any domain-specific knowledge.
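The core idea of replacing a hand-written subgoal predicate with quality diversity can be sketched as a QD archive: states are binned by a behaviour descriptor, and the best-scoring state per bin is kept as a candidate subgoal. A minimal sketch, assuming a toy 2-D state space; all names here are illustrative, not the paper's actual code.

```python
def behaviour_bin(state, bin_size=2):
    """Map a 2-D state to a coarse behaviour descriptor (its grid cell)."""
    x, y = state
    return (x // bin_size, y // bin_size)

class QDArchive:
    """Keep only the highest-quality state per behaviour bin (the 'elites')."""
    def __init__(self):
        self.elites = {}  # bin -> (quality, state)

    def add(self, state, quality):
        b = behaviour_bin(state)
        if b not in self.elites or quality > self.elites[b][0]:
            self.elites[b] = (quality, state)

    def subgoals(self):
        """The archive's elites serve as the detected subgoals."""
        return [s for _, s in self.elites.values()]

archive = QDArchive()
for state, quality in [((0, 0), 1.0), ((1, 1), 3.0), ((4, 5), 2.0)]:
    archive.add(state, quality)
# (0, 0) and (1, 1) share a bin; only the higher-quality (1, 1) survives.
```

Because the archive enforces diversity (one elite per bin) as well as quality, the surviving states are spread across the state space, which is what makes them usable as subgoals without any domain-specific predicate.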
Symbol acquisition for task-level planning
We consider the problem of how to plan efficiently in low-level, continuous state spaces with temporally abstract actions (or skills), by constructing abstract representations of the problem suitable for task-level planning. The central question this effort poses is which abstract representations are required to express and evaluate plans composed of sequences of skills. We show that classifiers can be used as a symbolic representation system, and that the ability to represent the preconditions and effects of an agent's skills is both necessary and sufficient for task-level planning. The resulting representations allow a reinforcement learning agent to acquire a symbolic representation appropriate for planning from experience.
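The claim that precondition and effect classifiers suffice for task-level planning can be sketched concretely: a plan is feasible if every skill's precondition classifier accepts all states reachable so far, and the effect set becomes the new reachable set. A minimal sketch with a toy corridor domain; the `Skill` class and domain are illustrative assumptions, not the paper's formalism.

```python
class Skill:
    def __init__(self, name, precondition, effect_set):
        self.name = name
        self.precondition = precondition   # classifier: state -> bool
        self.effect_set = effect_set       # set of possible resulting states

def plan_feasible(skills, start_states, goal_test):
    """Evaluate a skill sequence purely at the symbolic (set) level:
    each skill's precondition must hold on every state reachable so far."""
    reachable = set(start_states)
    for skill in skills:
        if not all(skill.precondition(s) for s in reachable):
            return False
        reachable = set(skill.effect_set)
    return any(goal_test(s) for s in reachable)

# Toy corridor: states are integers 0..2, goal is state 2.
step1 = Skill("step1", lambda s: s == 0, {1})
step2 = Skill("step2", lambda s: s == 1, {2})
ok = plan_feasible([step1, step2], {0}, lambda s: s == 2)   # feasible
bad = plan_feasible([step2, step1], {0}, lambda s: s == 2)  # precondition fails
```

Note that `plan_feasible` never touches the low-level continuous dynamics: the classifiers alone are enough to accept or reject a plan, which is the sense in which they are "sufficient" for task-level planning.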
Model Learning for Look-ahead Exploration in Continuous Control
We propose an exploration method that incorporates look-ahead search over basic learnt skills and their dynamics, and use it for reinforcement learning (RL) of manipulation policies. Our skills are multi-goal policies learned in isolation in simpler environments using existing multi-goal RL formulations, analogous to options or macro-actions. Coarse skill dynamics, i.e., the state transition caused by a (complete) skill execution, are learnt and are unrolled forward during look-ahead search. Policy search benefits from temporal abstraction during exploration, though it itself operates over low-level primitive actions, and thus the resulting policies do not suffer from the suboptimality and inflexibility caused by coarse skill chaining. We show that the proposed exploration strategy results in effective learning of complex manipulation policies faster than current state-of-the-art RL methods, and converges to better policies than methods that use options or parameterized skills as building blocks of the policy itself, as opposed to guiding exploration. Comment: This is a pre-print of our paper, which is accepted at AAAI 201
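The "unroll coarse skill dynamics forward during look-ahead search" step can be sketched as follows: each skill's complete execution is summarized by a learned transition model `state -> state`, and short skill sequences are enumerated to score candidate first skills. The toy dynamics and value function are illustrative assumptions.

```python
from itertools import product

def lookahead_best_skill(state, skill_models, value, depth=2):
    """Unroll every skill sequence of length `depth` through the coarse
    dynamics and return the first skill of the highest-valued sequence."""
    best_first, best_val = None, float("-inf")
    for seq in product(skill_models, repeat=depth):
        s = state
        for skill in seq:
            s = skill_models[skill](s)  # coarse dynamics of a full execution
        if value(s) > best_val:
            best_first, best_val = seq[0], value(s)
    return best_first

# Toy 1-D world: 'forward' moves +2, 'back' moves -1; value = position.
models = {"forward": lambda s: s + 2, "back": lambda s: s - 1}
best = lookahead_best_skill(0, models, value=lambda s: s, depth=2)
```

In the abstract's framing, this look-ahead only guides *exploration*; the learned policy itself still acts over low-level primitive actions, so it is not locked into the coarse skill chaining used here.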
Goal-Conditioned Reinforcement Learning with Imagined Subgoals
Goal-conditioned reinforcement learning endows an agent with a large variety
of skills, but it often struggles to solve tasks that require more temporally
extended reasoning. In this work, we propose to incorporate imagined subgoals
into policy learning to facilitate learning of complex tasks. Imagined subgoals
are predicted by a separate high-level policy, which is trained simultaneously
with the policy and its critic. This high-level policy predicts intermediate
states halfway to the goal using the value function as a reachability metric.
We don't require the policy to reach these subgoals explicitly. Instead, we use
them to define a prior policy, and incorporate this prior into a KL-constrained
policy iteration scheme to speed up and regularize learning. Imagined subgoals
are used during policy learning, but not during test time, where we only apply
the learned policy. We evaluate our approach on complex robotic navigation and
manipulation tasks and show that it outperforms existing methods by a large
margin. Comment: ICML 2021. See the project webpage at
https://www.di.ens.fr/willow/research/ris
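The abstract's "intermediate states halfway to the goal using the value function as a reachability metric" can be sketched as picking, among candidate states, the one whose value-based reachability from the current state and to the goal is most balanced. The negated-distance value function and candidate set are toy assumptions, not the paper's learned components.

```python
def value(s, g):
    """Toy goal-conditioned value: higher when s is closer to g."""
    return -abs(g - s)

def imagined_subgoal(state, goal, candidates):
    """Pick the candidate whose reachability from the state and to the goal
    is most balanced, i.e. a midpoint under the value metric."""
    return min(candidates, key=lambda c: abs(value(state, c) - value(c, goal)))

sg = imagined_subgoal(0, 10, candidates=range(11))  # midpoint state 5
```

In the paper's scheme such subgoals are not reached explicitly; they shape a prior policy inside a KL-constrained policy iteration, and are discarded at test time.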
Hierarchical Imitation Learning with Vector Quantized Models
The ability to plan actions on multiple levels of abstraction enables
intelligent agents to solve complex tasks effectively. However, learning the
models for both low and high-level planning from demonstrations has proven
challenging, especially with higher-dimensional inputs. To address this issue,
we propose to use reinforcement learning to identify subgoals in expert
trajectories by associating the magnitude of the rewards with the
predictability of low-level actions given the state and the chosen subgoal. We
build a vector-quantized generative model for the identified subgoals to
perform subgoal-level planning. In experiments, the algorithm excels at solving
complex, long-horizon decision-making problems, outperforming the state of the art. Because of its ability to plan, our algorithm can find better trajectories than the ones in the training set. Comment: To appear at ICML 202
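The vector-quantization step this abstract relies on can be sketched in isolation: continuous subgoal states are snapped to their nearest codebook entry, so planning operates over a small discrete set of subgoal codes. The 1-D codebook and data are toy assumptions, not the paper's learned generative model.

```python
def quantize(subgoal, codebook):
    """Return the index of the nearest codebook vector (1-D Euclidean)."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - subgoal))

codebook = [0.0, 1.0, 2.5]
codes = [quantize(s, codebook) for s in (0.1, 0.9, 2.0)]  # -> [0, 1, 2]
```

Discretizing subgoals this way is what makes subgoal-level planning tractable: the planner searches over a handful of codes rather than a continuous state space.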
Hybrid Search for Efficient Planning with Completeness Guarantees
Solving complex planning problems has been a long-standing challenge in
computer science. Learning-based subgoal search methods have shown promise in
tackling these problems, but they often suffer from a lack of completeness
guarantees, meaning that they may fail to find a solution even if one exists.
In this paper, we propose an efficient approach to augment a subgoal search
method to achieve completeness in discrete action spaces. Specifically, we
augment the high-level search with low-level actions to execute a multi-level
(hybrid) search, which we call complete subgoal search. This solution achieves
the best of both worlds: the practical efficiency of high-level search and the
completeness of low-level search. We apply the proposed search method to a
recently proposed subgoal search algorithm and evaluate the algorithm trained
on offline data on complex planning problems. We demonstrate that our complete
subgoal search not only guarantees completeness but can even improve
performance in terms of search expansions for instances that the high-level search could solve without low-level augmentation. Our approach makes it possible to apply subgoal-level planning to systems where completeness is a critical requirement. Comment: NeurIPS 2023 Poster
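The "complete subgoal search" idea can be sketched as a search whose frontier is expanded with both high-level subgoal proposals and low-level primitive actions, so the search remains complete even when the (possibly flawed) subgoal generator proposes nothing useful. The toy generator and action set below are assumptions, not the paper's learned components.

```python
from collections import deque

def hybrid_search(start, goal, propose_subgoals, primitive_moves):
    """BFS whose successors combine subgoal jumps and primitive actions."""
    frontier, seen = deque([start]), {start}
    while frontier:
        s = frontier.popleft()
        if s == goal:
            return True
        for nxt in list(propose_subgoals(s)) + [m(s) for m in primitive_moves]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# A subgoal generator that only proposes big jumps past the goal would be
# incomplete on its own; primitive +1 steps restore completeness.
found = hybrid_search(
    0, 3,
    propose_subgoals=lambda s: [s + 10],
    primitive_moves=[lambda s: s + 1],
)
```

The low-level actions guarantee that every reachable state is eventually expanded, while useful subgoal proposals let the search skip ahead, which matches the abstract's "best of both worlds" claim.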