
    Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout

    Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks via temporal abstraction. Yet most goal-conditioned HRL algorithms focus on subgoal discovery and neglect inter-level coupling. In hierarchical systems, increased inter-level communication and coordination can induce more stable and robust policy improvement. Here, we present a goal-conditioned HRL framework with Guided Cooperation via Model-based Rollout (GCMR), which estimates forward dynamics to promote inter-level cooperation. GCMR alleviates the state-transition error in off-policy correction through a model-based rollout, further improving sample efficiency. Meanwhile, to avoid disruption by corrected but possibly unseen or faraway goals, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy. In addition, we propose one-step rollout-based planning to further facilitate inter-level cooperation: the higher-level Q-function guides the lower-level policy by estimating the value of future states, so that global task information is transmitted downwards and local pitfalls are avoided. Experimental results demonstrate that incorporating the proposed GCMR framework with ACLG, a disentangled variant of HIGL, yields more stable and robust policy improvement than the baselines and substantially outperforms previous state-of-the-art (SOTA) HRL algorithms on both hard-exploration problems and robotic control.
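    The one-step rollout-based guidance can be pictured as follows. This is a minimal sketch, not the authors' implementation: it assumes a learned forward dynamics model, a lower-level Q-function and policy, and a higher-level value network, and it adds the higher-level value of a model-predicted next state to the lower-level policy objective. All network sizes, the weighting coefficient lam, and the module names are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code): one-step rollout-based guidance,
# where a higher-level value estimate of a model-predicted next state is added
# to the lower-level policy objective. All module names and sizes are assumptions.
import torch
import torch.nn as nn

state_dim, goal_dim, action_dim = 8, 4, 2

dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                         nn.Linear(64, state_dim))            # learned forward model
low_policy = nn.Sequential(nn.Linear(state_dim + goal_dim, 64), nn.ReLU(),
                           nn.Linear(64, action_dim), nn.Tanh())
low_q = nn.Sequential(nn.Linear(state_dim + goal_dim + action_dim, 64), nn.ReLU(),
                      nn.Linear(64, 1))
high_v = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))                       # higher-level value proxy

opt = torch.optim.Adam(low_policy.parameters(), lr=3e-4)
lam = 0.1  # weight on the rollout-based guidance term (assumed)

def low_policy_loss(state, subgoal):
    """Maximize the low-level Q plus the higher-level value of the predicted next state."""
    action = low_policy(torch.cat([state, subgoal], dim=-1))
    q = low_q(torch.cat([state, subgoal, action], dim=-1))
    next_state = dynamics(torch.cat([state, action], dim=-1))  # one-step model rollout
    guidance = high_v(next_state)                              # global task signal passed down
    return -(q + lam * guidance).mean()

state = torch.randn(32, state_dim)
subgoal = torch.randn(32, goal_dim)
loss = low_policy_loss(state, subgoal)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```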

    Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs

    Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising paradigm for addressing the exploration-exploitation dilemma in reinforcement learning. It decomposes the source task into subgoal-conditioned subtasks and conducts exploration and exploitation in the subgoal space. The effectiveness of GCHRL relies heavily on the subgoal representation function and the subgoal selection strategy. However, existing works often overlook temporal coherence when learning latent subgoal representations and lack an efficient subgoal selection strategy that balances exploration and exploitation. This paper proposes HIerarchical reinforcement learning via dynamically building Latent Landmark graphs (HILL) to overcome these limitations. HILL learns latent subgoal representations that satisfy temporal coherence using a contrastive representation learning objective. Based on these representations, HILL dynamically builds latent landmark graphs and employs a novelty measure on nodes and a utility measure on edges. Finally, HILL develops a subgoal selection strategy that balances exploration and exploitation by jointly considering both measures. Experimental results demonstrate that HILL outperforms state-of-the-art baselines on continuous control tasks with sparse rewards in both sample efficiency and asymptotic performance. Our code is available at https://github.com/papercode2022/HILL.
    Comment: Accepted at the International Joint Conference on Neural Networks (IJCNN) 202
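    A rough sketch of two of HILL's ingredients, under assumptions rather than the paper's actual code: a contrastive objective that pulls temporally adjacent states together in the latent subgoal space (temporal coherence) and pushes random states apart, plus a score that mixes node novelty and edge utility for subgoal selection. The margin, the mixing weight alpha, and the encoder architecture are made up for illustration.

```python
# Hypothetical sketch: a temporal-coherence contrastive objective for subgoal
# representations, and a subgoal score mixing node novelty with edge utility.
# The margin, the novelty/utility inputs, and the mixing weight are assumptions.
import torch
import torch.nn.functional as F

encoder = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 16))

def temporal_contrastive_loss(s_t, s_tp1, s_rand, margin=1.0):
    """Pull temporally adjacent states together, push random states apart."""
    z_t, z_p, z_n = encoder(s_t), encoder(s_tp1), encoder(s_rand)
    pos = F.mse_loss(z_t, z_p)                                  # adjacent -> close
    neg = F.relu(margin - (z_t - z_n).pow(2).sum(-1).sqrt()).mean()  # random -> far
    return pos + neg

def subgoal_score(novelty, utility, alpha=0.5):
    """Balance exploration (node novelty) and exploitation (edge utility)."""
    return alpha * novelty + (1 - alpha) * utility

s_t, s_tp1, s_rand = torch.randn(3, 32, 8)
print(float(temporal_contrastive_loss(s_t, s_tp1, s_rand)))
print(subgoal_score(novelty=0.8, utility=0.3))
```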

    Goal Space Abstraction in Hierarchical Reinforcement Learning via Reachability Analysis

    Open-ended learning benefits immensely from symbolic methods for goal representation, as they offer ways to structure knowledge for efficient and transferable learning. However, existing Hierarchical Reinforcement Learning (HRL) approaches that rely on symbolic reasoning are often limited because they require a manual goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this work, we propose a developmental mechanism for subgoal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We create an HRL algorithm that gradually learns this representation along with the policies, and evaluate it on navigation tasks to show that the learned representation is interpretable and results in data efficiency.
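    One way to picture "grouping states with similar roles" is to compare reachability profiles. The following toy sketch (an assumption, not the paper's mechanism) clusters states of a small tabular environment whose k-step reachability sets overlap strongly; the adjacency matrix, horizon k, and overlap threshold are arbitrary illustrative choices.

```python
# Hypothetical sketch: grouping states by similar "roles", here approximated by
# comparing k-step reachability profiles in a small tabular environment. The
# transition structure, horizon, and similarity threshold are illustrative.
import numpy as np

n_states, k = 6, 3
# adjacency[i, j] = 1 if state j is reachable from state i in one step (a 6-state chain)
adjacency = np.array([[0, 1, 0, 0, 0, 0],
                      [1, 0, 1, 0, 0, 0],
                      [0, 1, 0, 1, 0, 0],
                      [0, 0, 1, 0, 1, 0],
                      [0, 0, 0, 1, 0, 1],
                      [0, 0, 0, 0, 1, 0]], dtype=float)

# k-step reachability profile: which states are reachable within k steps
reach = (np.linalg.matrix_power(adjacency + np.eye(n_states), k) > 0).astype(float)

# group states whose reachability profiles overlap strongly (similar "role")
groups, assigned = [], set()
for i in range(n_states):
    if i in assigned:
        continue
    overlap = (reach * reach[i]).sum(axis=1) / reach[i].sum()
    members = [j for j in range(n_states) if j not in assigned and overlap[j] > 0.75]
    groups.append(members)
    assigned.update(members)
print(groups)  # each group plays the role of one abstract (sub)goal
```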

    Generalizing to New Tasks via One-Shot Compositional Subgoals

    The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research, and a cornerstone of a future "General AI". Any artificially intelligent agent deployed in a real-world application must adapt on the fly to unknown environments. Researchers often rely on reinforcement and imitation learning to provide online adaptation to new tasks through trial-and-error learning. However, this can be challenging for complex tasks that require many timesteps or large numbers of subtasks to complete. These "long horizon" tasks suffer from sample inefficiency and can require extremely long training times before the agent learns to perform the necessary long-term planning. In this work, we introduce CASE, which attempts to address these issues by training an Imitation Learning agent using adaptive "near future" subgoals. These subgoals are recalculated at each step using compositional arithmetic in a learned latent representation space. In addition to improving learning efficiency for standard long-term tasks, this approach also makes it possible to perform one-shot generalization to previously unseen tasks, given only a single reference trajectory for the task in a different environment. Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.
    Comment: Presented at the ICRA 2022 workshop "Compositional Robotics: Mathematics and Tools"
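    The compositional-arithmetic step can be sketched as follows, under assumptions: the latent offset that a single reference trajectory makes over the next few steps is added to the latent of the current observation and decoded back into a "near future" subgoal. The encoder/decoder shapes, the lookahead of 5 steps, and the function name near_future_subgoal are hypothetical, not CASE's actual interface.

```python
# Hypothetical sketch: recomputing a "near future" subgoal with compositional
# arithmetic in a learned latent space, transferring the latent offset observed
# along a reference trajectory onto the current state.
import torch
import torch.nn as nn

obs_dim, latent_dim = 16, 8
encode = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
decode = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, obs_dim))

def near_future_subgoal(current_obs, ref_traj, ref_index, lookahead=5):
    """Shift the current latent by the offset the reference trajectory makes."""
    z_now = encode(current_obs)
    z_ref_now = encode(ref_traj[ref_index])
    z_ref_future = encode(ref_traj[min(ref_index + lookahead, len(ref_traj) - 1)])
    z_goal = z_now + (z_ref_future - z_ref_now)   # compositional arithmetic in latent space
    return decode(z_goal)

ref_traj = torch.randn(50, obs_dim)   # a single reference demonstration
current_obs = torch.randn(obs_dim)
subgoal = near_future_subgoal(current_obs, ref_traj, ref_index=10)
print(subgoal.shape)
```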

    Reverse curriculum hierarchical recursive learning

    This thesis presents a study on Hierarchical Reinforcement Learning in which several different approaches to learning are researched, developed and tested. In particular, the proposed algorithm, Reverse Curriculum Vicinity Learning (RCVL), achieved excellent performance in the tested environments. It is built on a two-level hierarchy: the high level learns to recursively suggest adequate subgoals to the low level, while the low level learns the sequence of primitive actions needed to achieve those subgoals and, finally, the ultimate goal. It is currently designed only for discrete Reinforcement Learning environments. RCVL outperformed the state-of-the-art algorithms DDQN and DDQN combined with HER in the more complex tested environments, reaching a success rate above 97% while avoiding infeasible subgoal suggestions and constructing optimal paths. Finally, it has proven robust to most hyperparameter changes. Hierarchical learning breaks a task into several smaller sub-tasks, which results in faster learning, since smaller tasks are easier to master. Each level in the hierarchy operates at its own "resolution" (i.e., time scale) of the problem, while only the low-level policy interacts with the environment. In our setting, the subgoals proposed by the high-level policy can be seen as milestones that break the large task into several shorter ones. The proposed algorithm also integrates the concept of Reverse Curriculum Learning: learning begins from states around the goal and gradually expands to more difficult tasks from states further away, until the whole state space is mastered. With this curricular approach the agent learns faster: it first masters the easy tasks and is then challenged with harder ones. In the proposed algorithm, the high level stores neighbours from the vicinity of each goal (collected through low-level interactions) such that the goal is reachable from them within a limited number of actions. Meanwhile, the low-level policy learns simple actions that solve the mini-trajectories from those neighbours to the goal. As knowledge accumulates in both levels, the high-level policy learns to draw a path from the goal back to the state recursively, suggesting the subgoals along the way. By learning long-term return estimates, the agent can decide which subgoal is best for each given pair of state and goal (or subgoal). Further concepts are integrated into the algorithm to accelerate learning and improve sample efficiency. First, it is an off-policy algorithm. Second, the reward system is designed to exploit the maximum information when rolling out the collected experience, so that all possible ordered combinations are stored with a non-sparse reward. Finally, the algorithm uses Hindsight Experience Relabelling, allowing the accumulated experience to be exploited more efficiently.
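    The reverse-curriculum idea can be illustrated with a toy sketch (not the RCVL implementation): start episodes from states in a small vicinity of the goal and enlarge that vicinity once the recent success rate is high enough. The chain environment, the toy "policy", and all thresholds below are assumptions made purely for illustration.

```python
# Hypothetical sketch: reverse-curriculum start-state sampling on a 1-D chain.
# Training starts from states near the goal and the vicinity is expanded once
# the recent success rate is high enough; all sizes and thresholds are assumed.
import random

GOAL, N_STATES = 9, 10

def rollout_from(start, max_steps=30, p_correct=0.7):
    """Toy low-level policy: moves toward the goal with probability p_correct."""
    s = start
    for _ in range(max_steps):
        s += 1 if random.random() < p_correct else -1
        s = max(0, min(N_STATES - 1, s))
        if s == GOAL:
            return True
    return False

radius = 1
for _ in range(40):                                   # bounded training budget
    successes = sum(rollout_from(max(0, GOAL - random.randint(1, radius)))
                    for _ in range(50))               # sample starts near the goal
    if successes / 50 > 0.8 and radius < GOAL:
        radius += 1                                   # expand the vicinity once mastered
print("final curriculum radius:", radius)
```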

    Goal-Space Planning with Subgoal Models

    This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives such as Double DQN, even though it uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated for many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning, and avoids learning the transition dynamics entirely. We show that our GSP algorithm learns significantly faster than a Double DQN baseline in a variety of situations.
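    A minimal sketch of planning restricted to goal space, assuming (rather than learning) local subgoal-conditioned reward and discount estimates: value iteration is run over a handful of abstract subgoals instead of raw states. The reward/discount matrices and the choice of subgoal 3 as the terminal goal are invented for this example and are not taken from the paper.

```python
# Hypothetical sketch: background planning restricted to a small set of abstract
# subgoals. Instead of a full transition model, we assume local subgoal-to-subgoal
# reward and discount estimates (made up here) and run value iteration over them.
import numpy as np

n_subgoals = 4
# r[i, j]: estimated reward for the local option travelling from subgoal i to j
r = np.array([[ 0., -1., -2., -5.],
              [-1.,  0., -1., 10.],
              [-2., -1.,  0., 10.],
              [ 0.,  0.,  0.,  0.]])
# g[i, j]: estimated discount (gamma^steps) accumulated by that local option
g = np.full((n_subgoals, n_subgoals), 0.9)
g[:, 3] = 0.0            # options ending at subgoal 3 (the goal) do not bootstrap
g[3, :] = 0.0            # and planning stops once the goal subgoal is reached

v = np.zeros(n_subgoals)
for _ in range(100):     # value iteration purely in the abstract subgoal space
    v_new = (r + g * v[None, :]).max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-8:
        break
    v = v_new
print("subgoal values:", np.round(v, 2))
```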