Contact Energy Based Hindsight Experience Prioritization
Multi-goal robot manipulation tasks with sparse rewards are difficult for
reinforcement learning (RL) algorithms due to the inefficiency in collecting
successful experiences. Recent algorithms such as Hindsight Experience Replay
(HER) expedite learning by taking advantage of failed trajectories and
replacing the desired goal with one of the achieved states, so that any failed
trajectory can contribute to learning. However, HER
uniformly chooses failed trajectories, without taking into account which ones
might be the most valuable for learning. In this paper, we address this problem
and propose a novel approach, Contact Energy Based Prioritization (CEBP), which
selects samples from the replay buffer based on the rich information carried by
contact, leveraging the touch sensors in the robot's gripper and object
displacement. Our prioritization scheme favors sampling of contact-rich
experiences, which are arguably the ones providing the largest amount of
information. We evaluate our proposed approach on various sparse reward robotic
tasks and compare it with state-of-the-art methods. We show that our
method surpasses or performs on par with those methods on robot manipulation
tasks. Finally, we deploy the trained policy from our method to a real Franka
robot for a pick-and-place task. We observe that the robot can solve the task
successfully. The videos and code are publicly available at:
https://erdiphd.github.io/HER_forc
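As a rough illustration of the idea (not the authors' implementation), contact-rich experiences can be favored by weighting replay sampling with a contact-energy score. The score function, the field names, and the proportional-sampling rule below are all illustrative assumptions:

```python
import numpy as np

def contact_energy(touch, displacement):
    """Hypothetical contact-energy score: total touch-sensor force,
    amplified when the object actually moved."""
    return float(np.sum(touch) * (1.0 + np.linalg.norm(displacement)))

def prioritized_sample(buffer, k, rng):
    """Sample k transitions with probability proportional to contact energy
    (uniform fallback when no transition registered any contact)."""
    scores = np.array([contact_energy(t["touch"], t["disp"]) for t in buffer])
    total = scores.sum()
    probs = scores / total if total > 0 else np.full(len(buffer), 1.0 / len(buffer))
    idx = rng.choice(len(buffer), size=k, p=probs)
    return [buffer[i] for i in idx]

rng = np.random.default_rng(0)
buffer = [
    {"touch": np.zeros(2), "disp": np.zeros(3)},                         # no contact
    {"touch": np.array([2.0, 1.5]), "disp": np.array([0.1, 0.0, 0.0])},  # contact-rich
]
batch = prioritized_sample(buffer, 4, rng)  # all draws hit the contact-rich transition
```

With one zero-contact and one contact-rich transition, the sampling probabilities collapse onto the contact-rich one, which is the qualitative behavior the prioritization scheme aims for.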
Reinforcement Learning of Action and Query Policies with LTL Instructions under Uncertain Event Detector
Reinforcement learning (RL) with linear temporal logic (LTL) objectives can
allow robots to carry out symbolic event plans in unknown environments. Most
existing methods assume that the event detector can accurately map
environmental states to symbolic events; however, uncertainty is inevitable for
real-world event detectors. Such uncertainty in an event detector generates
multiple branching possibilities on LTL instructions, confusing action
decisions. Moreover, the queries to the uncertain event detector, necessary for
the task's progress, may increase the uncertainty further. To cope with those
issues, we propose an RL framework, Learning Action and Query over Belief LTL
(LAQBL), to learn an agent that can consider the diversity of LTL instructions
due to uncertain event detection while avoiding task failures caused by
unnecessary event-detection queries. Our framework simultaneously learns 1) an
embedding of belief LTL, i.e., the multiple branching possibilities of the LTL
instruction, using a graph neural network, 2) an action policy, and 3) a query
policy that decides whether or not to query the event detector.
Simulations in a 2D grid world and image-input robotic inspection environments
show that our method successfully learns actions to follow LTL instructions
even with uncertain event detectors.
Comment: 8 pages. Accepted by Robotics and Automation Letters (RA-L).
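A toy sketch of the query decision (the paper learns this query policy; the hand-coded entropy threshold below is purely an illustrative stand-in): query the uncertain event detector only when the belief over LTL branches is ambiguous.

```python
import math

def belief_entropy(belief):
    """Shannon entropy (nats) of a belief distribution over LTL branch states."""
    return -sum(p * math.log(p) for p in belief if p > 0)

def should_query(belief, threshold=0.5):
    """Hand-coded stand-in for a learned query policy: pay the query cost
    only when the belief is sufficiently uncertain."""
    return belief_entropy(belief) > threshold

confident = [0.95, 0.05]  # nearly sure which LTL branch is active: skip the query
uncertain = [0.5, 0.5]    # maximally ambiguous: worth querying the detector
```

This captures the trade-off in the abstract: confident beliefs avoid unnecessary queries, while ambiguous beliefs trigger one so the task can progress.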
Evolution Guided Generative Flow Networks
Generative Flow Networks (GFlowNets) are a family of probabilistic generative
models that learn to sample compositional objects proportional to their
rewards. One big challenge of GFlowNets is training them effectively when
dealing with long time horizons and sparse rewards. To address this, we propose
Evolution Guided Generative Flow Networks (EGFN), a simple but powerful
augmentation to GFlowNets training using evolutionary algorithms (EA). Our
method can work on top of any GFlowNets training objective by training a set
of agent parameters using EA, storing the resulting trajectories in the
prioritized replay buffer, and training the GFlowNets agent using the stored
trajectories. We present a thorough investigation over a wide range of toy and
real-world benchmark tasks showing the effectiveness of our method in handling
long trajectories and sparse rewards.
Comment: 16 pages, 16 figures.
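The evolutionary outer loop can be sketched roughly as follows. This is a toy, not the paper's EGFN: the rollout, fitness, and mutation scheme are all illustrative assumptions, and the GFlowNet trainer that would consume the buffer is omitted.

```python
import random

def rollout(params):
    """Toy stand-in for running an agent: the 'trajectory' is just the
    parameter vector and the reward is its sum."""
    return list(params), sum(params)

def ea_step(population, buffer, rng, elite_frac=0.5, sigma=0.1):
    """One evolutionary step: evaluate, keep elites, store their trajectories
    (with rewards as priorities), and refill with Gaussian mutations."""
    scored = sorted(((rollout(p), p) for p in population),
                    key=lambda x: x[0][1], reverse=True)
    n_elite = max(1, int(len(population) * elite_frac))
    for (traj, rew), _ in scored[:n_elite]:
        buffer.append((rew, traj))          # prioritized replay entry
    elites = [p for _, p in scored[:n_elite]]
    children = [[x + rng.gauss(0.0, sigma) for x in rng.choice(elites)]
                for _ in range(len(population) - n_elite)]
    return elites + children

rng = random.Random(0)
population = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(10)]
buffer = []
best_before = max(sum(p) for p in population)
for _ in range(20):
    population = ea_step(population, buffer, rng)
best_after = max(sum(p) for p in population)   # elitism: never worse
```

Because elites survive unchanged, the best fitness is non-decreasing, and the buffer steadily accumulates high-reward trajectories for the GFlowNet to learn from.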
Goal-Conditioned Reinforcement Learning with Imagined Subgoals
Goal-conditioned reinforcement learning endows an agent with a large variety
of skills, but it often struggles to solve tasks that require more temporally
extended reasoning. In this work, we propose to incorporate imagined subgoals
into policy learning to facilitate learning of complex tasks. Imagined subgoals
are predicted by a separate high-level policy, which is trained simultaneously
with the policy and its critic. This high-level policy predicts intermediate
states halfway to the goal using the value function as a reachability metric.
We do not require the policy to reach these subgoals explicitly. Instead, we use
them to define a prior policy, and incorporate this prior into a KL-constrained
policy iteration scheme to speed up and regularize learning. Imagined subgoals
are used during policy learning, but not during test time, where we only apply
the learned policy. We evaluate our approach on complex robotic navigation and
manipulation tasks and show that it outperforms existing methods by a large
margin.
Comment: ICML 2021. See the project webpage at
https://www.di.ens.fr/willow/research/ris
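For intuition, a KL-constrained step of this kind has a well-known closed form for a categorical policy: maximizing E_pi[A] - alpha * KL(pi || prior) gives pi(a) proportional to prior(a) * exp(A(a) / alpha). The sketch below is generic math, not the authors' code, and the subgoal-conditioned prior is abstracted into a plain distribution:

```python
import math

def kl_constrained_policy(prior, advantages, alpha=1.0):
    """Closed-form maximizer of E_pi[A] - alpha * KL(pi || prior) over
    categorical policies: pi(a) proportional to prior(a) * exp(A(a) / alpha)."""
    weights = [p * math.exp(a / alpha) for p, a in zip(prior, advantages)]
    z = sum(weights)
    return [w / z for w in weights]

prior = [0.5, 0.5]        # stand-in for a subgoal-conditioned prior policy
advantages = [1.0, 0.0]
pi = kl_constrained_policy(prior, advantages, alpha=1.0)
pi_tight = kl_constrained_policy(prior, advantages, alpha=100.0)  # strong KL: stays near prior
```

The alpha parameter controls the regularization: small alpha lets the advantages dominate, while large alpha keeps the updated policy close to the prior, which is the speed-up/regularization trade-off the abstract describes.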
Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control
In multi-goal reinforcement learning, an agent learns to achieve multiple goals with a goal-oriented policy, obtaining rewards for the positions it has reached. The dynamic hindsight experience replay (DHER) method improves learning efficiency by matching the trajectories of past failed episodes to create successful experiences. However, these experiences are sampled and replayed by a random strategy that does not consider the importance of individual episode samples for learning. As a result, bias is introduced during training and only suboptimal sample efficiency is obtained. To address these issues, this paper introduces a reward-weighted mechanism based on dynamic hindsight experience replay (RDHER). We extend dynamic hindsight experience replay with a trade-off that makes the rewards calculated for hindsight experiences numerically greater than the actual rewards. Specifically, the hindsight rewards are multiplied by a weighting factor to increase the Q-value of the hindsight state–action pair, which drives the policy update to select the maximizing action for the given hindsight transitions. Our experiments show that the proposed method reduces hindsight bias during training. Further, we demonstrate that RDHER is effective on challenging robot manipulation tasks and outperforms several other multi-goal baseline methods in terms of success rate.
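The weighting idea can be sketched minimally as follows, assuming a 0/1 success reward; the paper's exact reward convention, tolerance, and weight value may differ from the illustrative choices here:

```python
import numpy as np

def sparse_reward(achieved, goal, eps=0.05):
    """1.0 if the achieved position lies within eps of the goal, else 0.0."""
    return 1.0 if np.linalg.norm(np.asarray(achieved) - np.asarray(goal)) < eps else 0.0

def rdher_reward(achieved, goal, is_hindsight, w=2.0):
    """Scale hindsight rewards by a factor w > 1 so the Q-values of hindsight
    state-action pairs dominate those of ordinary transitions."""
    r = sparse_reward(achieved, goal)
    return w * r if is_hindsight else r
```

A successful hindsight transition thus earns w times the reward of an equally successful actual transition, which is what pushes the Q-function, and hence the policy update, toward the hindsight action.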