Jump-Start Reinforcement Learning
Reinforcement learning (RL) provides a theoretical framework for continuously
improving an agent's behavior via trial and error. However, efficiently
learning policies from scratch can be very difficult, particularly for tasks
with exploration challenges. In such settings, it might be desirable to
initialize RL with an existing policy, offline data, or demonstrations.
However, naively performing such initialization in RL often works poorly,
especially for value-based methods. In this paper, we present a meta algorithm
that can use offline data, demonstrations, or a pre-existing policy to
initialize an RL policy, and is compatible with any RL approach. In particular,
we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs
two policies to solve tasks: a guide-policy, and an exploration-policy. By
using the guide-policy to form a curriculum of starting states for the
exploration-policy, we are able to efficiently improve performance on a set of
simulated robotic tasks. We show via experiments that JSRL is able to
significantly outperform existing imitation and reinforcement learning
algorithms, particularly in the small-data regime. In addition, we provide an
upper bound on the sample complexity of JSRL and show that with the help of a
guide-policy, one can improve the sample complexity for non-optimism
exploration methods from exponential in horizon to polynomial.

Comment: 20 pages, 10 figures
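The abstract's core mechanism, rolling in with the guide-policy for a number of steps and then handing control to the exploration-policy, can be illustrated with a minimal sketch. This is not the paper's implementation: the toy chain environment and the names `guide_policy`, `jsrl_rollout`, and `GOAL` are hypothetical, and the curriculum (annealing the jump-start horizon toward zero as the learner improves) is only noted in a comment.

```python
import random

# Hypothetical toy setup: a 1-D chain where the agent must walk right
# from state 0 to state GOAL. All names here are illustrative.
GOAL = 10

def guide_policy(state):
    # A competent prior policy: always step right.
    return +1

def exploration_policy(state):
    # An untrained exploration policy: random steps.
    return random.choice([-1, +1])

def jsrl_rollout(guide_steps, max_steps=30, seed=0):
    """Roll in with the guide-policy for `guide_steps` steps, then hand
    control to the exploration-policy -- JSRL's "jump-start" that forms
    a curriculum of starting states for the learner."""
    random.seed(seed)
    state = 0
    for t in range(max_steps):
        policy = guide_policy if t < guide_steps else exploration_policy
        state = max(0, state + policy(state))
        if state >= GOAL:
            return True  # reached the goal
    return False

# JSRL anneals guide_steps from the full horizon toward 0 as the
# exploration-policy improves; with a long jump-start the untrained
# explorer starts near the goal and succeeds far more often.
print(jsrl_rollout(guide_steps=9))
print(jsrl_rollout(guide_steps=0))
```

A longer jump-start places the exploration-policy's starting states deep along the guide-policy's trajectory, which is what sidesteps the hard-exploration problem the abstract describes.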
Densely Supervised Grasp Detector (DSGD)
This paper presents Densely Supervised Grasp Detector (DSGD), a deep learning
framework which combines CNN structures with layer-wise feature fusion and
produces grasps and their confidence scores at different levels of the image
hierarchy (i.e., global-, region-, and pixel-levels). Specifically, at the
global-level, DSGD uses the entire image information to predict a grasp. At the
region-level, DSGD uses a region proposal network to identify salient regions
in the image and predicts a grasp for each salient region. At the pixel-level,
DSGD uses a fully convolutional network and predicts a grasp and its confidence
at every pixel. During inference, DSGD selects the most confident grasp as
the output. This selection from hierarchically generated grasp candidates
overcomes limitations of the individual models. DSGD outperforms
state-of-the-art methods on the Cornell grasp dataset in terms of grasp
accuracy. Evaluation on a multi-object dataset and real-world robotic
grasping experiments show that DSGD produces highly stable grasps on a set of
unseen objects in new environments. It achieves 97% grasp detection accuracy
and 90% robotic grasping success rate with real-time inference speed.
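DSGD's inference step, choosing one output from the hierarchically generated candidates, amounts to a confidence-weighted selection. The sketch below is an assumption-laden illustration, not the paper's code: the `Grasp` fields and the candidate values are made up, and only the final `max`-over-confidence selection reflects the abstract.

```python
from dataclasses import dataclass

@dataclass
class Grasp:
    # Hypothetical grasp representation: rectangle centre, angle,
    # predicted confidence, and the hierarchy level that proposed it.
    x: float
    y: float
    theta: float
    confidence: float
    level: str  # "global", "region", or "pixel"

def select_grasp(candidates):
    """DSGD-style inference: among grasps proposed at the global,
    region, and pixel levels, return the single most confident one."""
    return max(candidates, key=lambda g: g.confidence)

# Illustrative candidates (all values invented for the example):
candidates = [
    Grasp(120.0, 80.0, 0.30, confidence=0.71, level="global"),
    Grasp(118.0, 83.0, 0.20, confidence=0.94, level="region"),
    Grasp(119.0, 82.0, 0.25, confidence=0.88, level="pixel"),
]
best = select_grasp(candidates)
print(best.level, best.confidence)
```

Because each level fails in different regimes (global context vs. fine localisation), picking the per-image maximum over all levels is what lets the combined detector overcome the limitations of any individual model.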