12 research outputs found
Plan2Vec: Unsupervised Representation Learning by Latent Plans
In this paper we introduce plan2vec, an unsupervised representation learning
approach that is inspired by reinforcement learning. Plan2vec constructs a
weighted graph on an image dataset using near-neighbor distances, and then
extrapolates this local metric to a global embedding by distilling path
integrals over planned paths. When applied to control, plan2vec offers a way
to learn goal-conditioned value estimates that are accurate over long
horizons while being both compute- and sample-efficient. We demonstrate the
effectiveness of
plan2vec on one simulated and two challenging real-world image datasets.
Experimental results show that plan2vec successfully amortizes the planning
cost, enabling reactive planning that is linear in memory and computation
complexity rather than exhaustive over the entire state space.
Comment: code available at https://geyang.github.io/plan2ve
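The graph-then-plan recipe in this abstract can be made concrete with a short sketch: build a weighted k-NN graph from local embedding distances, then take shortest-path lengths as global distance targets. The embedding array and the scipy/sklearn routines below are illustrative stand-ins for the paper's learned components, not its actual code.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def global_distance_targets(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Distill a local metric into global distances via planned (shortest) paths."""
    # Weighted k-NN graph: edges carry local, trusted embedding distances.
    graph = kneighbors_graph(embeddings, n_neighbors=k, mode="distance")
    # The path integral over the shortest path approximates a global distance.
    return shortest_path(graph, method="D", directed=False)
```

A metric or value network would then be regressed onto these targets, which is what amortizes the planning cost at evaluation time.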
An open-ended learning architecture to face the REAL 2020 simulated robot competition
Open-ended learning is a core research field of machine learning and robotics
aiming to build learning machines and robots able to autonomously acquire
knowledge and skills and to reuse them to solve novel tasks. The multiple
challenges posed by open-ended learning have been operationalized in the
robotic competition REAL 2020. This requires a simulated camera-arm-gripper
robot to (a) autonomously learn to interact with objects during an intrinsic
phase where it can learn how to move objects and then (b) during an extrinsic
phase, to re-use the acquired knowledge to accomplish externally given goals
requiring the robot to move objects to specific locations unknown during the
intrinsic phase. Here we present a 'baseline architecture' for solving the
challenge, provided as a baseline model for REAL 2020. Few models have all the
functionalities needed to solve the REAL 2020 benchmark and none has been
tested with it yet. The architecture we propose is formed by three components:
(1) Abstractor: abstracting sensory input to learn relevant control variables
from images; (2) Explorer: generating experience to learn goals and actions;
(3) Planner: formulating and executing action plans to accomplish the
externally provided goals. The architecture is the first model to solve the
simpler REAL 2020 'Round 1', which allows the use of a simple parameterised
push action. In Round 2, the architecture was used with a more general action
(a sequence of joint positions), again achieving above-chance performance.
The baseline software is well documented and available for download and use
at https://github.com/AIcrowd/REAL2020_starter_kit.
Comment: 21 pages, 8 figures
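As a reading aid, here is a hedged skeleton of the three-component loop the abstract describes; the class interfaces and the extrinsic_phase helper are illustrative assumptions, not the actual REAL2020_starter_kit API.

```python
class Abstractor:
    def encode(self, image):
        """Abstract raw pixels into relevant control variables."""
        raise NotImplementedError

class Explorer:
    def collect(self, env, abstractor):
        """Intrinsic phase: generate experience to discover goals and actions."""
        raise NotImplementedError

class Planner:
    def plan(self, state, goal):
        """Extrinsic phase: formulate an action plan that reaches the goal."""
        raise NotImplementedError

def extrinsic_phase(env, abstractor, planner, goal_image):
    # Encode the current observation and the external goal, then execute a plan.
    state = abstractor.encode(env.reset())
    goal = abstractor.encode(goal_image)
    for action in planner.plan(state, goal):
        state = env.step(action)
```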
DMotion: Robotic Visuomotor Control with Unsupervised Forward Model Learned from Videos
Learning an accurate model of the environment is essential for model-based
control tasks. Existing methods in robotic visuomotor control usually learn
from data with heavily labelled actions, object entities or locations, which
can be demanding in many cases. To cope with this limitation, we propose a
method, dubbed DMotion, that trains a forward model from video data only, via
disentangling the motion of the controllable agent to model the transition
dynamics. An object extractor and an interaction learner are trained in an
end-to-end manner without supervision. The agent's motions are explicitly
represented using spatial transformation matrices with direct physical meaning.
In the experiments, DMotion achieves superior performance on learning an
accurate forward model in a Grid World environment, as well as a more realistic
robot control environment in simulation. With the accurate learned forward
models, we further demonstrate their usage in model predictive control as an
effective approach for robotic manipulation.
Comment: IROS 202
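To illustrate what a spatial transformation matrix with physical meaning can look like, the following hedged sketch warps an agent feature map with a 2x3 affine matrix using standard PyTorch ops; shapes and names are assumptions, not DMotion's code.

```python
import torch
import torch.nn.functional as F

def transform_agent_features(feats: torch.Tensor, dx: float, dy: float) -> torch.Tensor:
    """feats: (N, C, H, W) agent feature map; returns the warped feature map."""
    n = feats.size(0)
    # Affine matrix whose entries are physically interpretable: here a pure
    # translation of the sampling grid by (dx, dy) in normalized coordinates.
    theta = torch.tensor([[1.0, 0.0, dx],
                          [0.0, 1.0, dy]]).repeat(n, 1, 1)
    grid = F.affine_grid(theta, list(feats.shape), align_corners=False)
    return F.grid_sample(feats, grid, align_corners=False)
```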
Mapping State Space using Landmarks for Universal Goal Reaching
An agent that understands its environment well should be able to apply its
skills to any given goal, leading to the fundamental problem of learning the
Universal Value Function Approximator (UVFA). A UVFA learns to predict the
cumulative rewards between all state-goal pairs. However, empirically, the
value function for long-range goals is often hard to estimate and may
consequently result in a failed policy. This has presented challenges to the
learning process and the capability of neural networks. We propose a method to
address this issue in large MDPs with sparse rewards, in which exploration and
routing across remote states are both extremely challenging. Our method
explicitly models the environment in a hierarchical manner, with a high-level
dynamic landmark-based map abstracting the visited state space, and a low-level
value network to derive precise local decisions. We use farthest point sampling
to select landmark states from past experience, which improves exploration
compared with simple uniform sampling. Experimentally, we show that our
method enables the agent to reach long-range goals at an early stage of
training and achieves better performance than standard RL algorithms on a
number of challenging tasks.
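Farthest point sampling, the landmark-selection step named above, is simple enough to sketch directly; Euclidean distance stands in for whatever metric the method actually uses over visited states.

```python
import numpy as np

def farthest_point_sampling(states: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Greedily pick k landmark indices that maximize coverage of `states` (N, D)."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(states)))]          # arbitrary first landmark
    dist = np.linalg.norm(states - states[chosen[0]], axis=1)
    for _ in range(k - 1):
        idx = int(dist.argmax())                       # farthest from current set
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(states - states[idx], axis=1))
    return np.array(chosen)
```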
RoboNet: Large-Scale Multi-Robot Learning
Robot learning has emerged as a promising tool for taming the complexity and
diversity of the real world. Methods based on high-capacity models, such as
deep networks, hold the promise of providing effective generalization to a wide
range of open-world environments. However, these same methods typically require
large amounts of diverse training data to generalize effectively. In contrast,
most robotic learning experiments are small-scale, single-domain, and
single-robot. This leads to a frequent tension in robotic learning: how can we
learn generalizable robotic controllers without having to collect impractically
large amounts of data for each separate experiment? In this paper, we propose
RoboNet, an open database for sharing robotic experience, which provides an
initial pool of 15 million video frames, from 7 different robot platforms, and
study how it can be used to learn generalizable models for vision-based robotic
manipulation. We combine the dataset with two different learning algorithms:
visual foresight, which uses forward video prediction models, and supervised
inverse models. Our experiments test the learned algorithms' ability to work
across new objects, new tasks, new scenes, new camera viewpoints, new grippers,
or even entirely new robots. In our final experiment, we find that by
pre-training on RoboNet and fine-tuning on data from a held-out Franka or Kuka
robot, we can exceed the performance of a robot-specific training approach that
uses 4x-20x more data. For videos and data, see the project webpage:
https://www.robonet.wiki/
Comment: accepted at the Conference on Robot Learning (CoRL) 201
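Of the two algorithm families the abstract combines with the dataset, the supervised inverse model is the easier to sketch: regress the action that took frame o_t to o_{t+1}. The architecture and action dimension below are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class InverseModel(nn.Module):
    def __init__(self, action_dim: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(                  # shared conv encoder
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(2 * 64, action_dim)

    def forward(self, obs_t: torch.Tensor, obs_t1: torch.Tensor) -> torch.Tensor:
        # Concatenate embeddings of consecutive frames; predict the action a_t.
        z = torch.cat([self.encoder(obs_t), self.encoder(obs_t1)], dim=-1)
        return self.head(z)
```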
Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation
Video prediction models combined with planning algorithms have shown promise
in enabling robots to learn to perform many vision-based tasks through only
self-supervision, reaching novel goals in cluttered scenes with unseen objects.
However, due to the compounding uncertainty of long-horizon video prediction
and the poor scalability of sampling-based planning optimizers, a significant
limitation of these approaches is their ability to plan over long horizons to
reach distant goals. To address this, we propose a framework for subgoal generation
and planning, hierarchical visual foresight (HVF), which generates subgoal
images conditioned on a goal image, and uses them for planning. The subgoal
images are directly optimized to decompose the task into easy-to-plan segments,
and as a result, we observe that the method naturally identifies semantically
meaningful states as subgoals. Across three out of four simulated vision-based
manipulation tasks, we find that our method achieves nearly a 200% performance
improvement over planning without subgoals and model-free RL approaches.
Further, our experiments illustrate that our approach extends to real,
cluttered visual scenes. Project page:
https://sites.google.com/stanford.edu/hvf
Comment: 16 pages, 9 figures
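The subgoal idea can be sketched as an optimization over a generative model's latent space, scoring each candidate by the harder of the two segments it induces. decode and plan_cost are hypothetical stand-ins for the paper's image generator and video-prediction planner.

```python
import numpy as np

def optimize_subgoal(start, goal, decode, plan_cost,
                     dim=64, iters=10, pop=128, elites=16, seed=0):
    """Cross-entropy search for a subgoal image that splits the task in two."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        z = mu + sigma * rng.standard_normal((pop, dim))    # candidate latents
        # A good subgoal makes the worse of its two segments easy to plan.
        costs = np.array([max(plan_cost(start, decode(zi)),
                              plan_cost(decode(zi), goal)) for zi in z])
        elite = z[np.argsort(costs)[:elites]]               # refit to the best
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return decode(mu)
```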
Which Mutual-Information Representation Learning Objectives are Sufficient for Control?
Mutual information maximization provides an appealing formalism for learning
representations of data. In the context of reinforcement learning (RL), such
representations can accelerate learning by discarding irrelevant and redundant
information, while retaining the information necessary for control. Much of the
prior work on these methods has addressed the practical difficulties of
estimating mutual information from samples of high-dimensional observations,
while comparatively less is understood about which mutual information
objectives yield representations that are sufficient for RL from a theoretical
perspective. In this paper, we formalize the sufficiency of a state
representation for learning and representing the optimal policy, and study
several popular mutual-information based objectives through this lens.
Surprisingly, we find that two of these objectives can yield insufficient
representations given mild and common assumptions on the structure of the MDP.
We corroborate our theoretical results with empirical experiments on a
simulated game environment with visual observations.
Comment: 18 pages, 11 figures
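For orientation, this is the standard mutual-information quantity being maximized, together with two representative objectives of the kind such work studies (with z = φ(s) the learned representation); the exact objectives analyzed in the paper may differ.

```latex
I(X;Y) = \mathbb{E}_{p(x,y)}\!\left[\log \frac{p(x,y)}{p(x)\,p(y)}\right],
\qquad
J_{\mathrm{fwd}} = I(z_{t+1};\, z_t, a_t),
\qquad
J_{\mathrm{inv}} = I(a_t;\, z_t, z_{t+1}).
```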
Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning
One of the great promises of robot learning systems is that they will be able
to learn from their mistakes and continuously adapt to ever-changing
environments. Despite this potential, most robot learning systems today are
deployed as a fixed policy and are not adapted after deployment. Can we
efficiently adapt previously learned behaviors to new
environments, objects and percepts in the real world? In this paper, we present
a method and empirical evidence towards a robot learning framework that
facilitates continuous adaptation. In particular, we demonstrate how to adapt
vision-based robotic manipulation policies to new variations, including
changes in background, object shape and appearance, lighting conditions, and
robot morphology, by fine-tuning via off-policy reinforcement learning.
Further, this
adaptation uses less than 0.2% of the data necessary to learn the task from
scratch. We find that our approach of adapting pre-trained policies leads to
substantial performance gains over the course of fine-tuning, and that
pre-training via RL is essential: training from scratch or adapting from
supervised ImageNet features are both unsuccessful with such small amounts of
data. We also find that these positive results hold in a limited continual
learning setting, in which we repeatedly fine-tune a single lineage of policies
using data from a succession of new tasks. Our empirical conclusions are
consistently supported by experiments on simulated manipulation tasks, and by
52 unique fine-tuning experiments on a real robotic grasping system pre-trained
on 580,000 grasps.
Comment: 8.5 pages, 9 figures. See video overview and experiments at
https://youtu.be/pPDVewcSpdc and project website at
https://ryanjulian.me/continual-fine-tunin
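The adaptation recipe reduces to a very small loop: start from pre-trained weights and keep running off-policy updates on a modest buffer of new-variant data. Everything named below (rl_update, buffer and batch sizes) is an assumption for illustration, not the paper's training code.

```python
import copy

def fine_tune(pretrained_policy, replay_buffer, rl_update,
              steps=5000, batch_size=256):
    """Adapt a pre-trained policy with off-policy RL on a small new-task buffer."""
    policy = copy.deepcopy(pretrained_policy)     # keep the original policy intact
    for _ in range(steps):
        batch = replay_buffer.sample(batch_size)  # tiny fraction of from-scratch data
        rl_update(policy, batch)                  # e.g., one off-policy Q/actor step
    return policy
```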
Weakly-Supervised Reinforcement Learning for Controllable Behavior
Reinforcement learning (RL) is a powerful framework for learning to take
actions to solve tasks. However, in many settings, an agent must winnow down
the inconceivably large space of all possible tasks to the single task that it
is currently being asked to solve. Can we instead constrain the space of tasks
to those that are semantically meaningful? In this work, we introduce a
framework for using weak supervision to automatically disentangle this
semantically meaningful subspace of tasks from the enormous space of
nonsensical "chaff" tasks. We show that this learned subspace enables efficient
exploration and provides a representation that captures distance between
states. On a variety of challenging, vision-based continuous control problems,
our approach leads to substantial performance gains, particularly as the
complexity of the environment grows.
Comment: Published in NeurIPS 202
The Differentiable Cross-Entropy Method
We study the cross-entropy method (CEM) for the non-convex optimization of a
continuous and parameterized objective function, and introduce a
differentiable variant (DCEM) that enables us to differentiate the output of
CEM with respect to the
objective function's parameters. In the machine learning setting this brings
CEM inside of the end-to-end learning pipeline where this has otherwise been
impossible. We show applications in a synthetic energy-based structured
prediction task and in non-convex continuous control. In the control setting we
show how to embed optimal action sequences into a lower-dimensional space. DCEM
enables us to fine-tune CEM-based controllers with policy optimization.
Comment: ICML 202
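A hedged sketch of what differentiating through CEM can look like: vanilla CEM's hard top-k elite selection blocks gradients, and while the paper uses a limited multi-label projection layer, the sketch below swaps in a simpler softmax-over-costs relaxation so that gradients reach the cost function's parameters. Purely illustrative, not the paper's implementation.

```python
import torch

def soft_cem(cost_fn, dim, iters=10, pop=100, temp=1.0):
    """Differentiable CEM relaxation: soft elite weights instead of hard top-k."""
    mu, sigma = torch.zeros(dim), torch.ones(dim)
    for _ in range(iters):
        x = mu + sigma * torch.randn(pop, dim)            # sample candidates
        w = torch.softmax(-cost_fn(x) / temp, dim=0)      # soft "elite" weights
        mu = (w.unsqueeze(-1) * x).sum(dim=0)             # differentiable refit
        sigma = ((w.unsqueeze(-1) * (x - mu) ** 2).sum(dim=0) + 1e-6).sqrt()
    return mu
```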