2,654 research outputs found
MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation
Robotic systems that aspire to operate in uninstrumented real-world
environments must perceive the world directly via onboard sensing. Vision-based
learning systems aim to eliminate the need for environment instrumentation by
building an implicit understanding of the world based on raw pixels, but
navigating the contact-rich high-dimensional search space from solely sparse
visual reward signals significantly exacerbates the challenge of exploration.
The applicability of such systems is thus typically restricted to simulated or
heavily engineered environments since agent exploration in the real-world
without the guidance of explicit state estimation and dense rewards can lead to
unsafe behavior and safety faults that are catastrophic. In this study, we
isolate the root causes behind these limitations to develop a system, called
MoDem-V2, capable of learning contact-rich manipulation directly in the
uninstrumented real world. Building on the latest algorithmic advancements in
model-based reinforcement learning (MBRL), demo-bootstrapping, and effective
exploration, MoDem-V2 can acquire contact-rich dexterous manipulation skills
directly in the real world. We identify key ingredients for leveraging
demonstrations in model learning while respecting real-world safety
considerations -- exploration centering, agency handover, and actor-critic
ensembles. We empirically demonstrate the contribution of these ingredients in
four complex visuo-motor manipulation problems in both simulation and the real
world. To the best of our knowledge, our work presents the first successful
system for demonstration-augmented visual MBRL trained directly in the real
world. Visit https://sites.google.com/view/modem-v2 for videos and more
details.Comment: 9 pages, 8 figure
One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors
One of the key challenges in applying reinforcement learning to complex
robotic control tasks is the need to gather large amounts of experience in
order to find an effective policy for the task at hand. Model-based
reinforcement learning can achieve good sample efficiency, but requires the
ability to learn a model of the dynamics that is good enough to learn an
effective policy. In this work, we develop a model-based reinforcement learning
algorithm that combines prior knowledge from previous tasks with online
adaptation of the dynamics model. These two ingredients enable highly
sample-efficient learning even in regimes where estimating the true dynamics is
very difficult, since the online model adaptation allows the method to locally
compensate for unmodeled variation in the dynamics. We encode the prior
experience into a neural network dynamics model, adapt it online by
progressively refitting a local linear model of the dynamics, and use model
predictive control to plan under these dynamics. Our experimental results show
that this approach can be used to solve a variety of complex robotic
manipulation tasks in just a single attempt, using prior data from other
manipulation behaviors
Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning
Intrinsically motivated spontaneous exploration is a key enabler of
autonomous lifelong learning in human children. It enables the discovery and
acquisition of large repertoires of skills through self-generation,
self-selection, self-ordering and self-experimentation of learning goals. We
present an algorithmic approach called Intrinsically Motivated Goal Exploration
Processes (IMGEP) to enable similar properties of autonomous or self-supervised
learning in machines. The IMGEP algorithmic architecture relies on several
principles: 1) self-generation of goals, generalized as fitness functions; 2)
selection of goals based on intrinsic rewards; 3) exploration with incremental
goal-parameterized policy search and exploitation of the gathered data with a
batch learning algorithm; 4) systematic reuse of information acquired when
targeting a goal for improving towards other goals. We present a particularly
efficient form of IMGEP, called Modular Population-Based IMGEP, that uses a
population-based policy and an object-centered modularity in goals and
mutations. We provide several implementations of this architecture and
demonstrate their ability to automatically generate a learning curriculum
within several experimental setups including a real humanoid robot that can
explore multiple spaces of goals with several hundred continuous dimensions.
While no particular target goal is provided to the system, this curriculum
allows the discovery of skills that act as stepping stone for learning more
complex skills, e.g. nested tool use. We show that learning diverse spaces of
goals with intrinsic motivations is more efficient for learning complex skills
than only trying to directly learn these complex skills
Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning in robotics are
model-based policy search algorithms, which alternate between learning a
dynamical model of the robot and optimizing a policy to maximize the expected
return given the model and its uncertainties. Among the few proposed
approaches, the recently introduced Black-DROPS algorithm exploits a black-box
optimization algorithm to achieve both high data-efficiency and good
computation times when several cores are used; nevertheless, like all
model-based policy search approaches, Black-DROPS does not scale to high
dimensional state/action spaces. In this paper, we introduce a new model
learning procedure in Black-DROPS that leverages parameterized black-box priors
to (1) scale up to high-dimensional systems, and (2) be robust to large
inaccuracies of the prior information. We demonstrate the effectiveness of our
approach with the "pendubot" swing-up task in simulation and with a physical
hexapod robot (48D state space, 18D action space) that has to walk forward as
fast as possible. The results show that our new algorithm is more
data-efficient than previous model-based policy search algorithms (with and
without priors) and that it can allow a physical 6-legged robot to learn new
gaits in only 16 to 30 seconds of interaction time.Comment: Accepted at ICRA 2018; 8 pages, 4 figures, 2 algorithms, 1 table;
Video at https://youtu.be/HFkZkhGGzTo ; Spotlight ICRA presentation at
https://youtu.be/_MZYDhfWeL
- …