Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning
Intrinsically motivated spontaneous exploration is a key enabler of
autonomous lifelong learning in human children. It enables the discovery and
acquisition of large repertoires of skills through self-generation,
self-selection, self-ordering and self-experimentation of learning goals. We
present an algorithmic approach called Intrinsically Motivated Goal Exploration
Processes (IMGEP) to enable similar properties of autonomous or self-supervised
learning in machines. The IMGEP algorithmic architecture relies on several
principles: 1) self-generation of goals, generalized as fitness functions; 2)
selection of goals based on intrinsic rewards; 3) exploration with incremental
goal-parameterized policy search and exploitation of the gathered data with a
batch learning algorithm; 4) systematic reuse of information acquired when
targeting a goal for improving towards other goals. We present a particularly
efficient form of IMGEP, called Modular Population-Based IMGEP, that uses a
population-based policy and an object-centered modularity in goals and
mutations. We provide several implementations of this architecture and
demonstrate their ability to automatically generate a learning curriculum
within several experimental setups including a real humanoid robot that can
explore multiple spaces of goals with several hundred continuous dimensions.
While no particular target goal is provided to the system, this curriculum
allows the discovery of skills that act as stepping stones for learning more
complex skills, e.g. nested tool use. We show that learning diverse spaces of
goals with intrinsic motivations is more efficient for learning complex skills
than only trying to directly learn these complex skills.
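
The four principles listed in the abstract above can be illustrated with a minimal sketch of a population-based goal exploration loop. This is not the authors' implementation: the toy two-joint arm in `environment`, the function `goal_exploration`, and all parameter values are assumptions made for illustration. Goals are sampled uniformly here, a simplification that stands in for selection based on intrinsic rewards.

```python
import math
import random

def environment(theta):
    """Toy 2-joint planar arm (assumed for illustration): angles -> hand position."""
    a1, a2 = theta
    x = math.cos(a1) + math.cos(a1 + a2)
    y = math.sin(a1) + math.sin(a1 + a2)
    return (x, y)

def distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def goal_exploration(n_bootstrap=20, n_iterations=300, sigma=0.2, seed=0):
    rng = random.Random(seed)
    memory = []  # population of (parameters, outcome) pairs

    # Bootstrap with random parameters so the population is non-empty.
    for _ in range(n_bootstrap):
        theta = (rng.uniform(-math.pi, math.pi), rng.uniform(-math.pi, math.pi))
        memory.append((theta, environment(theta)))

    errors = []
    for _ in range(n_iterations):
        # 1) Self-generate a goal (uniform sampling; a simplification of
        #    interest-based goal selection).
        goal = (rng.uniform(-2, 2), rng.uniform(-2, 2))
        # 2) Reuse past data: take the stored parameters whose outcome
        #    came closest to this goal.
        best_theta, _ = min(memory, key=lambda m: distance(m[1], goal))
        # 3) Explore: mutate those parameters and run the experiment.
        theta = tuple(a + rng.gauss(0, sigma) for a in best_theta)
        outcome = environment(theta)
        # 4) Store the result so it can serve any future goal.
        memory.append((theta, outcome))
        errors.append(distance(outcome, goal))
    return memory, errors

if __name__ == "__main__":
    memory, errors = goal_exploration()
    print(len(memory), "experiments stored")
```

Every stored experiment is available to every later goal, which is the reuse principle in its simplest form: an outcome reached while aiming at one goal may be the best starting point for a different one.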
Sample Efficient Optimization for Learning Controllers for Bipedal Locomotion
Learning policies for bipedal locomotion can be difficult, as experiments are
expensive and simulation does not usually transfer well to hardware. To counter
this, we need algorithms that are sample efficient and inherently safe.
Bayesian Optimization is a powerful sample-efficient tool for optimizing
non-convex black-box functions. However, its performance can degrade in higher
dimensions. We develop a distance metric for bipedal locomotion that enhances
the sample-efficiency of Bayesian Optimization and use it to train a 16
dimensional neuromuscular model for planar walking. This distance metric
reflects some basic gait features of healthy walking and helps us quickly
eliminate a majority of unstable controllers. With our approach we can learn
policies for walking in less than 100 trials for a range of challenging
settings. In simulation, we show results on two different costs and on various
terrains including rough ground and ramps, sloping upwards and downwards. We
also perturb our models with unknown inertial disturbances analogous to
differences between simulation and hardware. These results are promising, as
they indicate that this method can potentially be used to learn control
policies on hardware. Comment: To appear in International Conference on Humanoid Robots (Humanoids
'2016), IEEE-RAS. (Rika Antonova and Akshara Rai contributed equally)
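
The idea of giving Bayesian Optimization a domain-specific distance metric can be sketched as a kernel that compares controllers by gait features rather than by raw parameters. The abstract does not define the metric, so everything below is a hypothetical stand-in: `gait_features`, its two toy checks, and the parameter names `stride` and `stiffness` are assumptions, not the authors' method.

```python
import math

def gait_features(params):
    """Hypothetical feature map: a real one would come from a short rollout.
    Returns a toy binary vector flagging whether made-up gait checks pass."""
    stride, stiffness = params
    return (
        1.0 if stride > 0.2 else 0.0,            # toy check: leg swings forward
        1.0 if 0.5 < stiffness < 2.0 else 0.0,   # toy check: stance leg holds
    )

def sq_exp_kernel(u, v, length_scale=1.0):
    """Squared-exponential kernel on two equal-length vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * length_scale ** 2))

def raw_kernel(x, y):
    """Similarity measured directly in controller-parameter space."""
    return sq_exp_kernel(x, y)

def feature_kernel(x, y):
    """Similarity measured in the (assumed) gait-feature space."""
    return sq_exp_kernel(gait_features(x), gait_features(y))

if __name__ == "__main__":
    a, b, c = (0.3, 1.0), (3.0, 1.5), (0.1, 1.0)
    print("raw     k(a,b):", raw_kernel(a, b), " k(a,c):", raw_kernel(a, c))
    print("feature k(a,b):", feature_kernel(a, b), " k(a,c):", feature_kernel(a, c))
```

In this toy setup, controllers `a` and `b` are far apart in parameter space but pass the same checks, so the feature kernel treats them as similar, while `c` sits close to `a` in parameters yet fails a check and is pushed away. A Gaussian process built on such a kernel could generalize one observed failure to a whole group of failing controllers, which is one plausible reading of how a gait-aware metric helps eliminate unstable controllers quickly.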
Active Learning based on Data Uncertainty and Model Sensitivity
Robots can rapidly acquire new skills from demonstrations. However, during
generalisation of skills or transitioning across fundamentally different
skills, it is unclear whether the robot has the necessary knowledge to perform
the task. Failing to detect missing information often leads to abrupt movements
or to collisions with the environment. Active learning can quantify the
uncertainty of performing the task and, in general, locate regions of missing
information. We introduce a novel algorithm for active learning and demonstrate
its utility for generating smooth trajectories. Our approach is based on deep
generative models and metric learning in latent spaces. It relies on the
Jacobian of the likelihood to detect non-smooth transitions in the latent
space, i.e., transitions that lead to abrupt changes in the movement of the
robot. When non-smooth transitions are detected, our algorithm asks for an
additional demonstration from that specific region. The newly acquired
knowledge modifies the data manifold and allows for learning a latent
representation for generating smooth movements. We demonstrate the efficacy of
our approach on generalising elementary skills, transitioning across different
skills, and implicitly avoiding collisions with the environment. For our
experiments, we use a simulated pendulum where we observe its motion from
images and a 7-DoF anthropomorphic arm. Comment: Published on 2018 IEEE/RSJ International Conference on Intelligent
Robots and Syste
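
The detection step described in the abstract above, flagging latent transitions that cause abrupt changes in the output, can be sketched with a finite-difference Jacobian. This is a simplified illustration and not the paper's algorithm: the paper uses the Jacobian of the likelihood of a deep generative model, whereas the hand-written `decoder`, the threshold, and the helper names here are assumptions.

```python
import math

def decoder(z):
    """Hypothetical decoder: 1-D latent -> 2-D output with a sharp jump near 0.5."""
    return (z, math.tanh(25.0 * (z - 0.5)))

def jacobian_norm(f, z, eps=1e-4):
    """Norm of the central finite-difference Jacobian of f at latent point z."""
    lo, hi = f(z - eps), f(z + eps)
    return math.sqrt(sum(((h - l) / (2.0 * eps)) ** 2 for l, h in zip(lo, hi)))

def find_query_regions(f, z_start, z_end, steps=50, threshold=5.0):
    """Walk a straight latent path and collect points where the output
    changes abruptly; these are candidates for an extra demonstration."""
    queries = []
    for i in range(steps + 1):
        z = z_start + (z_end - z_start) * i / steps
        if jacobian_norm(f, z) > threshold:
            queries.append(z)
    return queries

if __name__ == "__main__":
    print(find_query_regions(decoder, 0.0, 1.0))
```

In the setting of the abstract, each flagged latent region would trigger a request for an additional demonstration, and the model would be retrained on the enlarged data set before the path is checked again.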