Learning Parameterized Skills
We introduce a method for constructing skills capable of solving tasks drawn
from a distribution of parameterized reinforcement learning problems. The
method draws example tasks from a distribution of interest and uses the
corresponding learned policies to estimate the topology of the
lower-dimensional piecewise-smooth manifold on which the skill policies lie.
This manifold models how policy parameters change as task parameters vary. The
method identifies the number of charts that compose the manifold and then
applies non-linear regression in each chart to construct a parameterized skill
by predicting policy parameters from task parameters. We evaluate our method on
an underactuated simulated robotic arm tasked with learning to accurately throw
darts at a parameterized target location.
Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
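The core idea above, predicting policy parameters from task parameters with per-chart non-linear regression, can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: chart assignments are taken as given, and a cubic polynomial stands in for the manifold-based regressor; names like `fit_parameterized_skill` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: a parameterized skill as per-chart non-linear regression
# from task parameters to policy parameters. Chart discovery and manifold
# estimation from the paper are replaced by a given chart labeling and a
# simple polynomial fit.

def fit_parameterized_skill(task_params, policy_params, chart_ids, degree=3):
    """Fit one polynomial regressor per chart; returns {chart_id: coeffs}."""
    skills = {}
    for c in np.unique(chart_ids):
        mask = chart_ids == c
        # Vandermonde design matrix: [1, t, t^2, t^3] for each example task.
        X = np.vander(task_params[mask], degree + 1, increasing=True)
        # Least-squares fit mapping the task param to each policy dimension.
        coeffs, *_ = np.linalg.lstsq(X, policy_params[mask], rcond=None)
        skills[int(c)] = coeffs
    return skills

def predict_policy(skills, chart_id, task_param, degree=3):
    x = np.vander(np.atleast_1d(task_param), degree + 1, increasing=True)
    return x @ skills[chart_id]

# Toy data: policy params vary smoothly with the task param within one chart.
rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, 50)
theta = np.stack([np.sin(t), t ** 2], axis=1)      # 2-D "policy parameters"
skills = fit_parameterized_skill(t, theta, np.zeros(50, dtype=int))
pred = predict_policy(skills, 0, 0.5)[0]
```

A new task parameter (here 0.5) then retrieves a full policy-parameter vector without further policy search inside a chart.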
Learning Parameterized Skills
One of the defining characteristics of human intelligence is the ability to acquire and refine skills. Skills are behaviors for solving problems that an agent encounters often—sometimes in different contexts and situations—throughout its lifetime. Identifying important problems that recur and retaining their solutions as skills allows agents to more rapidly solve novel problems by adjusting and combining their existing skills.
In this thesis we introduce a general framework for learning reusable parameterized skills. Reusable skills are parameterized procedures that—given a description of a problem to be solved—produce appropriate behaviors or policies. They can be sequentially and hierarchically combined with other skills to produce progressively more abstract and temporally extended behaviors.
We identify three major challenges involved in the construction of such skills. First, an agent should be capable of solving a small number of problems and generalizing these experiences to construct a single reusable skill. The skill should be capable of producing appropriate behaviors even when applied to yet unseen variations of a problem. We introduce a method for estimating properties of the lower-dimensional manifold on which problem solutions lie. This allows for the construction of unified models for predicting policies from task parameters.
Secondly, the agent should be able to identify when a skill can be hierarchically decomposed into specialized sub-skills. We observe that the policy manifold may be composed of disjoint, piecewise-smooth charts, each one encoding solutions for a subclass of problems. Identifying and modeling sub-skills allows for the aggregation of related behaviors into single, more abstract skills.
Finally, the agent should be able to actively select on which problems to practice in order to more rapidly become competent in a skill. Thoughtful and deliberate practice is one of the defining characteristics of human expert performance. By carefully choosing on which problems to practice the agent might more rapidly construct a skill that performs well over a wide range of problems.
We address these challenges via a general framework for skill acquisition. We evaluate it on simulated decision problems and on a physical humanoid robot, and demonstrate that it allows for the efficient and active construction of reusable skills.
Incremental learning of skills in a task-parameterized Gaussian Mixture Model
The final publication is available at link.springer.com.
Programming by demonstration techniques facilitate the programming of robots. Some of them allow tasks to be generalized through parameters, although they require new training when trajectories different from those used to estimate the model need to be added. One way to re-train a robot is incremental learning, which supplies additional information about the task and does not require teaching the whole task again. The present study proposes three techniques for adding trajectories to a previously estimated task-parameterized Gaussian mixture model. The first estimates a new model by accumulating the new trajectory and the set of trajectories generated using the previous model. The second adds to the parameters of the existing model those obtained for the new trajectories. The third updates the model parameters by running a modified version of the Expectation-Maximization algorithm with the information from the new trajectories. The techniques were evaluated on a simulated task and a real one, and they showed better performance than the existing model.
Peer Reviewed. Postprint (author's final draft).
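The flavor of the third technique, folding new data into an already-estimated model rather than retraining from scratch, can be sketched for a single Gaussian component. This is an illustrative sufficient-statistics update only; the actual TP-GMM update also handles mixture weights, responsibilities, and task frames, which are omitted here.

```python
import numpy as np

# Illustrative sketch: incrementally update one Gaussian component's mean and
# (biased) covariance given new data points, without revisiting the old data.
# This stands in for the paper's modified-EM technique, which operates on a
# full task-parameterized mixture.

def incremental_gaussian_update(n_old, mean_old, cov_old, new_points):
    """Fold new data points into a Gaussian estimated from n_old old points."""
    new_points = np.asarray(new_points, dtype=float)
    n_new = len(new_points)
    n = n_old + n_new
    mean = (n_old * mean_old + n_new * new_points.mean(axis=0)) / n
    # Combine scatter matrices about the updated mean.
    d_old = (mean_old - mean)[:, None]
    scatter_old = n_old * (cov_old + d_old @ d_old.T)
    diff = new_points - mean
    scatter_new = diff.T @ diff
    cov = (scatter_old + scatter_new) / n
    return n, mean, cov

# Sanity check: the incremental result matches a batch estimate over all data.
rng = np.random.default_rng(1)
a = rng.normal(size=(100, 2))                  # "old" trajectory data
b = rng.normal(loc=2.0, size=(50, 2))          # "new" trajectory data
n0, m0, c0 = len(a), a.mean(axis=0), np.cov(a, rowvar=False, bias=True)
n, m, c = incremental_gaussian_update(n0, m0, c0, b)
batch = np.concatenate([a, b])
```

The design choice this highlights: only the counts, means, and scatter matrices of the old model are needed, so the original demonstrations can be discarded.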
Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning
Intrinsically motivated spontaneous exploration is a key enabler of
autonomous lifelong learning in human children. It enables the discovery and
acquisition of large repertoires of skills through self-generation,
self-selection, self-ordering and self-experimentation of learning goals. We
present an algorithmic approach called Intrinsically Motivated Goal Exploration
Processes (IMGEP) to enable similar properties of autonomous or self-supervised
learning in machines. The IMGEP algorithmic architecture relies on several
principles: 1) self-generation of goals, generalized as fitness functions; 2)
selection of goals based on intrinsic rewards; 3) exploration with incremental
goal-parameterized policy search and exploitation of the gathered data with a
batch learning algorithm; 4) systematic reuse of information acquired when
targeting a goal for improving towards other goals. We present a particularly
efficient form of IMGEP, called Modular Population-Based IMGEP, that uses a
population-based policy and an object-centered modularity in goals and
mutations. We provide several implementations of this architecture and
demonstrate their ability to automatically generate a learning curriculum
within several experimental setups including a real humanoid robot that can
explore multiple spaces of goals with several hundred continuous dimensions.
While no particular target goal is provided to the system, this curriculum
allows the discovery of skills that act as stepping stone for learning more
complex skills, e.g. nested tool use. We show that learning diverse spaces of
goals with intrinsic motivations is more efficient for learning complex skills
than only trying to directly learn these complex skills.
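The four IMGEP principles listed above can be sketched as a tiny exploration loop. Everything here is a placeholder far simpler than the paper's modular, population-based architecture: the environment, the one-dimensional policy space, and uniform goal selection (standing in for intrinsic-reward-based selection) are all made up for illustration.

```python
import random

# Toy sketch of the IMGEP loop: self-generated goals, goal-conditioned policy
# perturbation, and systematic reuse of every rollout for all goals.

def environment(policy):
    # Placeholder dynamics: the outcome is a noisy function of the policy.
    return policy * 2.0 + random.uniform(-0.1, 0.1)

def explore(goals, iterations=2000, seed=0):
    random.seed(seed)
    # Memory: for each goal, the best (policy, error-to-goal) found so far.
    best = {g: (random.uniform(-1, 1), float("inf")) for g in goals}
    for _ in range(iterations):
        g = random.choice(goals)                    # goal selection (uniform here)
        policy = best[g][0] + random.gauss(0, 0.2)  # perturb best-known policy
        outcome = environment(policy)
        # Principle 4 -- systematic reuse: score this rollout against the
        # fitness function of *every* goal, not just the one targeted.
        for h in goals:
            err = abs(outcome - h)
            if err < best[h][1]:
                best[h] = (policy, err)
    return best

best = explore(goals=[-2.0, 0.0, 3.0])
```

Even in this toy, rollouts aimed at one goal frequently improve the memory for other goals, which is the mechanism behind the curriculum effect described in the abstract.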
Model Learning for Look-ahead Exploration in Continuous Control
We propose an exploration method that incorporates look-ahead search over basic learnt skills and their dynamics, and use it for reinforcement learning (RL) of manipulation policies. Our skills are multi-goal policies learned in isolation in simpler environments using existing multi-goal RL formulations, analogous to options or macro-actions. Coarse skill dynamics, i.e., the state transition caused by a (complete) skill execution, are learnt and are unrolled forward during look-ahead search. Policy search benefits from temporal abstraction during exploration, yet itself operates over low-level primitive actions, so the resulting policies do not suffer from the suboptimality and inflexibility caused by coarse skill chaining. We show that the proposed exploration strategy results in effective learning of complex manipulation policies faster than current state-of-the-art RL methods, and converges to better policies than methods that use options or parameterized skills as building blocks of the policy itself, as opposed to guiding exploration.
Comment: This is a pre-print of our paper which is accepted in AAAI 201
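The look-ahead idea, unrolling learned coarse skill-transition models to find a promising skill sequence, can be sketched as follows. The skill models here are hand-written stand-ins for learned dynamics, and the state is a single number; in the paper, such plans guide low-level exploration rather than serving as the final policy.

```python
from itertools import product

# Minimal sketch: each skill has a coarse one-step transition model f_k(s)
# (the predicted state after a complete skill execution). We exhaustively
# unroll all skill sequences up to a fixed horizon and keep the one whose
# predicted final state is closest to the goal.

skill_models = {                     # hypothetical "learned" models f_k(s)
    "push_left":  lambda s: s - 1.0,
    "push_right": lambda s: s + 1.0,
    "lift":       lambda s: s + 0.25,
}

def lookahead_plan(state, goal, horizon=3):
    """Search all skill sequences of length `horizon`; return best plan."""
    best_seq, best_dist = (), abs(state - goal)
    for seq in product(skill_models, repeat=horizon):
        s = state
        for name in seq:
            s = skill_models[name](s)   # unroll coarse dynamics forward
        if abs(s - goal) < best_dist:
            best_seq, best_dist = seq, abs(s - goal)
    return best_seq, best_dist

seq, dist = lookahead_plan(state=0.0, goal=2.25, horizon=3)
```

Exhaustive search is only feasible for tiny skill sets and horizons; the point is that planning happens over a handful of coarse transitions, not over low-level actions.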
Learning Task Priorities from Demonstrations
Bimanual operations in humanoids offer the possibility to carry out more than
one manipulation task at the same time, which in turn introduces the problem of
task prioritization. We address this problem from a learning from demonstration
perspective, by extending the Task-Parameterized Gaussian Mixture Model
(TP-GMM) to Jacobian and null space structures. The proposed approach is tested
on bimanual skills but can be applied in any scenario where the prioritization
between potentially conflicting tasks needs to be learned. We evaluate the
proposed framework in: two different tasks with humanoids requiring the
learning of priorities and a loco-manipulation scenario, showing that the
approach can be exploited to learn the prioritization of multiple tasks in
parallel.Comment: Accepted for publication at the IEEE Transactions on Robotic
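The Jacobian and null-space structure mentioned above refers to the standard strict-priority resolution, where a secondary task acts only in the null space of the primary one. Below is a sketch of that resolution for two tasks; the Jacobians and task velocities are made-up numbers, and the learned TP-GMM components of the paper are not modeled here.

```python
import numpy as np

# Sketch of strict two-task prioritization via null-space projection:
#   dq = J1+ dx1 + N1 (J2 N1)+ (dx2 - J2 J1+ dx1),  N1 = I - J1+ J1
# Task 1 is satisfied exactly; task 2 only insofar as task 1's null space allows.

def prioritized_velocities(J1, dx1, J2, dx2):
    """Joint velocities realizing task 1 exactly, task 2 in its null space."""
    J1_pinv = np.linalg.pinv(J1)
    N1 = np.eye(J1.shape[1]) - J1_pinv @ J1   # null-space projector of task 1
    dq = J1_pinv @ dx1 + N1 @ np.linalg.pinv(J2 @ N1) @ (dx2 - J2 @ J1_pinv @ dx1)
    return dq

# Toy 3-joint example: the two tasks use disjoint joints, so both succeed.
J1 = np.array([[1.0, 0.0, 0.0]])   # task 1 depends on joint 1 only
J2 = np.array([[0.0, 1.0, 0.0]])   # task 2 depends on joint 2 only
dq = prioritized_velocities(J1, np.array([0.5]), J2, np.array([-0.3]))
```

When the tasks conflict (overlapping Jacobians), the same formula sacrifices task 2, which is exactly the behavior whose ordering the paper proposes to learn from demonstrations.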
Incremental Bootstrapping of Parameterized Motor Skills
Queißer J, Reinhart F, Steil JJ. Incremental Bootstrapping of Parameterized Motor Skills. In: Proc. IEEE Humanoids. IEEE; 2016.
Many motor skills have an intrinsic, low-dimensional parameterization,
e.g. reaching through a grid to different targets. Repeated policy search
for new parameterizations of such a skill is inefficient, because the structure
of the skill variability is not exploited.
This issue has been previously addressed by learning mappings from task
parameters to policy parameters. In this work, we introduce a bootstrapping
technique that establishes such parameterized skills incrementally.
The approach combines iterative learning with state-of-the-art
black-box policy optimization. We investigate the benefits of
incrementally learning parameterized skills for efficient policy
retrieval and show that the number of required rollouts can be
significantly reduced when optimizing policies for novel tasks.
The approach is demonstrated for several parameterized motor
tasks including upper-body reaching motion generation for the
humanoid robot COMAN.
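The bootstrapping idea, using a memory of solved tasks to warm-start black-box policy optimization for a new task parameter, can be sketched as follows. The cost function, the one-dimensional policy space, and the hill climber (standing in for CMA-ES-style optimizers) are all toy stand-ins; counting rollouts illustrates the claimed reduction.

```python
import random

# Toy sketch of incremental bootstrapping: a parameterized-skill memory
# predicts an initial policy for a new task, and black-box search refines it.

def cost(policy, task):
    return (policy - 3.0 * task) ** 2   # hidden optimum: policy* = 3 * task

def hill_climb(init, task, tol=1e-3, sigma=0.3, seed=0):
    """Crude black-box optimizer; returns (best policy, rollouts used)."""
    random.seed(seed)
    best, rollouts = init, 0
    while cost(best, task) > tol:
        cand = best + random.gauss(0, sigma)
        rollouts += 1
        if cost(cand, task) < cost(best, task):
            best = cand
    return best, rollouts

memory = [(t, 3.0 * t) for t in (0.0, 0.5, 1.0)]   # previously solved tasks

def warm_start(task):
    # Predict the initial policy from the two nearest solved tasks.
    (t0, p0), (t1, p1) = sorted(memory, key=lambda tp: abs(tp[0] - task))[:2]
    return p0 + (p1 - p0) * (task - t0) / (t1 - t0)

task = 0.8
_, cold_rollouts = hill_climb(0.0, task)            # no skill memory
_, warm_rollouts = hill_climb(warm_start(task), task)  # bootstrapped init
```

Because the toy optimum happens to vary linearly with the task parameter, the warm start lands near the solution and the optimizer needs far fewer rollouts, which is the effect the abstract reports for novel tasks.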