Learning to Segment and Represent Motion Primitives from Driving Data for Motion Planning Applications
Developing an intelligent vehicle that can perform human-like actions
requires the ability to learn basic driving skills from a large amount of
naturalistic driving data. Learning algorithms become more efficient if the
complex driving tasks can be decomposed into motion primitives, the
elementary building blocks of driving skills. Therefore, the purpose of this paper
is to segment unlabeled trajectory data into a library of motion primitives. By
applying probabilistic inference based on an iterative
Expectation-Maximization algorithm, our method segments the collected
trajectories while learning a set of motion primitives represented as
dynamic movement primitives. The proposed method exploits the mutual
dependency between the segmentation and the representation of motion
primitives, together with a driving-specific initial segmentation, and we
show how this mutual dependency and initial condition enhance
the performance of both the segmentation and the motion primitive library
representation method to imitation learning and motion planning algorithms. The
model is trained and validated by using the driving data collected from the
Beijing Institute of Technology intelligent vehicle platform. The results show
that the proposed approach can find the proper segmentation and establish the
motion primitive library simultaneously.
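A minimal sketch of the iterative segment-and-represent idea described above, using a hard-EM (k-means-style) loop over fixed-length windows. The window length, two-primitive library, and synthetic trajectory are illustrative assumptions, not the paper's DMP-based formulation:

```python
import math

WIN = 10  # fixed window length (illustrative assumption)

def make_trajectory():
    # synthetic data built from two alternating elementary motions
    ramp = [i / (WIN - 1) for i in range(WIN)]
    bump = [math.sin(math.pi * i / (WIN - 1)) for i in range(WIN)]
    return ramp + bump + ramp + bump + ramp

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def segment(traj, n_primitives=2, iters=10):
    wins = [traj[i:i + WIN] for i in range(0, len(traj), WIN)]
    # initial library taken from the first windows (the paper instead
    # uses a driving-specific initial segmentation)
    templates = [list(w) for w in wins[:n_primitives]]
    labels = [0] * len(wins)
    for _ in range(iters):
        # E-step: assign each window to its closest primitive template
        labels = [min(range(n_primitives), key=lambda j: dist(w, templates[j]))
                  for w in wins]
        # M-step: re-estimate each template as the mean of its windows
        for j in range(n_primitives):
            members = [w for w, l in zip(wins, labels) if l == j]
            if members:
                templates[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, templates

labels, templates = segment(make_trajectory())
```

The segmentation (labels) and the primitive library (templates) improve jointly, which is the mutual dependency the abstract refers to.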
Movement primitives with multiple phase parameters
Movement primitives are concise movement representations that can be learned from human demonstrations, support generalization to novel situations and modulate the speed of execution of movements. The speed modulation mechanisms proposed so far are limited, however, allowing only for uniform speed modulation or coupling changes in speed to local measurements of forces, torques or other quantities. Such approaches are insufficient when dealing with general velocity constraints. We present a movement primitive formulation that can be used to non-uniformly adapt the speed of execution of a movement in order to satisfy a given constraint, while maintaining similarity in shape to the original trajectory. We present results using a 4-DoF robot arm in a minigolf setup.
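A minimal sketch of non-uniform speed modulation under a velocity constraint: the path samples are kept unchanged (preserving shape), and only the per-step durations are stretched where the nominal speed would violate the limit. The sine path, nominal time step, and velocity limit are illustrative assumptions, not the paper's formulation:

```python
import math

def modulate(path, dt_nominal, vmax):
    """Per-step durations that keep |dy|/dt <= vmax while preserving shape."""
    durations = []
    for a, b in zip(path, path[1:]):
        # stretch time only on segments where the nominal speed is too high
        durations.append(max(dt_nominal, abs(b - a) / vmax))
    return durations

# one sine period sampled on a uniform phase grid (illustrative path)
path = [math.sin(2 * math.pi * i / 50) for i in range(51)]
dts = modulate(path, dt_nominal=0.01, vmax=4.0)
```

Fast segments near the zero crossings get longer durations while slow segments near the peaks keep the nominal step, so the modulation is non-uniform in exactly the sense the abstract describes.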
Layered direct policy search for learning hierarchical skills
Solutions to real world robotic tasks often require
complex behaviors in high dimensional continuous state and
action spaces. Reinforcement Learning (RL) is aimed at learning
such behaviors but often fails due to a lack of scalability. To
address this issue, Hierarchical RL (HRL) algorithms leverage
hierarchical policies to exploit the structure of a task. However,
many HRL algorithms rely on task specific knowledge such
as a set of predefined sub-policies or sub-goals. In this paper
we propose a new HRL algorithm based on information
theoretic principles to autonomously uncover a diverse set
of sub-policies and their activation policies. Moreover, the
learning process mirrors the policy's structure and is thus also
hierarchical, consisting of a set of independent optimization
problems. The hierarchical structure of the learning process
allows us to control the learning rate of the sub-policies and
the gating individually and add specific information theoretic
constraints to each layer to ensure the diversification of the sub-policies.
We evaluate our algorithm on two high dimensional
continuous tasks and experimentally demonstrate its ability to
autonomously discover a rich set of sub-policies.
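A minimal sketch of the hierarchical policy structure described above: a softmax gating policy selects a sub-policy from the state, and the selected sub-policy emits the action. The linear-Gaussian sub-policies, gating weights, and 1-D state are illustrative assumptions, not the paper's information-theoretic algorithm:

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

class HierarchicalPolicy:
    def __init__(self, gate_w, subs, rng):
        self.gate_w = gate_w  # one gating weight per sub-policy
        self.subs = subs      # sub-policies: (slope, intercept, noise std)
        self.rng = rng

    def gate_probs(self, s):
        # high-level gating policy: softmax over per-sub-policy scores
        return softmax([w * s for w in self.gate_w])

    def act(self, s):
        # sample a sub-policy, then an action from that sub-policy
        probs = self.gate_probs(s)
        k = self.rng.choices(range(len(self.subs)), weights=probs)[0]
        slope, intercept, std = self.subs[k]
        return k, self.rng.gauss(slope * s + intercept, std)

rng = random.Random(0)
pi = HierarchicalPolicy(gate_w=[2.0, -2.0],
                        subs=[(1.0, 0.0, 0.1), (-1.0, 0.5, 0.1)], rng=rng)
probs = pi.gate_probs(1.0)
k, a = pi.act(1.0)
```

Because the gating and the sub-policies are separate objects, each layer can be trained with its own learning rate and constraints, mirroring the layered optimization the abstract describes.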
Hierarchical relative entropy policy search
Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that
are strongly structured. Such task structures can be exploited by incorporating hierarchical policies
that consist of gating networks and sub-policies. However, this concept has only been partially explored
for real world settings and complete methods, derived from first principles, are needed. Real
world settings are challenging due to large and continuous state-action spaces that are prohibitive
for exhaustive sampling methods. We define the problem of learning sub-policies in continuous
state-action spaces as finding a hierarchical policy composed of a high-level gating policy to
select the low-level sub-policies for execution by the agent. In order to efficiently share experience
with all sub-policies, also called inter-policy learning, we treat these sub-policies as latent variables
which allows for distribution of the update information between the sub-policies. We present three
different variants of our algorithm, designed to be suitable for a wide variety of real world robot
learning tasks and evaluate our algorithms in two real robot learning scenarios as well as several
simulations and comparisons.
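A minimal sketch of the inter-policy learning idea: sub-policies are treated as latent variables, so every sample updates every sub-policy, weighted by its responsibility (the posterior probability that it generated the action). The Gaussian sub-policies, uniform gate, and toy samples are illustrative assumptions, not the paper's relative-entropy update:

```python
import math

def gauss_pdf(x, mu, std):
    return math.exp(-0.5 * ((x - mu) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def update_means(sub_means, std, gate, samples):
    """One responsibility-weighted update of the sub-policy means."""
    new_means = []
    for k in range(len(sub_means)):
        num, den = 0.0, 0.0
        for action, weight in samples:  # weight: e.g. transformed reward
            # responsibility: posterior probability that sub-policy k
            # generated this action (the latent-variable treatment)
            likes = [gate[j] * gauss_pdf(action, sub_means[j], std)
                     for j in range(len(sub_means))]
            resp = likes[k] / sum(likes)
            num += resp * weight * action
            den += resp * weight
        new_means.append(num / den)
    return new_means

# toy shared experience: actions cluster around 0 and around 2
samples = [(0.1, 1.0), (-0.1, 1.0), (1.9, 1.0), (2.1, 1.0)]
means = update_means(sub_means=[0.5, 1.5], std=0.5, gate=[0.5, 0.5],
                     samples=samples)
```

All four samples contribute to both means, but the soft responsibilities pull each sub-policy toward its own cluster, which is how experience is shared without the sub-policies collapsing onto each other.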
Bio-inspired approaches to the control and modelling of an anthropomimetic robot
Introducing robots into human environments requires them to handle settings designed specifically for human size and morphology; however, large, conventional humanoid robots with stiff, high-powered joint actuators pose a significant danger to humans. By contrast, 'anthropomimetic' robots mimic both human morphology and internal structure: skeleton, muscles, compliance and high redundancy. Although far safer, their resultant compliant structure presents a formidable challenge to conventional control. Here we review, and seek to address, characteristic control issues of this class of robot, whilst exploiting their biomimetic nature by drawing upon biological motor control research. We derive a novel learning controller for discovering effective reaching actions created through sustained activation of one or more muscle synergies, an approach which draws upon strong, recent evidence from animal and human studies, but is almost unexplored to date in the musculoskeletal robot literature. Since the best synergies for a given robot will be unknown, we derive a deliberately simple reinforcement learning approach intended to allow their emergence, in particular those patterns which aid linearization of control. We also draw upon optimal control theories to encourage the emergence of smoother movement by incorporating signal-dependent noise and trial repetition.
In addition, we argue the utility of developing a detailed dynamic model of a complete robot and present a stable, physics-based model of the anthropomimetic ECCERobot,
running in real time with 55 muscles and 88 degrees of freedom.
Using the model, we find that effective reaching actions can be learned which employ only two sequential motor co-activation patterns, each controlled by just a single common driving signal. Factor analysis shows the emergent muscle co-activations can be reconstructed to significant accuracy using weighted combinations of only 13 common fragments, labelled 'candidate synergies'. Using these synergies as drivable units, the same controller learns the same task both faster and better; however, other reaching tasks perform less well, in proportion to their dissimilarity. We therefore propose that modifications enabling the emergence of a more generic set of synergies are required.
Finally, we propose a continuous controller for the robot, based on model predictive control, incorporating our model as a predictive component for state estimation, delay compensation and planning, including merging of the robot and sensed environment into a single model. We test the delay compensation mechanism by controlling a second copy of the model acting as a proxy for the real robot, finding that performance is significantly improved if a precise degree of compensation is applied, and show how rapidly an un-compensated controller fails as the model accuracy degrades.
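A minimal sketch of the synergy idea underpinning the controller above: muscle activations are formed as weighted combinations of a few common synergy fragments, so a single scalar driving signal per synergy can coordinate many muscles at once. The matrix sizes and values are illustrative assumptions, not the thesis's 55-muscle model:

```python
def actuate(synergies, drives):
    """activation[m] = sum_k drives[k] * synergies[k][m]."""
    n_muscles = len(synergies[0])
    return [sum(d * syn[m] for d, syn in zip(drives, synergies))
            for m in range(n_muscles)]

# two candidate synergies over five muscles, each scaled by one
# common driving signal
synergies = [[1.0, 0.5, 0.0, 0.0, 0.2],
             [0.0, 0.0, 1.0, 0.6, 0.3]]
acts = actuate(synergies, drives=[0.8, 0.4])
```

Learning then takes place in the low-dimensional space of driving signals rather than over all muscles individually, which is why synergies as drivable units speed up learning.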
A Survey on Policy Search for Robotics
Policy search is a subfield of reinforcement learning that focuses on
finding good parameters for a given policy parametrization. It is well
suited for robotics as it can cope with high-dimensional state and action
spaces, one of the main challenges in robot learning. We review recent
successes of both model-free and model-based policy search in robot
learning.
Model-free policy search is a general approach to learn policies
based on sampled trajectories. We classify model-free methods based on
their policy evaluation strategy, policy update strategy, and exploration
strategy and present a unified view on existing algorithms. Learning a
policy is often easier than learning an accurate forward model, and,
hence, model-free methods are more frequently used in practice. However,
for each sampled trajectory, it is necessary to interact with the
robot, which can be time consuming and challenging in practice. Model-based
policy search addresses this problem by first learning a simulator
of the robot's dynamics from data. Subsequently, the simulator generates
trajectories that are used for policy learning. For both model-free
and model-based policy search methods, we review their respective
properties and their applicability to robotic systems.
* Both authors contributed equally.
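A minimal sketch of episodic model-free policy search in the spirit of the methods this survey covers: sample perturbed policy parameters, evaluate each by a rollout, and update by reward-weighted averaging. The quadratic reward surrogate, exploration noise, and iteration counts are illustrative assumptions:

```python
import math
import random

def rollout_return(theta):
    # stand-in for interacting with the robot: reward peaks at
    # theta = 2 (assumption for illustration)
    return math.exp(-(theta - 2.0) ** 2)

def policy_search(theta0, iters=30, n_samples=20, sigma=0.5, seed=0):
    rng = random.Random(seed)
    theta = theta0
    for _ in range(iters):
        # explore by perturbing the current parameters
        cands = [theta + rng.gauss(0.0, sigma) for _ in range(n_samples)]
        returns = [rollout_return(c) for c in cands]
        # reward-weighted average of the sampled parameters
        theta = sum(r * c for r, c in zip(returns, cands)) / sum(returns)
    return theta

theta = policy_search(theta0=0.0)
```

Each iteration needs fresh rollouts, which illustrates the survey's point that model-free methods are simple but expensive in robot interaction time; a model-based variant would generate the rollouts from a learned simulator instead.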
Definition and composition of motor primitives using latent force models and hidden Markov models
The movement representation problem is at the core of areas such as robot imitation learning
and motion synthesis. In these fields, approaches oriented to the definition of motor primitives as basic building blocks of more complex movements have been used extensively because they cope with the high dimensionality and complexity of movement by using a limited set of adjustable primitives. There is also biological evidence supporting the existence of such primitives in vertebrate and invertebrate motor systems. Traditional methods for representing motor primitives have been either purely data-driven or strongly mechanistic. In the former approach, new movements are generated from existing movements; these methods are usually very flexible, but their extrapolation capacity is limited by the available training data. On the other hand, strongly mechanistic models have better generalization ability because they rely on a physical description of the modeled system; however, it may be hard to fully describe a real system, and the resulting differential equations are usually expensive to solve numerically. Therefore, the motor primitive parameterization used in this work is based on a hybrid model which jointly incorporates the flexibility of the data-driven paradigm and the extrapolation capacity of strongly mechanistic models, namely the latent force model framework. Moreover, the sequential composition of different motor primitives is also addressed using Hidden Markov Models (HMMs), which allows movement realizations to be processed efficiently. The resulting joint model is an HMM with latent force models (LFMs) as its emission process, which is, to the best of our knowledge, a previously unexplored combined probabilistic model.
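A minimal sketch of sequential primitive composition with an HMM: hidden states are primitives, and the forward recursion scores an observation sequence. Simple Gaussian emissions stand in for the paper's latent force models, and all numbers are illustrative assumptions:

```python
import math

def gauss(x, mu, std):
    return math.exp(-0.5 * ((x - mu) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def forward(obs, init, trans, emit_mu, emit_std):
    """Likelihood p(obs) under the HMM via the forward recursion."""
    n = len(init)
    alpha = [init[k] * gauss(obs[0], emit_mu[k], emit_std) for k in range(n)]
    for x in obs[1:]:
        # propagate through transitions, then weight by the emission density
        alpha = [gauss(x, emit_mu[k], emit_std) *
                 sum(alpha[j] * trans[j][k] for j in range(n))
                 for k in range(n)]
    return sum(alpha)

# two primitives with "sticky" transitions, so few switches are preferred
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
mu, std = [0.0, 2.0], 0.3
p_consistent = forward([0.0, 0.1, 2.0, 1.9], init, trans, mu, std)
p_scrambled = forward([0.0, 2.0, 0.1, 1.9], init, trans, mu, std)
```

A sequence that stays within one primitive before switching scores higher than one that alternates every step, which is how the HMM encodes plausible primitive compositions.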