Bootstrapping of parameterized skills through hybrid optimization in task and policy spaces
Queißer J, Steil JJ. Bootstrapping of parameterized skills through hybrid optimization in task and policy spaces. Frontiers in Robotics and AI. 2018;5:49.
Modern robotic applications create high demands on adaptation of actions with respect to
variance in a given task. Reinforcement learning is able to optimize for these changing conditions,
but relearning from scratch is hardly feasible due to the high number of required rollouts. We
propose a parameterized skill that generalizes to new actions for changing task parameters,
which is encoded as a meta-learner that provides parameters for task-specific dynamic motion
primitives. Our work shows that using parameterized skills to initialize the optimization
process leads to more effective incremental task learning. In addition, we introduce a hybrid
optimization method that combines a fast, coarse optimization on a manifold of policy parameters
with a fine-grained parameter search in the unrestricted space of actions. The proposed algorithm
reduces the number of rollouts required to adapt to new task conditions. Applications in
illustrative toy scenarios, on a 10-DOF planar arm, and in a humanoid robot point-reaching task
validate the approach.
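The two-stage idea in this abstract (a coarse search restricted to the skill manifold, followed by a fine search in the unrestricted policy space) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `skill`, `cost`, `hill_climb`, and `hybrid_optimize` are hypothetical names, the skill is a made-up linear map, the cost is a stand-in for a rollout, and a plain accept-if-better random search replaces whatever optimizer the authors actually used.

```python
import random

def skill(tau):
    """Hypothetical parameterized skill: task parameter -> policy parameters."""
    return [tau, 2.0 * tau]

def cost(theta, target):
    """Toy stand-in for a rollout cost: squared distance to a solving policy."""
    return sum((t - g) ** 2 for t, g in zip(theta, target))

def hill_climb(x0, objective, step, iters, rng):
    """Accept-if-better random search; returns the best point and its cost."""
    best, best_cost = list(x0), objective(x0)
    for _ in range(iters):
        cand = [v + rng.gauss(0.0, step) for v in best]
        c = objective(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost

def hybrid_optimize(tau0, target, rng):
    # Stage 1: coarse search over the 1-D task parameter, i.e. on the
    # manifold of policy parameters that the skill can produce.
    tau_best, coarse_cost = hill_climb(
        [tau0], lambda t: cost(skill(t[0]), target), 0.5, 30, rng)
    # Stage 2: fine search in the full policy space, started from the
    # best point found on the manifold.
    theta, fine_cost = hill_climb(
        skill(tau_best[0]), lambda th: cost(th, target), 0.05, 100, rng)
    return theta, coarse_cost, fine_cost

rng = random.Random(0)
target = [1.0, 2.3]  # deliberately off the manifold, so stage 2 has work to do
initial_cost = cost(skill(0.0), target)
theta, coarse_cost, fine_cost = hybrid_optimize(0.0, target, rng)
```

Because both stages only accept improvements, the cost after the fine stage can never exceed the cost after the coarse stage, which in turn can never exceed the initial cost.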
Synthesizing Goal-Directed Actions from a Library of Example Movements
We present a new learning framework for synthesizing goal-directed actions from example movements. The approach is based on the memorization of training data and locally weighted regression to compute suitable movements for a large range of situations. The proposed method avoids making specific assumptions about an adequate representation of the task. Instead, we use a general representation based on fifth-order splines. The data used for learning come either from the observation of events in Cartesian space or from actual movement execution on the robot, and thus inform us about the appropriate motion in the example situations. We show that by applying locally weighted regression to such data, we can generate actions with the proper dynamics to solve the given task. To test the validity of the approach, we present simulation results under various conditions as well as experiments on a real robot.
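As a rough illustration of how locally weighted regression can turn a library of stored examples into a movement for a new situation, the sketch below fits a kernel-weighted line around the query for each movement parameter. The scalar "situation", the Gaussian kernel, the bandwidth, and the names `lwr_predict`, `synthesize`, and `library` are all assumptions made for this example; the abstract does not specify these details, and the parameter vectors here are simply placeholders for spline coefficients.

```python
import math

def lwr_predict(xs, ys, query, bandwidth=1.0):
    """Locally weighted linear regression for a scalar input and output."""
    # Gaussian kernel: stored examples near the query get larger weights.
    ws = [math.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    denom = sw * sxx - sx * sx
    if abs(denom) < 1e-12:
        # Degenerate case (e.g. a single example): use the weighted mean.
        return sy / sw
    # Closed-form weighted least-squares line, evaluated at the query.
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / sw
    return intercept + slope * query

def synthesize(library, query, bandwidth=1.0):
    """Predict each movement parameter separately from the stored examples."""
    xs = [situation for situation, _ in library]
    dims = len(library[0][1])
    return [
        lwr_predict(xs, [params[d] for _, params in library], query, bandwidth)
        for d in range(dims)
    ]

# Hypothetical library: (situation, movement parameters) pairs.
library = [
    (0.0, [1.0, 0.0]),
    (1.0, [3.0, -1.0]),
    (2.0, [5.0, -2.0]),
    (3.0, [7.0, -3.0]),
]
movement = synthesize(library, 1.5)
```

The stored examples are kept as they are (memorization), and a new local model is fitted for every query, so no global model of the task is ever assumed.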
Multi-modal Skill Memories for Online Learning of Interactive Robot Movement Generation
Queißer J. Multi-modal Skill Memories for Online Learning of Interactive Robot Movement Generation. Bielefeld: Universität Bielefeld; 2018.
Modern robotic applications pose complex requirements with respect to the adaptation of
actions regarding the variability in a given task. Reinforcement learning can optimize for
changing conditions, but relearning from scratch is hardly feasible due to the high number of
required rollouts. This work proposes a parameterized skill that generalizes to new actions
for changing task parameters. The actions are encoded by a meta-learner that provides
parameters for task-specific dynamic motion primitives. Experimental evaluation shows that
using parameterized skills to initialize the optimization process leads to
more effective incremental task learning. A proposed hybrid optimization method combines
a fast coarse optimization on a manifold of policy parameters with a fine-grained parameter
search in the unrestricted space of actions. It is shown that the developed algorithm reduces
the number of required rollouts for adaptation to new task conditions. Further, this work
presents a transfer learning approach for adaptation of learned skills to new situations.
Applications in illustrative toy scenarios, on a 10-DOF planar arm, in a humanoid robot
point-reaching task, and in parameterized drumming on a pneumatic robot validate the approach.
However, parameterized skills applied to complex robotic systems pose further
challenges: the dynamics of the robot and the interaction with the environment introduce
model inaccuracies. In particular, high-level skill acquisition on highly compliant robotic
systems such as pneumatically driven or soft actuators is hardly feasible. Since learning of
the complete dynamics model is not feasible due to the high complexity, this thesis examines
two alternative approaches: First, an improvement of the low-level control based on an
equilibrium model of the robot. Utilization of an equilibrium model reduces the learning
complexity and this thesis evaluates its applicability for control of pneumatic and industrial
light-weight robots. Second, an extension of parameterized skills to generalize for forward
signals of action primitives that result in an enhanced control quality of complex robotic
systems. This thesis argues for shifting from the complex problem of learning the full dynamics of the
robot to a lower-dimensional, task-related learning problem. Due to the generalization in
relation to the task variability, online learning for complex robots as well as complex scenarios
becomes feasible. An experimental evaluation investigates the generalization capabilities of
the proposed online learning system for robot motion generation. Evaluation is performed
through simulation of a compliant 2-DOF arm, and scalability to a complex robotic system
is demonstrated on a pneumatically driven humanoid robot with 8 DOF.
Adaptive biped locomotion from a single demonstration using motion primitives (Locomoção bípede adaptativa a partir de uma única demonstração usando primitivas de movimento)
Doctoral thesis in Electrical Engineering.
This work addresses the problem of learning to imitate human locomotion actions
through low-level trajectories encoded with motion primitives and generalizing
them to new situations from a single demonstration. In this line of thought, the
main objectives of this work are twofold: The first is to analyze, extract and
encode human demonstrations taken from motion capture data in order to model
biped locomotion tasks. However, transferring motion skills from humans to
robots is not limited to simple reproduction; it also requires evaluating
their ability to adapt to new situations and to cope with unexpected
disturbances. Therefore, the second objective is to develop and evaluate a
control framework for action shaping such that the single demonstration can be
modulated to varying situations, taking into account the dynamics of the robot
and its environment.
The idea behind the approach is to address the problem of generalization from
a single demonstration by combining two basic structures. The first structure is
a pattern generator system consisting of movement primitives learned and
modelled by dynamical systems (DS). This encoding approach possesses
desirable properties that make it well suited for trajectory generation, namely
the possibility to change parameters online such as the amplitude and the
frequency of the limit cycle and the intrinsic robustness against small
perturbations. The second structure, which is embedded in the previous one,
consists of coupled phase oscillators that organize actions into functional
coordinated units. The changing contact conditions and the associated impacts
with the ground lead to models with multiple phases. Instead of forcing the robot’s
motion into a predefined fixed timing, the proposed pattern generator explores
transitions between phases that emerge from the interaction of the robot system
with the environment, triggered by sensor-driven events. The proposed approach
is tested in a dynamics simulation framework, and several experiments are
conducted to validate the methods and to assess the performance of a humanoid
robot.
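To make the notion of coupled phase oscillators more concrete, the following sketch integrates two oscillators that pull each other toward a desired phase offset (anti-phase, as one might want for left and right legs). The sinusoidal (Kuramoto-type) coupling, the gain, the frequency, and the names `simulate`, `phi_left`, and `phi_right` are assumptions chosen for illustration; the abstract does not state the coupling law used in the thesis, and the sensor-triggered phase transitions it describes are not modelled here.

```python
import math

def simulate(phi1, phi2, offset=math.pi, omega=2.0 * math.pi,
             gain=2.0, dt=0.01, steps=2000):
    """Euler integration of two phase oscillators with mutual coupling.

    Each oscillator advances at frequency omega and is pulled so that
    phi2 - phi1 approaches the desired offset.
    """
    for _ in range(steps):
        d1 = omega + gain * math.sin(phi2 - phi1 - offset)
        d2 = omega + gain * math.sin(phi1 - phi2 + offset)
        phi1 += dt * d1
        phi2 += dt * d2
    return phi1, phi2

# Start almost in phase; the coupling drives the pair toward anti-phase.
phi_left, phi_right = simulate(0.0, 0.3)
phase_gap = (phi_right - phi_left) % (2.0 * math.pi)
```

With this coupling the phase difference d = phi2 - phi1 obeys d' = -2 * gain * sin(d - offset), so d = offset is a stable fixed point and the in-phase solution is unstable.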