Towards the Improvement of robot motion learning techniques
Master's dissertation in Informatics Engineering
This manuscript presents solutions and methods to address some of the many problems that arise when dealing with the complex task of motor skill learning in robots.
In recent years, several research lines have focused on learning motion primitives through either imitation learning or reinforcement learning. However, for many applications, learning a motion primitive in a single form is not enough: once assimilated, the primitive must generalize so that it can be executed in different contexts and for distinct instances of the same task. The motion primitive must therefore adapt a set of parameters according to the environment variables instead of always executing the exact same motor commands.
Another aspect to take into consideration is how the learning process of motion primitives is guided. Some primitives are too complex to be learned all at once, i.e., learning all their intricacies without a properly structured approach may be intractable.
In this thesis, both aspects are taken into account in developing reinforcement learning techniques, which are then used to teach the controller of a biped robot, initially able to generate stable locomotion only on a flat surface, to tolerate a range of slope angles perpendicular and/or parallel to the direction of walking. Legged locomotion is a relevant example of a complex and dynamic motor skill that has been the focus of intensive research in robotics for many years, and techniques that succeed in learning such a hard task are expected to be useful in other contexts.
To achieve this goal, three main steps are taken, each corresponding to a chapter of this thesis. First, an existing algorithm, Cost-regularized Kernel Regression (CrKR), originally introduced for learning to generalize parameterized policies, is modified and extended into a new algorithm named CrKR++. Some of the changes allow the algorithm to be used in training sessions with a high number of samples, which is needed when learning complex policies. This would be impracticable with the original version of the algorithm due to its high computational complexity. The remaining changes are made to improve the general effectiveness of the algorithm.
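For orientation, here is a minimal sketch of the prediction step of the original CrKR as it is commonly formulated (not of CrKR++, whose modifications the abstract does not detail). For a query context s it computes the mean k(s)^T (K + λC)^-1 Γ and the variance k(s,s) + λ - k(s)^T (K + λC)^-1 k(s), where C is a diagonal matrix of per-sample costs. The Gaussian kernel, the scalar contexts, the function names and the example numbers are all illustrative assumptions. The linear solve over all samples hints at why cost grows quickly with the number of samples.

```python
import math

def gaussian_kernel(a, b, bandwidth=1.0):
    """Squared-exponential kernel between two scalar contexts (assumed kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def crkr_predict(query, contexts, params, costs, lam=0.5):
    """Return (mean, variance) of one policy parameter at context `query`."""
    n = len(contexts)
    # Regularised Gram matrix K + lam * C, with C = diag(costs).
    A = [[gaussian_kernel(contexts[i], contexts[j]) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        A[i][i] += lam * costs[i]
    k = [gaussian_kernel(query, c) for c in contexts]
    alpha = solve(A, params)  # (K + lam C)^-1 Gamma
    beta = solve(A, k)        # (K + lam C)^-1 k(s)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = gaussian_kernel(query, query) + lam - sum(
        ki * bi for ki, bi in zip(k, beta))
    return mean, var

# Hypothetical data: slope angle (context) -> a gait parameter, with rollout costs.
mean, var = crkr_predict(2.0, [-5.0, 0.0, 5.0], [0.1, 0.3, 0.6], [0.2, 0.1, 0.4])
```

Low-cost samples pull the prediction more strongly than high-cost ones, and the variance grows for contexts far from the data, which is what drives exploration in this family of methods.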
Second, a framework is presented that enables storing, combining and mutually learning parameterized policies. This framework, named Flexible Framework for Learning (F3L) and built around the CrKR++ algorithm, provides the means, for instance, to create a movement primitives library or to learn a motor skill gradually.
Finally, the developed framework is used to teach the controller of the biped robot to adapt its locomotion parameters according to the slope angles of the underlying surface. The achieved solution and intermediate steps are tested in simulation software with the Dynamic Anthropomorphic Robot with Intelligence–Open Platform (DARwIn-OP) in carefully delineated experiments.
Adaptive biped locomotion from a single demonstration using motion primitives
Doctorate in Electrical Engineering
This work addresses the problem of learning to imitate human locomotion actions
through low-level trajectories encoded with motion primitives and generalizing
them to new situations from a single demonstration. In this line of thought, the
main objectives of this work are twofold: The first is to analyze, extract and
encode human demonstrations taken from motion capture data in order to model
biped locomotion tasks. However, transferring motion skills from humans to
robots is not limited to the simple reproduction, but requires the evaluation of
their ability to adapt to new situations, as well as to deal with unexpected
disturbances. Therefore, the second objective is to develop and evaluate a
control framework for action shaping such that the single demonstration can be
modulated to varying situations, taking into account the dynamics of the robot
and its environment.
The idea behind the approach is to address the problem of generalization from
a single demonstration by combining two basic structures. The first structure is
a pattern generator system consisting of movement primitives learned and
modelled by dynamical systems (DS). This encoding approach possesses
desirable properties that make it well-suited for trajectory generation, namely
the possibility to change parameters online such as the amplitude and the
frequency of the limit cycle and the intrinsic robustness against small
perturbations. The second structure, which is embedded in the previous one,
consists of coupled phase oscillators that organize actions into functional
coordinated units. The changing contact conditions plus the associated impacts
with the ground lead to models with multiple phases. Instead of forcing the robot’s
motion into a predefined fixed timing, the proposed pattern generator explores
transition between phases that emerge from the interaction of the robot system
with the environment, triggered by sensor-driven events. The proposed approach
is tested in a dynamics simulation framework and several experiments are
conducted to validate the methods and to assess the performance of a humanoid
robot.
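The coupled phase oscillators mentioned above can be illustrated with a minimal sketch. It assumes a Kuramoto-style coupling between two oscillators (one per leg) that pulls them toward an anti-phase relation. The coupling law, gains and function names are illustrative assumptions, not the controller from the thesis, and the sensor-triggered phase transitions described in the abstract are left out.

```python
import math

def step_phases(phases, omega, coupling, offsets, dt):
    """One Euler step of d(phi_i)/dt = omega + sum_j K*sin(phi_j - phi_i - offsets[i][j])."""
    n = len(phases)
    updated = []
    for i in range(n):
        pull = sum(
            coupling * math.sin(phases[j] - phases[i] - offsets[i][j])
            for j in range(n) if j != i
        )
        updated.append(phases[i] + dt * (omega + pull))
    return updated

def simulate(phases, steps=2000, omega=2.0 * math.pi, coupling=2.0, dt=0.005):
    """Integrate two oscillators whose desired phase difference is pi."""
    # offsets[i][j] is the desired value of phi_j - phi_i.
    offsets = [[0.0, math.pi], [-math.pi, 0.0]]
    for _ in range(steps):
        phases = step_phases(phases, omega, coupling, offsets, dt)
    return phases

# Start almost in phase; the coupling drives the legs toward anti-phase.
final = simulate([0.0, 0.3])
phase_gap = (final[1] - final[0]) % (2.0 * math.pi)
```

With these settings the phase gap settles near pi regardless of the (nonzero) initial offset, which is the kind of coordination between functional units the abstract refers to.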
Multi-expert learning of adaptive legged locomotion
Achieving versatile robot locomotion requires motor skills which can adapt to
previously unseen situations. We propose a Multi-Expert Learning Architecture
(MELA) that learns to generate adaptive skills from a group of representative
expert skills. During training, MELA is first initialised by a distinct set of
pre-trained experts, each in a separate deep neural network (DNN). Then by
learning the combination of these DNNs using a Gating Neural Network (GNN),
MELA can acquire more specialised experts and transitional skills across
various locomotion modes. During runtime, MELA constantly blends multiple DNNs
and dynamically synthesises a new DNN to produce adaptive behaviours in
response to changing situations. This approach leverages the advantages of
trained expert skills and the fast online synthesis of adaptive policies to
generate responsive motor skills during the changing tasks. Using a unified
MELA framework, we demonstrated successful multi-skill locomotion on a real
quadruped robot that performed coherent trotting, steering, and fall recovery
autonomously, and showed the merit of multi-expert learning generating
behaviours which can adapt to unseen scenarios
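A minimal sketch of the blending idea described above: a gating output is turned into mixing coefficients, and the parameters of several experts are averaged into one new network. It assumes each expert is a single linear layer and that the gate scores are given; the real MELA experts are deep networks and the gate is itself a trained network, so all names and numbers here are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw gate scores into positive weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def blend_experts(expert_weights, gate_scores):
    """Synthesise one weight matrix as a gate-weighted average of expert matrices."""
    alphas = softmax(gate_scores)
    rows = len(expert_weights[0])
    cols = len(expert_weights[0][0])
    return [
        [sum(a * w[r][c] for a, w in zip(alphas, expert_weights))
         for c in range(cols)]
        for r in range(rows)
    ]

def linear_policy(weights, observation):
    """Apply the (single-layer) policy to an observation vector."""
    return [sum(w * o for w, o in zip(row, observation)) for row in weights]

# Two hypothetical experts, e.g. "trot" and "fall recovery".
trot = [[1.0, 0.0], [0.0, 1.0]]
recover = [[0.0, 2.0], [2.0, 0.0]]
blended = blend_experts([trot, recover], gate_scores=[0.0, 0.0])
action = linear_policy(blended, [1.0, 1.0])
```

Because the blend happens in parameter space, the result is a new policy rather than an average of the experts' actions, which matches the abstract's description of synthesising a new DNN at runtime.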
Humanoid Robots
For many years, people have tried to recreate the complex mechanisms that make up the human body. The task is extremely complicated and the results are not yet fully satisfactory. However, with technological advances grounded in theoretical and experimental research, it has become possible to copy or imitate some systems of the human body. This research aims not only to create humanoid robots, many of them autonomous systems, but also to deepen knowledge of the systems that form the human body, with possible applications in human rehabilitation technology. It brings together studies from Robotics, Biomechanics, Biomimetics and Cybernetics, among other areas. This book presents a series of studies inspired by this ideal, carried out by researchers worldwide, that analyze and discuss diverse subjects related to humanoid robots. The contributions explore aspects of robotic hands, learning, language, vision and locomotion.
A Bio-inspired architecture for adaptive quadruped locomotion over irregular terrain
Doctoral thesis
Doctoral Program in Electronics and Computer Engineering
This thesis presents a tentative advancement in walking control of small quadruped and humanoid
position-controlled robots, addressing the problem of walk generation by combining a dynamical systems
approach to motor control with insights from neuroethology research on vertebrate motor control and
computational neuroscience.
Legged locomotion is a complex dynamical process, even though the ever-present proficiency of
legged animals makes it seem an easy and natural behavior. Research on locomotion and motor control
in vertebrate animals over the last decades has brought to the attention of roboticists the potential of
nature's solutions for robot applications. Recent knowledge on the organization of complex motor
generation and on the mechanics and dynamics of locomotion has been successfully exploited to pursue
agile robot locomotion.
The work presented in this manuscript is part of an effort to devise a general,
model-free solution for the generation of robust and adaptable walking behaviors. It strives to devise a
practical solution applicable to real robots, such as Sony's quadruped AIBO and Robotis' DARwIn-OP
humanoid. The discussed solutions are inspired by the functional description of the vertebrate
neural systems, especially the concept of Central Pattern Generators (CPGs), their structure and
organization, components and sensorimotor interactions. They use a dynamical systems approach for
the implementation of the controller, relying especially on nonlinear oscillators and the exploitation of
their properties.
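To make the role of a nonlinear oscillator concrete, here is a minimal sketch that assumes a Hopf oscillator, a common choice in CPG work; the abstract only says "nonlinear oscillators", so the specific equations, parameter values and function names are assumptions. Its limit cycle has radius sqrt(mu) and angular frequency omega, so amplitude and frequency can be modulated through two parameters, and the state returns to the cycle after a perturbation.

```python
import math

def hopf_step(x, y, mu=1.0, omega=2.0 * math.pi, alpha=5.0, dt=0.001):
    """One Euler step of a Hopf oscillator with limit-cycle radius sqrt(mu)."""
    r2 = x * x + y * y
    dx = alpha * (mu - r2) * x - omega * y
    dy = alpha * (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy

def settle(x, y, seconds=10.0, dt=0.001, **params):
    """Integrate from (x, y) for the given duration and return the final state."""
    for _ in range(round(seconds / dt)):
        x, y = hopf_step(x, y, dt=dt, **params)
    return x, y

# Both a small and a large initial state converge to the same limit cycle.
inner = settle(0.1, 0.0)
outer = settle(2.0, 0.0)
inner_radius = math.hypot(*inner)
outer_radius = math.hypot(*outer)
```

In a CPG, x would typically drive a joint trajectory, with several such oscillators coupled to produce a gait.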
The main topics of this thesis are divided into three parts.
The first part concerns quadruped locomotion, extending a previous CPG solution using nonlinear
oscillators, and discussing an organization on three hierarchical levels of abstraction, sharing the purpose
and knowledge of other works. It proposes a CPG solution which generates the walking motion
for the whole-leg, which is then organized in a network for the production of quadrupedal gaits. The
devised solution is able to produce goal-oriented locomotion and navigation as directed through high-level
commands from local planning methods. In this part, active balance on a standing quadruped is
also addressed, proposing a method based on dynamical systems approach, exploring the integration of
parallel postural mechanisms from several sensory modalities. The solutions are all successfully tested on the quadruped AIBO robot.
The second part addresses bipedal walking for humanoid robots. A CPG solution for biped
walking based on the concept of motion primitives is proposed, loosely based on the idea of synergistic
organization of vertebrate motor control. A set of motion primitives is shown to produce the basis
of simple biped walking and to generalize to goal-oriented walking. Using the proposed CPG, the
inclusion of feedback mechanisms is investigated, for modulation and adaptation of walking, through
phase transition control according to foot load information. The proposed solution is validated on the
humanoid DARwIn-OP, and its application is evaluated within a whole-body control framework.
The third part sidesteps a little from the other two topics. It discusses the CPG as having an alternative
role to direct motor generation in locomotion, serving instead as a processor of sensory information
for a feedback based motor generation. In this work a reflex based walking controller is devised for the
compliant quadruped Oncilla robot, to serve as purely feedback based walking generation. The capabilities
of the reflex network are shown in simulations, followed by a brief discussion on its limitations,
and how they could be improved by the inclusion of a CPG.
The presented work was made possible by the support of the Portuguese Science and Technology Foundation through the PhD grant SFRH/BD/62047/2009
Pre-computation for controlling character behavior in interactive physical simulations
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 129-136).
The development of advanced computer animation tools has allowed talented artists to create digital actors, or characters, in films and commercials that move in a plausible and compelling way. In interactive applications, however, the artist does not have total control over the scenarios the character will experience. Unexpected changes in the environment of the character or unexpected interactions with dynamic elements of the virtual world can lead to implausible motions. This work investigates the use of physical simulation to automatically synthesize plausible character motions in interactive applications. We show how to simulate a realistic motion for a humanoid character by creating a feedback controller that tracks a motion capture recording. By applying the right forces at the right time, the controller is able to recover from a range of interesting changes to the environment and unexpected disturbances. Controlling physically simulated humanoid characters is non-trivial as they are governed by non-linear, non-smooth, and high-dimensional equations of motion. We simplify the problem by using a linearized and simplified dynamics model near a reference trajectory. Tracking a reference trajectory is an effective way of getting a character to perform a single task. However, simulated characters need to perform many tasks from a variety of possible configurations. This work also describes a method for combining existing controllers by adding their output forces to perform new tasks. This allows one to reuse existing controllers. A surprising fact is that combined controllers can perform optimally under certain conditions. These methods allow us to interactively simulate many interesting humanoid character behaviors in two and three dimensions.
These characters have many more degrees of freedom than typical robot systems and move much more naturally. Simulation is fast enough that the controllers could soon be used to animate characters in interactive games. It is also possible that these simulations could be used to test robotic designs and biomechanical hypotheses.
by Marco Jorge Tome da Silva. Ph.D.
Metastable legged-robot locomotion
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2008. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 195-215).
A variety of impressive approaches to legged locomotion exist; however, the science of legged robotics is still far from demonstrating a solution which performs with a level of flexibility, reliability and careful foot placement that would enable practical locomotion on the variety of rough and intermittent terrain humans negotiate with ease on a regular basis. In this thesis, we strive toward this particular goal by developing a methodology for designing control algorithms for moving a legged robot across such terrain in a qualitatively satisfying manner, without falling down very often. We feel the definition of a meaningful metric for legged locomotion is a useful goal in and of itself. Specifically, the mean first-passage time (MFPT), also called the mean time to failure (MTTF), is an intuitively practical cost function to optimize for a legged robot, and we present the reader with a systematic, mathematical process for obtaining estimates of this MFPT metric. Of particular significance, our models of walking on stochastically rough terrain generally result in dynamics with a fast mixing time, where initial conditions are largely "forgotten" within 1 to 3 steps. Additionally, we can often find a near-optimal solution for motion planning using only a short time-horizon look-ahead.
Although we openly recognize that there are important classes of optimization problems for which long-term planning is required to avoid "running into a dead end" (or off of a cliff!), we demonstrate that many classes of rough terrain can in fact be successfully negotiated with a surprisingly high level of long-term reliability by selecting the short-sighted motion with the greatest probability of success. The methods used throughout have direct relevance to machine learning, providing a physics-based approach to reduce state space dimensionality and mathematical tools to obtain a scalar metric quantifying performance of the resulting reduced-order system.
by Katie Byl. Ph.D.
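As an illustration of the MFPT metric, the sketch below assumes walking has been discretised into a small Markov chain over step-to-step states, with any missing probability mass in a row meaning a fall. The expected number of steps before falling, m, then satisfies m = 1 + Q m, where Q holds the transition probabilities among the non-fallen states. The two-state chain and its numbers are hypothetical, and the thesis's own estimation procedure may differ.

```python
def mean_first_passage_time(Q, tol=1e-12, max_iter=1_000_000):
    """Expected steps to failure from each transient state, solving m = 1 + Q m.

    Q[i][j] is the probability of moving from non-fallen state i to non-fallen
    state j in one step; 1 - sum(Q[i]) is the probability of falling from i.
    Fixed-point iteration converges when falling is reachable from every state.
    """
    n = len(Q)
    m = [0.0] * n
    for _ in range(max_iter):
        new = [1.0 + sum(Q[i][j] * m[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, m)) < tol:
            return new
        m = new
    raise RuntimeError("MFPT iteration did not converge")

# Hypothetical chain: state 0 = well-balanced step, state 1 = recovering step.
# Fall probabilities per step are 0.01 from state 0 and 0.05 from state 1.
Q = [[0.90, 0.09],
     [0.50, 0.45]]
mfpt = mean_first_passage_time(Q)
```

For this chain the equations can be solved by hand, giving 64 steps from state 0 and 60 from state 1. The two values are close, which loosely mirrors the abstract's point that initial conditions are quickly forgotten.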
Scaled Autonomy for Networked Humanoids
Humanoid robots have been developed with the intention of aiding in environments designed for humans. As such, the control of humanoid morphology and the effectiveness of human-robot interaction form the two principal research issues for deploying these robots in the real world. In this thesis work, the issue of humanoid control is coupled with human-robot interaction under the framework of scaled autonomy, where the human and robot exchange levels of control depending on the environment and task at hand. This scaled autonomy is approached with control algorithms for reactive stabilization of human commands and planned trajectories that encode semantically meaningful motion preferences in a sequential convex optimization framework.
The control and planning algorithms have been extensively tested in the field for robustness and system verification. The RoboCup competition provides a benchmark competition for autonomous agents that are trained with a human supervisor. The kid-sized and adult-sized humanoid robots coordinate over a noisy network in a known environment with adversarial opponents, and the software and routines in this work allowed for five consecutive championships. Furthermore, the motion planning and user interfaces developed in the work have been tested in the noisy network of the DARPA Robotics Challenge (DRC) Trials and Finals in an unknown environment.
Overall, the ability to extend simplified locomotion models to aid in semi-autonomous manipulation allows untrained humans to operate complex, high dimensional robots. This represents another step in the path to deploying humanoids in the real world, based on the low dimensional motion abstractions and proven performance in real world tasks like RoboCup and the DRC
Adaptive Locomotion: The Cylindabot Robot
Adaptive locomotion is an emerging field of robotics due to the complex interaction between the robot and its environment. In hybrid locomotion, a robot has more than one mode of locomotion and potentially delivers the benefits of both; however, these advantages are often not quantified or applied to new scenarios. The classic approach is to design robots with a high number of degrees of freedom and a complex control system, whereas an intelligent morphology can simplify the problem and maintain capabilities. Cylindabot is designed to be a minimally actuated hybrid robot with strong terrain crossing capabilities. Limiting the number of motors reduces the robot's weight and means less reinforcement is needed for the physical frame or drive system. Cylindabot uses different drive directions to transform between using wheels or legs. Cylindabot is able to climb a slope of 32 degrees and a step ratio of 1.43 while only being driven by two motors. A physical prototype and simulation models show that adaptation is optimal for a range of terrain (slopes, steps, ridges and gaps). Cylindabot successfully adapts to a map environment where there are several routes to the target location. These results show that a hybrid robot can increase its terrain capabilities when changing how it moves and that this adaptation can be applied to wider environments. This is an important step toward deploying hybrid robots in real situations.