65 research outputs found

    Reinforcement Learning Control for Biped Robot Walking on Uneven Surfaces


    Robust and Versatile Humanoid Locomotion Based on Analytical Control and Residual Physics

    Humanoid robots are made to resemble humans, but their locomotion abilities are far from ours in terms of agility and versatility. When humans walk on complex terrains or face external disturbances, they combine a set of strategies, unconsciously and efficiently, to regain stability. This thesis tackles the problem of developing a robust omnidirectional walking framework that is able to generate versatile and agile locomotion on complex terrains. We designed and developed model-based and model-free walk engines, formulated the controllers using different approaches, including classical and optimal control schemes, and validated their performance through simulations and real experiments. These frameworks have hierarchical structures composed of several layers, and each layer is composed of several modules connected together to reduce complexity and increase the flexibility of the proposed frameworks. Additionally, they can be easily and quickly deployed on different platforms. Moreover, we believe that using machine learning on top of analytical approaches is key to opening the door for humanoid robots to step out of laboratories. We proposed a tight coupling between analytical control and deep reinforcement learning: we augmented our analytical controller with reinforcement learning modules that learn to regulate the walk engine parameters (planners and controllers) adaptively and to generate residuals that adjust the robot's target joint positions (residual physics). The effectiveness of the proposed frameworks was demonstrated and evaluated across a set of challenging simulation scenarios. The robot was able to generalize what it learned in one scenario, displaying human-like locomotion skills in unforeseen circumstances, even in the presence of noise and external pushes.
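
    A minimal sketch of the residual-physics idea described in this abstract, assuming a hypothetical walk_engine that outputs target joint positions and a learned policy that outputs corrections; the names, shapes, and scaling are illustrative, not the thesis's actual implementation.

```python
import numpy as np

def residual_step(walk_engine, policy, obs, residual_scale=0.05):
    """Combine an analytical walk engine with a learned residual policy:
    the engine proposes target joint positions and the policy adds a
    small, bounded correction (the residual)."""
    q_target = walk_engine(obs)             # joint targets from the analytical controller
    residual = np.clip(policy(obs), -1, 1)  # learned correction, kept bounded
    return q_target + residual_scale * residual
```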

    Using Reinforcement Learning in the tuning of Central Pattern Generators

    This work applies Reinforcement Learning techniques to robot learning and locomotion tasks. Reinforcement Learning is a very useful learning technique for legged robot locomotion, due to the emphasis it places on direct interaction between the agent and the environment, and because it requires neither supervision nor complete models, in contrast with classical approaches. Its aim is to decide which actions to take so as to maximize a cumulative reward or reinforcement signal, taking into account that decisions may affect not only immediate rewards but also future ones. This work presents the Reinforcement Learning framework and its application to the tuning of Central Pattern Generators, with the aim of generating optimized adaptive locomotion. In order to investigate the strengths and abilities of Reinforcement Learning, and to demonstrate the learning process of such algorithms in a simple way, two case studies based on the state of the art were implemented. With regard to the main purpose of the thesis, two different solutions are addressed: the first based on Natural Actor-Critic methods and the second on the Cross-Entropy Method. The latter algorithm proved capable of handling the integration of the two proposed approaches. The integration solutions were tested and validated using the Webots simulator and the DARwIN-OP robot model.
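
    A hedged sketch of how the Cross-Entropy Method can tune CPG parameters, as the abstract describes: a Gaussian search distribution is repeatedly refit to the best-scoring samples. The fitness function and parameter dimensionality are placeholders, not the dissertation's actual setup.

```python
import numpy as np

def cem_tune(fitness, dim, iters=50, pop=64, elite_frac=0.2, seed=0):
    """Cross-Entropy Method: sample CPG parameter sets from a Gaussian,
    score them, and refit the Gaussian to the elite samples."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))  # candidate parameter sets
        scores = np.array([fitness(s) for s in samples])  # e.g., distance walked in simulation
        elites = samples[np.argsort(scores)[-n_elite:]]   # keep the top performers
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean
```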

    Learning to Exploit Elastic Actuators for Quadruped Locomotion

    Spring-based actuators in legged locomotion provide energy efficiency and improved performance, but they increase the difficulty of controller design. While previous work has focused on extensive modeling and simulation to find optimal controllers for such systems, we propose to learn model-free controllers directly on the real robot. In our approach, gaits are first synthesized by central pattern generators (CPGs), whose parameters are optimized to quickly obtain an open-loop controller that achieves efficient locomotion. Then, to make this controller more robust and further improve its performance, we use reinforcement learning to close the loop and learn corrective actions on top of the CPGs. We evaluate the proposed approach on the DLR elastic quadruped bert. Our results in learning trotting and pronking gaits show that exploitation of the spring actuator dynamics emerges naturally from optimizing for dynamic motions, yielding high-performing locomotion despite being model-free. The whole process takes no more than 1.5 hours on the real robot and results in natural-looking gaits.
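
    A minimal sketch of the CPG-plus-feedback structure this abstract describes: sinusoidal oscillators generate an open-loop trot rhythm, and a learned policy adds small corrective offsets to close the loop. The frequencies, amplitudes, phase offsets, and policy interface are illustrative assumptions.

```python
import numpy as np

def cpg_targets(t, freq=2.0, amp=0.4, phases=(0.0, np.pi, np.pi, 0.0)):
    """Open-loop trot rhythm: one sinusoid per leg, diagonal legs in phase."""
    return amp * np.sin(2 * np.pi * freq * t + np.asarray(phases))

def closed_loop_targets(t, obs, policy, corr_scale=0.1):
    """Close the loop: add small learned corrective actions on top of the CPG output."""
    return cpg_targets(t) + corr_scale * policy(obs)
```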

    Adaptive, fast walking in a biped robot under neuronal control and learning

    Human walking is a dynamic, partly self-stabilizing process relying on the interaction of the biomechanical design with its neuronal control. The coordination of this process is a very difficult problem, and it has been suggested that it involves a hierarchy of levels, where the lower ones, e.g., interactions between muscles and the spinal cord, are largely autonomous, and where higher-level control (e.g., cortical) arises only pointwise, as needed. This requires an architecture of several nested sensori-motor loops, where the walking process provides feedback signals to the walker's sensory systems, which can be used to coordinate its movements. To complicate the situation, at a maximal walking speed of more than four leg lengths per second, the cycle period available to coordinate all these loops is rather short. In this study we present a planar biped robot which uses the design principle of nested loops to combine the self-stabilizing properties of its biomechanical design with several levels of neuronal control. Specifically, we show how to adapt control by including online learning mechanisms based on simulated synaptic plasticity. This robot can walk at high speed (> 3.0 leg lengths/s), self-adapting to minor disturbances and reacting robustly to abruptly induced gait changes. At the same time, it can learn to walk on different terrains, requiring only a few learning experiences. This study shows that the tight coupling of physical with neuronal control, guided by sensory feedback from the walking pattern itself and combined with synaptic learning, may be a way forward to better understand and solve coordination problems in other complex motor tasks.
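
    A hedged sketch of the kind of online correlation-based plasticity rule the abstract alludes to: a synapse carrying an early, predictive sensor signal is strengthened in proportion to its correlation with the change of a later reflex signal, so the learned pathway gradually comes to anticipate the reflex. The signal names, discrete-time derivative, and learning rate are illustrative assumptions, not the paper's exact rule.

```python
def correlation_update(w, u_pred, x_reflex, x_reflex_prev, rate=1e-3):
    """Correlation-based synaptic plasticity: the weight of a predictive
    input grows when that input correlates with a subsequent change in
    the reflex signal (its discrete-time derivative)."""
    dx_reflex = x_reflex - x_reflex_prev  # change of the reflex signal this step
    return w + rate * u_pred * dx_reflex  # Hebbian-like correlation update
```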

    Locomotion through morphology, evolution and learning for legged and limbless robots

    Robot locomotion is concerned with providing autonomous locomotion capabilities to mobile robots. Most current-day robots feature some form of locomotion for navigating their environment. Modalities of robot locomotion include: (i) aerial locomotion, (ii) terrestrial locomotion, and (iii) aquatic locomotion (on or under water). The three main forms of terrestrial locomotion are legged locomotion, limbless locomotion, and wheel-based locomotion. A Modular Robot (MR), on the other hand, is a robotic system composed of several independent unit modules, where each module is a robot by itself. The objective of this thesis is to develop legged locomotion in a humanoid robot, as well as limbless locomotion in modular robotic configurations. Taking inspiration from biology, this thesis investigates robot locomotion from the perspective of the robot's morphology, through evolution, and through learning. Locomotion is one of the key distinguishing characteristics of a zoological organism. Almost all animal species, and even some plant species, produce some form of locomotion. In the past few years, robots have been "moving out" of the factory floor and research labs and are becoming increasingly common in everyday life, so providing stable and agile locomotion capabilities for robots to navigate a wide range of environments becomes pivotal. Developing locomotion in robots through biologically inspired methods also helps further our understanding of how biological processes may function. Connected modules in a configuration exert forces on one another as a result of their interactions with each other and with the environment. This phenomenon is studied and quantified, and then used as implicit communication between robot modules to produce locomotion coordination in MRs; through this, a strong link is established between a robot's morphology and the gait that emerges in it. A variety of locomotion controllers, some based on periodic functions and some on morphology, are developed for MR locomotion and bipedal gait generation. A hybrid Evolutionary Algorithm (EA) is implemented for evolving gaits, both in simulation and in the real world on a physical modular robotic configuration. Limbless gaits in MRs are also obtained by learning optimal control policies through Reinforcement Learning (RL).
The thesis first presents the state of the art in modular robotics, focusing on modular robot locomotion, controllers, bipedal locomotion, and morphological computation. Five different modular robot configurations used in the thesis are then described, followed by four locomotion controllers: a heterogeneous controller, a periodic-function-based controller, a homogeneous controller, and a morphology-based controller. A linear, periodic, feature-based locomotion controller for bipedal humanoid locomotion is developed as part of this work: the control parameters are first tuned by hand to reproduce a cart-table model and the controller is evaluated on a simulated humanoid robot; an evolutionary algorithm is then used to optimize the control parameters, yielding locomotion without a predetermined model. The thesis also develops an Embodied Evolution approach, that is, the use of physical modular robots during the evolution phase; the hardware implementation, the experimental setup, and the Evolutionary Algorithm implemented for Embodied Evolution are explained in detail. Finally, after an overview of Reinforcement Learning techniques and Markov Decision Processes, the popular Q-Learning algorithm and its adaptation to learning modular robot gaits are presented, together with an implementation of the learning algorithm and an experimental evaluation of the resulting locomotion.
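
    A hedged sketch of tabular Q-Learning as it might be adapted to modular-robot gait learning, with discretized module states and actions; the environment interface and reward (e.g., distance traveled per control step) are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-Learning over discretized module states and actions.
    Assumes env.reset() -> state and env.step(a) -> (next_state, reward, done),
    where reward could be, e.g., distance traveled during the step."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy exploration over the discrete action set.
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = env.step(a)
            # One-step temporal-difference update toward the bootstrapped target.
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q
```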

    Learning dynamic motor skills for terrestrial locomotion

    The use of Deep Reinforcement Learning (DRL) has received significantly increased attention from researchers within the robotics field following the success of AlphaGo, which demonstrated the superhuman capability of deep reinforcement learning algorithms to solve complex tasks by beating professional Go players. Since then, an increasing number of researchers have investigated the potential of using DRL to solve complex, high-dimensional robotic tasks, such as legged locomotion, arm manipulation, and grasping, which are difficult to solve using conventional optimization approaches. Understanding and recreating the various modes of terrestrial locomotion has been of long-standing interest to roboticists. A large variety of applications, such as rescue missions, disaster response, and science expeditions, strongly demand mobility and versatility in legged locomotion to enable task completion. In order to create useful physical robots, it is necessary to design controllers that synthesize the complex locomotion behaviours observed in humans and other animals. In the past, legged locomotion was mainly achieved via analytical engineering approaches. However, conventional analytical approaches have their limitations, as they require relatively large amounts of human effort and knowledge. Machine learning approaches, such as DRL, require less human effort than analytical approaches. The project conducted for this thesis explores the feasibility of using DRL to acquire control policies comparable to, or better than, those acquired through analytical approaches while requiring less human effort. In this doctoral thesis, we developed a Multi-Expert Learning Architecture (MELA) that uses DRL to learn multi-skill control policies capable of synthesizing a diverse set of dynamic locomotion behaviours for legged robots. We first proposed a novel DRL framework for the locomotion of humanoid robots. The proposed learning framework is capable of acquiring robust and dynamic motor skills for humanoids, including balancing, walking, and standing up after a fall. We subsequently improved upon this framework and designed a novel multi-expert learning architecture capable of fusing multiple motor skills in a seamless fashion, ultimately deploying this framework on a real quadrupedal robot. The successful deployment of learned control policies on a real quadrupedal robot demonstrates the feasibility of using an Artificial Intelligence (AI) based approach for real robot motion control.
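
    A hedged sketch of one way to realize the multi-expert fusion this abstract describes: a gating network produces state-dependent blending weights over several expert policies, and the robot executes the blended action. MELA's actual architecture may blend experts differently; the softmax gating and interfaces here are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def fused_action(obs, experts, gate):
    """Multi-expert fusion: a gating network weighs each expert policy
    according to the current observation, and the executed action is a
    convex combination of the experts' proposals."""
    weights = softmax(gate(obs))                     # one weight per expert, summing to 1
    actions = np.stack([pi(obs) for pi in experts])  # each expert proposes an action vector
    return weights @ actions                         # blend: weighted sum over experts
```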