
    Learning and Adapting Agile Locomotion Skills by Transferring Experience

    Legged robots have enormous potential in their range of capabilities, from navigating unstructured terrains to high-speed running. However, designing robust controllers for highly agile dynamic motions remains a substantial challenge for roboticists. Reinforcement learning (RL) offers a promising data-driven approach for automatically training such controllers. However, exploration in these high-dimensional, underactuated systems remains a significant hurdle for enabling legged robots to learn performant, naturalistic, and versatile agility skills. We propose a framework for training complex robotic skills by transferring experience from existing controllers to jumpstart learning new tasks. To leverage controllers we can acquire in practice, we design this framework to be flexible in terms of their source -- that is, the controllers may have been optimized for a different objective under different dynamics, or may require different knowledge of the surroundings -- and thus may be highly suboptimal for the target task. We show that our method enables learning complex agile jumping behaviors, navigating to goal locations while walking on hind legs, and adapting to new environments. We also demonstrate that the agile behaviors learned in this way are graceful and safe enough to deploy in the real world. Comment: Project website: https://sites.google.com/berkeley.edu/twir
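
    The abstract describes jumpstarting RL on a new task with experience from existing, possibly suboptimal controllers. Below is a minimal sketch of one common way such transfer is set up for an off-policy learner; the env/buffer interfaces and the source_controller are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def seed_replay_buffer(env, source_controller, replay_buffer, num_steps=10_000):
    """Sketch: roll out a (possibly suboptimal) source controller in the target
    task and store its transitions so an off-policy RL learner starts from
    informative data instead of random exploration. Interfaces are assumed."""
    obs = env.reset()
    for _ in range(num_steps):
        action = source_controller(obs)              # e.g. an existing walking/jumping controller
        next_obs, reward, done, _info = env.step(action)
        # Reward comes from the *target* task, so the transitions stay useful
        # even if the source controller was optimized for a different objective.
        replay_buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs
```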

    Robust and versatile humanoid locomotion based on analytical control and residual physics

    Humanoid robots are made to resemble humans, but their locomotion abilities are far from ours in terms of agility and versatility. When humans walk on complex terrains or face external disturbances, they combine a set of strategies, unconsciously and efficiently, to regain stability. This thesis tackles the problem of developing a robust omnidirectional walking framework that can generate versatile and agile locomotion on complex terrains. We designed and developed model-based and model-free walk engines, formulated the controllers using approaches ranging from classical to optimal control, and validated their performance through simulations and experiments. These frameworks have hierarchical structures composed of several layers, each built from interconnected modules, which reduces complexity and increases flexibility; they can also be deployed easily and quickly on different platforms. Moreover, we believe that using machine learning on top of analytical approaches is key to enabling humanoid robots to step out of the laboratory. We proposed a tight coupling between analytical control and deep reinforcement learning: we augmented our analytical controller with reinforcement learning modules that learn to adaptively regulate the walk engine parameters (planners and controllers) and to generate residuals that adjust the robot's target joint positions (residual physics). The effectiveness of the proposed frameworks was demonstrated and evaluated across a set of challenging simulation scenarios. The robot was able to generalize what it learned in one scenario, displaying human-like locomotion skills in unforeseen circumstances, even in the presence of noise and external pushes.
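
    The residual-physics coupling described above combines an analytical walk engine with learned parameter regulation and joint-space residuals. A minimal sketch of that composition follows; all class and parameter names are assumed for illustration and are not the thesis' actual interfaces.

```python
import numpy as np

class ResidualWalkController:
    """Sketch: an analytical walk engine augmented with a learned residual
    policy (residual physics). Interfaces and scales are illustrative."""

    def __init__(self, walk_engine, policy, residual_scale=0.05):
        self.walk_engine = walk_engine        # analytical, model-based controller
        self.policy = policy                  # trained RL policy (e.g. a PPO actor)
        self.residual_scale = residual_scale  # keep learned corrections small and safe

    def act(self, observation):
        # 1) Analytical layer: nominal joint targets and current engine parameters.
        nominal_joints, engine_params = self.walk_engine.step(observation)

        # 2) Learned layer: the policy outputs parameter adjustments plus
        #    joint-space residuals conditioned on the same observation.
        action = self.policy(observation)
        param_adjustment, joint_residual = np.split(action, [len(engine_params)])

        # 3) Adaptively regulate the walk-engine parameters (planners/controllers).
        self.walk_engine.set_params(engine_params + param_adjustment)

        # 4) Residual physics: add a bounded correction to the target joint positions.
        return nominal_joints + self.residual_scale * joint_residual
```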

    3LP: a linear 3D-walking model including torso and swing dynamics

    In this paper, we present a new model of biped locomotion composed of three linear pendulums (one per leg and one for the whole upper body) to describe stance, swing and torso dynamics. In addition to double support, this model has different actuation possibilities in the swing hip and stance ankle, which can be widely used to produce different walking gaits. Without the need for numerical time integration, closed-form solutions help find periodic gaits, which can simply be scaled in certain dimensions to modulate the motion online. Thanks to its linearity, the proposed model can provide a computationally fast platform for model predictive controllers to predict the future and consider meaningful inequality constraints to ensure feasibility of the motion. This property comes from describing the dynamics directly with joint torques, which reflects hardware limitations more precisely, even in the very abstract high-level template space. The proposed model produces human-like torque and ground reaction force profiles and is thus, compared to point-mass models, more promising for precise control of humanoid robots. Despite being linear and lacking many other features of human walking, such as CoM excursion, knee flexion and ground clearance, we show that the proposed model can predict one of the main optimality trends in human walking, i.e., the nonlinear speed-frequency relationship. In this paper, we mainly focus on describing the model and its capabilities, comparing it with human data and calculating optimal human gait variables. Setting up control problems and advanced biomechanical analyses remain for future work. Comment: Journal paper under review
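
    The key point above is that closed-form solutions of linear pendulum dynamics remove the need for numerical time integration. As a simplified illustration, the sketch below uses a single linear inverted pendulum rather than the full three-pendulum 3LP model; the pendulum height and initial state are assumed values.

```python
import numpy as np

def lip_closed_form(x0, xdot0, t, z0=0.8, g=9.81):
    """Closed-form state evolution of a single linear (inverted) pendulum, the
    kind of analytic solution a linear walking model composes for stance,
    swing and torso. One-pendulum simplification, not the full 3LP model."""
    omega = np.sqrt(g / z0)  # natural frequency of the linear pendulum
    x = x0 * np.cosh(omega * t) + (xdot0 / omega) * np.sinh(omega * t)
    xdot = x0 * omega * np.sinh(omega * t) + xdot0 * np.cosh(omega * t)
    return x, xdot

# No numerical integration is needed: the state at any future time follows
# directly from the hyperbolic solution, which is what makes linear models
# fast enough to sit inside a model predictive controller.
x_T, xdot_T = lip_closed_form(x0=0.02, xdot0=0.1, t=0.4)
```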

    Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion

    Deep reinforcement learning (RL) can enable robots to autonomously acquire complex behaviors, such as legged locomotion. However, RL in the real world is complicated by constraints on efficiency, safety, and overall training stability, which limits its practical applicability. We present APRL, a policy regularization framework that modulates the robot's exploration over the course of training, striking a balance between flexible improvement potential and focused, efficient exploration. APRL enables a quadrupedal robot to efficiently learn to walk entirely in the real world within minutes and to continue improving with more training where prior work saturates in performance. We demonstrate that continued training with APRL results in a policy that is substantially more capable of navigating challenging situations and can adapt to changes in dynamics. Comment: First two authors contributed equally. Project website: https://sites.google.com/berkeley.edu/apr
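
    APRL is described as modulating the robot's exploration over the course of training. One plausible way to picture such a mechanism is an action bound that grows as performance improves; the schedule and thresholds below are assumptions for illustration, not the method from the paper.

```python
import numpy as np

class GrowingActionLimit:
    """Sketch of performance-gated exploration: constrain the usable action
    range early in real-world training and relax it as the robot improves."""

    def __init__(self, init_fraction=0.3, grow_rate=0.05, max_fraction=1.0):
        self.fraction = init_fraction  # usable fraction of the full joint range
        self.grow_rate = grow_rate
        self.max_fraction = max_fraction

    def update(self, recent_return, return_threshold=200.0):
        # Expand the allowed range only when recent returns clear a bar, keeping
        # exploration focused while leaving room for continued improvement.
        if recent_return > return_threshold:
            self.fraction = min(self.fraction + self.grow_rate, self.max_fraction)

    def clip(self, action, joint_limits):
        low, high = joint_limits
        center, half_span = (low + high) / 2.0, (high - low) / 2.0
        bound = self.fraction * half_span
        return np.clip(action, center - bound, center + bound)
```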

    Bringing a Humanoid Robot Closer to Human Versatility: Hard Realtime Software Architecture and Deep Learning Based Tactile Sensing

    For centuries, it has been a vision of man to create humanoid robots, i.e., machines that not only resemble the shape of the human body but have similar capabilities, especially in dextrously manipulating their environment. But only in recent years has it become possible to build actual humanoid robots with many degrees of freedom (DOF) and equipped with torque-controlled joints, which are a prerequisite for acting sensitively in the world. In this thesis, we extend DLR's advanced mobile torque-controlled humanoid robot Agile Justin in two important directions to get closer to human versatility. First, we enable Agile Justin, which was originally built as a research platform for dextrous mobile manipulation, to also execute complex dynamic manipulation tasks. We demonstrate this with the challenging task of catching up to two simultaneously thrown balls with its hands. Second, we equip Agile Justin with highly developed, deep learning based tactile sensing capabilities that are critical for dextrous fine manipulation. We demonstrate its tactile capabilities with the delicate task of identifying an object's material simply by gently sweeping a fingertip over its surface. Key to the realization of complex dynamic manipulation tasks is a software framework that allows for a component-based system architecture to cope with the complexity and the parallel and distributed computational demands of deep sensor-perception-planning-action loops -- but under tight timing constraints. This thesis presents the communication layer of our aRDx (agile robot development -- next generation) software framework, which provides hard realtime determinism and optimal transport of data packets, with zero-copy for intra- and inter-process communication and copy-once for distributed communication. In the implementation of the challenging ball catching application on Agile Justin, we take full advantage of aRDx's performance and advanced features like channel synchronization. Besides developing the challenging visual ball tracking, which uses only onboard sensing while everything is moving, and the automatic and self-contained calibration procedure that provides the necessary precision, the major contribution is the unified generation of the reaching motion for the arms. The catch point selection, motion planning and joint interpolation steps are subsumed in one nonlinear constrained optimization problem, which is solved in realtime and allows for the realization of different catch behaviors. For the highly sensitive task of tactile material classification with a flexible pressure-sensitive skin on Agile Justin's fingertip, we present our deep convolutional network architecture TactNet-II. The input is the raw 16,000-dimensional, complex and noisy spatio-temporal tactile signal generated when sweeping over an object's surface. For comparison, we perform a thorough human-performance experiment with 15 subjects, which shows that Agile Justin reaches superhuman performance in the high-level material classification task (Which material is it?) as well as in the low-level material differentiation task (Are two materials the same?). To increase the sample efficiency of TactNet-II, we adapt state-of-the-art deep end-to-end transfer learning to tactile material classification, leading to an up to 15-fold reduction in the number of training samples needed. The presented methods led to six publication awards and award-finalist nominations as well as international media coverage, and also worked robustly at many trade fairs and lab demos.
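
    TactNet-II is described as a deep convolutional network over a raw 16,000-dimensional spatio-temporal tactile signal. The sketch below shows a small 1D-CNN classifier in that spirit, written in PyTorch; the taxel/time reshape, layer sizes and class count are assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class TactNetSketch(nn.Module):
    """Illustrative stand-in for a TactNet-II-style classifier: a small 1D CNN
    over a spatio-temporal tactile sweep. The 16-taxel x 1000-step reshape of
    the 16,000-dim input and the layer sizes are assumptions."""

    def __init__(self, num_taxels=16, num_steps=1000, num_materials=36):
        super().__init__()
        self.num_taxels, self.num_steps = num_taxels, num_steps
        self.features = nn.Sequential(
            nn.Conv1d(num_taxels, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, num_materials)

    def forward(self, raw_signal):
        # raw_signal: (batch, 16000) flat tactile sweep -> (batch, taxels, time)
        x = raw_signal.view(raw_signal.size(0), self.num_taxels, self.num_steps)
        x = self.features(x).squeeze(-1)
        return self.classifier(x)

# For sample efficiency the thesis reports end-to-end transfer learning; a
# common recipe is to reuse pretrained feature layers and fine-tune only the
# classifier head on the new tactile data.
```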