Learning and Adapting Agile Locomotion Skills by Transferring Experience
Legged robots have enormous potential in their range of capabilities, from
navigating unstructured terrains to high-speed running. However, designing
robust controllers for highly agile dynamic motions remains a substantial
challenge for roboticists. Reinforcement learning (RL) offers a promising
data-driven approach for automatically training such controllers. However,
exploration in these high-dimensional, underactuated systems remains a
significant hurdle for enabling legged robots to learn performant,
naturalistic, and versatile agility skills. We propose a framework for training
complex robotic skills by transferring experience from existing controllers to
jumpstart learning new tasks. To leverage controllers we can acquire in
practice, we design this framework to be flexible in terms of their source --
that is, the controllers may have been optimized for a different objective
under different dynamics, or may require different knowledge of the
surroundings -- and thus may be highly suboptimal for the target task. We show
that our method enables learning complex agile jumping behaviors, navigating to
goal locations while walking on hind legs, and adapting to new environments. We
also demonstrate that the agile behaviors learned in this way are graceful and
safe enough to deploy in the real world.
Comment: Project website: https://sites.google.com/berkeley.edu/twir
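The jumpstarting idea described above can be sketched minimally: roll out the existing (possibly suboptimal) source controller first, and use its transitions to seed the learner's experience before exploration begins. Everything below is illustrative, not the paper's actual setup: the toy 1-D dynamics, the `source_controller`, and the buffer layout are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def source_controller(obs):
    # Hypothetical pre-existing controller: it may be suboptimal for
    # the target task, but still produces useful experience.
    return np.clip(-0.5 * obs, -1.0, 1.0)

def rollout(policy, steps=50):
    """Collect (obs, action, next_obs) transitions from a toy 1-D
    point-mass environment (a stand-in for the robot)."""
    obs, buffer = rng.normal(size=1), []
    for _ in range(steps):
        act = policy(obs)
        next_obs = obs + 0.1 * act + 0.01 * rng.normal(size=1)
        buffer.append((obs.copy(), act.copy(), next_obs.copy()))
        obs = next_obs
    return buffer

# Jumpstart: pre-fill the replay buffer with source-controller
# experience before the learner takes over exploration on the new task.
replay = rollout(source_controller)
```

The learner then starts from data that already covers task-relevant states, instead of exploring the high-dimensional, underactuated system from scratch.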
Robust and Versatile Humanoid Locomotion Based on Analytical Control and Residual Physics
Humanoid robots are made to resemble humans but their locomotion
abilities are far from ours in terms of agility and versatility. When humans
walk on complex terrains or face external disturbances, they
combine a set of strategies, unconsciously and efficiently, to regain
stability. This thesis tackles the problem of developing a robust omnidirectional
walking framework, which is able to generate versatile
and agile locomotion on complex terrains. We designed and developed
model-based and model-free walk engines and formulated the
controllers using different approaches including classical and optimal
control schemes and validated their performance through simulations
and experiments. These frameworks have hierarchical structures that
are composed of several layers. These layers are composed of several
modules that are connected together to fade the complexity and
increase the flexibility of the proposed frameworks. Additionally, they
can be easily and quickly deployed on different platforms.
Moreover, we believe that using machine learning on top of analytical approaches
is key to enabling humanoid robots to step out of the laboratory.
We proposed a tight coupling between analytical control and
deep reinforcement learning. We augmented our analytical controller
with reinforcement learning modules to learn how to regulate the walk
engine parameters (planners and controllers) adaptively and generate
residuals to adjust the robot’s target joint positions (residual physics).
The effectiveness of the proposed frameworks was demonstrated and
evaluated across a set of challenging simulation scenarios. The robot
was able to generalize what it learned in one scenario, displaying
human-like locomotion skills in unforeseen circumstances, even in the
presence of noise and external pushes.
Programa Doutoral em Informática
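The residual-physics coupling described above can be sketched as follows: an analytical walk engine proposes target joint positions for the current gait phase, and a learned module adds a small correction on top. Both functions below are illustrative stand-ins (the sinusoidal "walk engine" and the fixed "policy" are assumptions, not the thesis's controllers):

```python
import numpy as np

def walk_engine(phase, n_joints=6):
    # Hypothetical analytical walk engine: returns target joint
    # positions (radians) for the current gait phase in [0, 1).
    offsets = np.linspace(0.0, np.pi, n_joints)
    return 0.3 * np.sin(2.0 * np.pi * phase + offsets)

def residual_policy(obs, n_joints=6):
    # Stand-in for the learned RL module; in the thesis this is a
    # trained network, here just a small bounded correction.
    return 0.05 * np.tanh(obs[:n_joints])

phase = 0.25
obs = np.zeros(12)                     # robot state observation (toy)
# Residual physics: analytical targets plus a learned correction.
targets = walk_engine(phase) + residual_policy(obs)
```

Keeping the residual small and bounded (here via `0.05 * tanh`) lets the analytical controller dominate, so the learned part only refines the motion rather than replacing it.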
3LP: a linear 3D-walking model including torso and swing dynamics
In this paper, we present a new model of biped locomotion which is composed
of three linear pendulums (one per leg and one for the whole upper body) to
describe stance, swing and torso dynamics. In addition to double support, this
model has different actuation possibilities in the swing hip and stance ankle
which could be widely used to produce different walking gaits. Without the need
for numerical time-integration, closed-form solutions help find periodic
gaits that can simply be scaled in certain dimensions to modulate the motion
online. Thanks to linearity properties, the proposed model can provide a
computationally fast platform for model predictive controllers to predict the
future and consider meaningful inequality constraints to ensure feasibility of
the motion. This property comes from describing dynamics with joint torques
directly, thereby reflecting hardware limitations more precisely, even in
the very abstract high level template space. The proposed model produces
human-like torque and ground reaction force profiles and thus, compared to
point-mass models, it is more promising for precise control of humanoid robots.
Despite being linear and lacking many other features of human walking like CoM
excursion, knee flexion and ground clearance, we show that the proposed model
can predict one of the main optimality trends in human walking, i.e. nonlinear
speed-frequency relationship. In this paper, we mainly focus on describing the
model and its capabilities, comparing it with human data and calculating
optimal human gait variables. Setting up control problems and advanced
biomechanical analysis still remain for future work.
Comment: Journal paper under review
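3LP is built from linear pendulums, so its trajectories admit closed-form solutions. As a simpler illustration of that property, the classic linear inverted pendulum (of which 3LP is a richer relative) can be evaluated at any future time without numerical time-integration; the constants below are assumed for illustration, not taken from the paper:

```python
import numpy as np

g, z0 = 9.81, 0.8          # gravity and constant pendulum height (assumed)
omega = np.sqrt(g / z0)    # natural frequency of the linear pendulum

def lip_closed_form(x0, v0, t):
    """Closed-form CoM trajectory of a linear inverted pendulum:
    x(t) = x0*cosh(w*t) + (v0/w)*sinh(w*t)."""
    return x0 * np.cosh(omega * t) + (v0 / omega) * np.sinh(omega * t)

# Evaluate any future time directly -- no time-stepping required,
# which is what makes such models fast inside predictive controllers.
x_future = lip_closed_form(0.05, 0.0, 0.3)
```

A model predictive controller can exploit this: predictions over a horizon become a handful of closed-form evaluations rather than repeated integration steps.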
Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion
Deep reinforcement learning (RL) can enable robots to autonomously acquire
complex behaviors, such as legged locomotion. However, RL in the real world is
complicated by constraints on efficiency, safety, and overall training
stability, which limits its practical applicability. We present APRL, a policy
regularization framework that modulates the robot's exploration over the course
of training, striking a balance between flexible improvement potential and
focused, efficient exploration. APRL enables a quadrupedal robot to efficiently
learn to walk entirely in the real world within minutes and continue to improve
with more training where prior work saturates in performance. We demonstrate
that continued training with APRL results in a policy that is substantially
more capable of navigating challenging situations and is able to adapt to
changes in dynamics with continued training.
Comment: First two authors contributed equally. Project website:
https://sites.google.com/berkeley.edu/apr
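The exploration modulation described above can be caricatured as a bound on action magnitude that expands as training succeeds: tight limits keep early real-world exploration safe and efficient, while a growing limit preserves improvement potential. The class below is a hypothetical sketch of that idea, not APRL's actual mechanism:

```python
import numpy as np

class GrowingActionLimit:
    """Sketch of exploration modulation (details assumed): clip actions
    to a bound that grows only while the policy keeps improving."""

    def __init__(self, start=0.2, max_limit=1.0, grow=0.01):
        self.limit, self.max_limit, self.grow = start, max_limit, grow

    def clip(self, action):
        # Constrain exploration to the current safe envelope.
        return np.clip(action, -self.limit, self.limit)

    def update(self, improved):
        # Expand the envelope only on measured improvement.
        if improved:
            self.limit = min(self.limit + self.grow, self.max_limit)

lim = GrowingActionLimit()
a = lim.clip(np.array([0.5, -0.1]))   # early training: tightly clipped
for _ in range(100):
    lim.update(improved=True)         # limit grows toward its maximum
```

This kind of schedule is one way to avoid the saturation the abstract mentions: the policy is never permanently locked into its initial conservative envelope.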
Bringing a Humanoid Robot Closer to Human Versatility : Hard Realtime Software Architecture and Deep Learning Based Tactile Sensing
For centuries, it has been a vision of man to create humanoid robots, i.e., machines that not only resemble the shape of the human body, but have similar capabilities, especially in dextrously manipulating their environment. But only in recent years has it become possible to build actual humanoid robots with many degrees of freedom (DOF) and equipped with torque controlled joints, which are a prerequisite for sensitively acting in the world. In this thesis, we extend DLR's advanced mobile torque controlled humanoid robot Agile Justin in two important directions to get closer to human versatility. First, we enable Agile Justin, which was originally built as a research platform for dextrous mobile manipulation, to also execute complex dynamic manipulation tasks. We demonstrate this with the challenging task of catching up to two simultaneously thrown balls with its hands. Second, we equip Agile Justin with highly developed, deep learning based tactile sensing capabilities that are critical for dextrous fine manipulation. We demonstrate its tactile capabilities with the delicate task of identifying an object's material simply by gently sweeping with a fingertip over its surface.

Key for the realization of complex dynamic manipulation tasks is a software framework that allows for a component based system architecture to cope with the complexity and the parallel and distributed computational demands of deep sensor-perception-planning-action loops -- but under tight timing constraints. This thesis presents the communication layer of our aRDx (agile robot development -- next generation) software framework, which provides hard realtime determinism and optimal transport of data packets with zero-copy for intra- and inter-process communication and copy-once for distributed communication. In the implementation of the challenging ball catching application on Agile Justin, we take full advantage of aRDx's performance and advanced features like channel synchronization.
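The zero-copy transport idea can be illustrated in a few lines. aRDx itself is a hard-realtime framework, so the Python sketch below (names and sizes are illustrative) only shows the core concept: producer and consumer map views onto the same buffer instead of serializing a packet and copying it between them.

```python
import numpy as np
from multiprocessing import shared_memory

# One shared buffer; both "sides" view it directly, so publishing a
# packet costs no copy at all (zero-copy intra-process communication).
shm = shared_memory.SharedMemory(create=True, size=16 * 8)

producer_view = np.ndarray((16,), dtype=np.float64, buffer=shm.buf)
producer_view[:] = np.arange(16.0)         # writer fills the buffer

consumer_view = np.ndarray((16,), dtype=np.float64, buffer=shm.buf)
same_data = bool(consumer_view[5] == 5.0)  # reader sees it, no copy made

del producer_view, consumer_view           # release views before cleanup
shm.close()
shm.unlink()
```

For distributed communication the data must cross the network at least once, which is where the abstract's "copy-once" guarantee comes in: a single copy onto the wire, and zero additional copies on either end.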
Besides the challenging visual ball tracking using only onboard sensing while everything is moving, and the automatic and self-contained calibration procedure that provides the necessary precision, the major contribution is the unified generation of the reaching motion for the arms. The catch point selection, motion planning and joint interpolation steps are subsumed in one nonlinear constrained optimization problem, which is solved in realtime and allows for the realization of different catch behaviors.

For the highly sensitive task of tactile material classification with a flexible pressure-sensitive skin on Agile Justin's fingertip, we present our deep convolutional network architecture TactNet-II. The input is the raw 16000-dimensional, complex and noisy spatio-temporal tactile signal generated when sweeping over an object's surface. For comparison, we perform a thorough human performance experiment with 15 subjects, which shows that Agile Justin reaches superhuman performance in the high-level material classification task (What material is it?), as well as in the low-level material differentiation task (Are two materials the same?). To increase the sample efficiency of TactNet-II, we adapt state-of-the-art deep end-to-end transfer learning to tactile material classification, leading to an up to 15-fold reduction in the number of training samples needed. The presented methods led to six publication awards and award finalist nominations and international media coverage, and also worked robustly at many trade fairs and lab demos.
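As a toy illustration of classifying a material from a sweep signal: TactNet-II learns features from the raw 16000-dimensional spatio-temporal signal with a deep convolutional network, whereas the sketch below uses a hand-crafted spectral feature and nearest-template matching. The synthetic signals, material names, and characteristic frequencies are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

def sweep_signal(freq, n=1000):
    # Toy stand-in for a fingertip sweep: assume each material excites
    # the skin at a characteristic frequency, plus sensor noise.
    t = np.linspace(0.0, 1.0, n)
    return np.sin(2.0 * np.pi * freq * t) + 0.3 * rng.normal(size=n)

def dominant_freq(signal):
    # TactNet-II learns its features end to end; this toy version just
    # reads off the strongest FFT bin instead.
    spectrum = np.abs(np.fft.rfft(signal))
    spectrum[0] = 0.0                      # ignore the DC offset
    return int(np.argmax(spectrum))

# Hypothetical material "templates" and an unknown probe sweep.
templates = {"wood": dominant_freq(sweep_signal(20)),
             "metal": dominant_freq(sweep_signal(60))}
probe = dominant_freq(sweep_signal(60))
guess = min(templates, key=lambda m: abs(templates[m] - probe))
```

A learned network replaces both the feature extractor and the matching step, which is what lets it cope with signals far noisier and higher-dimensional than this toy case.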