Bound Controller for a Quadruped Robot using Pre-Fitting Deep Reinforcement Learning
The bound gait is an important gait in quadruped robot locomotion. It can be
used to cross obstacles and often serves as a transition mode between trot and
gallop. However, because of the complexity of the underlying models, bound
gaits built by conventional control methods are often unnatural and slow to
compute. In the present work, we introduce a method to achieve the bound gait
based on model-free pre-fitting deep reinforcement learning (PF-DRL). We first
constructed a network with the same structure as the actor network in PPO2 and
pre-fit it using data collected from a robot driven by a conventional
model-based controller. Next, the trained weights were transferred into PPO2
and optimized further. Moreover, targeting the symmetric and periodic
characteristics of bounding, we designed a reward function based on contact
points. We also used feature engineering to improve the input features of the
DRL model, improving performance on flat ground. Finally, we trained the bound
controller in simulation and successfully deployed it on the Jueying Mini
robot. In our experiments it outperforms the conventional method, with higher
computational efficiency and a more stable center-of-mass height.
Comment: 7 pages
Flightmare: A Flexible Quadrotor Simulator
Currently available quadrotor simulators have a rigid and highly specialized
structure: they are either fast, physically accurate, or
photo-realistic. In this work, we propose a paradigm shift in the development
of simulators: moving the trade-off between accuracy and speed from the
developers to the end-users. We use this design idea to develop a novel modular
quadrotor simulator: Flightmare. Flightmare is composed of two main components:
a configurable rendering engine built on Unity and a flexible physics engine
for dynamics simulation. These two components are fully decoupled and can run
independently of each other. This makes our simulator extremely fast:
rendering achieves speeds of up to 230 Hz, while physics simulation runs at up
to 200,000 Hz. In addition, Flightmare comes with several desirable features: (i)
a large multi-modal sensor suite, including an interface to extract the 3D
point-cloud of the scene; (ii) an API for reinforcement learning which can
simulate hundreds of quadrotors in parallel; and (iii) an integration with a
virtual-reality headset for interaction with the simulated environment. We
demonstrate the flexibility of Flightmare by using it for two completely
different robotic tasks: learning a sensorimotor control policy for a quadrotor
and path-planning in a complex 3D environment.
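The decoupling the abstract describes can be illustrated with a minimal sketch: physics and rendering step at independent rates, so users choose where to sit on the accuracy/speed trade-off. The class names and rates below are illustrative, not Flightmare's actual API:

```python
class PhysicsEngine:
    """Toy dynamics stepper; real engines would integrate quadrotor dynamics."""
    def __init__(self, dt=1e-3):
        self.dt = dt
        self.t = 0.0

    def step(self):
        self.t += self.dt               # advance simulated time

class Renderer:
    """Toy renderer that only produces a frame when one is due."""
    def __init__(self, fps=100):
        self.period = 1.0 / fps
        self.frames = 0
        self.next_frame = 0.0

    def maybe_render(self, t):
        if t >= self.next_frame:        # render only at the frame rate
            self.frames += 1
            self.next_frame += self.period

physics, renderer = PhysicsEngine(dt=1e-3), Renderer(fps=100)
for _ in range(1000):                   # simulate roughly one second
    physics.step()                      # physics runs every iteration...
    renderer.maybe_render(physics.t)    # ...rendering far less often

print(physics.t, renderer.frames)       # many physics steps, ~100 frames
```

Because neither loop waits on the other, either component can also be disabled entirely, which is what makes headless high-throughput RL training possible.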
Gym-Ignition: Reproducible Robotic Simulations for Reinforcement Learning
This paper presents Gym-Ignition, a new framework to create reproducible
robotic environments for reinforcement learning research. It interfaces with
the new generation of Gazebo, part of the Ignition Robotics suite, which
provides three main improvements for reinforcement learning applications
compared to the alternatives: 1) the modular architecture enables using the
simulator as a C++ library, simplifying the interconnection with external
software; 2) multiple physics and rendering engines are supported as plugins,
simplifying their selection during the execution; 3) the new distributed
simulation capability allows simulating complex scenarios while sharing the
load on multiple workers and machines. The core of Gym-Ignition is a component
that contains the Ignition Gazebo simulator and exposes a simple interface for
its configuration and execution. We provide a Python package that allows
developers to create robotic environments simulated in Ignition Gazebo.
Environments expose the common OpenAI Gym interface, making them compatible
out-of-the-box with third-party frameworks containing reinforcement learning
algorithms. Simulations can be executed in both headless and GUI mode, the
physics engine can run in accelerated mode, and instances can be parallelized.
Furthermore, the Gym-Ignition software architecture provides abstraction of the
Robot and the Task, making environments agnostic to the specific runtime. This
abstraction also allows them to run in a real-time setting on actual
robotic platforms, even when driven by different middlewares.
Comment: Accepted in SII202
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion
Deep reinforcement learning (RL) uses model-free techniques to optimize
task-specific control policies. Despite having emerged as a promising approach
for complex problems, RL is still hard to use reliably for real-world
applications. Apart from challenges such as precise reward function tuning,
inaccurate sensing and actuation, and non-deterministic response, existing RL
methods do not guarantee behavior within required safety constraints that are
crucial for real robot scenarios. In this regard, we introduce guided
constrained policy optimization (GCPO), an RL framework based upon our
implementation of constrained proximal policy optimization (CPPO) for tracking
base velocity commands while following the defined constraints. We also
introduce schemes which encourage state recovery into constrained regions in
case of constraint violations. We present experimental results of our training
method and test it on the real ANYmal quadruped robot. We compare our approach
against the unconstrained RL method and show that guided constrained RL offers
faster convergence close to the desired optimum, resulting in optimal yet
physically feasible robotic control behavior without the need for precise
reward function tuning.
Comment: 8 pages, 8 figures, 5 tables, 1 algorithm, accepted to IEEE Robotics
and Automation Letters (RA-L), January 2020, with presentation at the
International Conference on Robotics and Automation (ICRA) 202
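The constrained-optimization idea behind approaches like CPPO can be illustrated with a generic Lagrangian-style sketch: a multiplier grows while a safety cost exceeds its limit, penalizing the objective until the constraint is satisfied. This is a standard dual-ascent illustration under assumed names, not the paper's actual CPPO implementation:

```python
def lagrangian_multiplier(costs, limit, lr=0.5, lam=0.0):
    """Dual ascent on the multiplier over a sequence of episode costs."""
    for c in costs:
        # Multiplier grows while the constraint is violated (c > limit)
        # and decays toward zero once the policy becomes safe, but is
        # clipped at zero so the penalty never flips sign.
        lam = max(0.0, lam + lr * (c - limit))
    return lam

# While costs stay above the limit, the multiplier keeps increasing...
lam_violating = lagrangian_multiplier([2.0, 2.0, 2.0], limit=1.0)
# ...and it decays back to zero once episodes respect the constraint.
lam_safe = lagrangian_multiplier([0.0] * 10, limit=1.0, lam=lam_violating)
print(lam_violating, lam_safe)  # → 1.5 0.0
```

In a full constrained policy-gradient method, the multiplier would weight the constraint cost inside the policy objective at every update, steering the policy back into the feasible region after violations.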