7 research outputs found
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion
Deep reinforcement learning (RL) uses model-free techniques to optimize
task-specific control policies. Despite having emerged as a promising approach
for complex problems, RL is still hard to use reliably for real-world
applications. Apart from challenges such as precise reward function tuning,
inaccurate sensing and actuation, and non-deterministic response, existing RL
methods do not guarantee behavior within required safety constraints that are
crucial for real robot scenarios. In this regard, we introduce guided
constrained policy optimization (GCPO), an RL framework based upon our
implementation of constrained proximal policy optimization (CPPO) for tracking
base velocity commands while following the defined constraints. We also
introduce schemes which encourage state recovery into constrained regions in
case of constraint violations. We present experimental results of our training
method and test it on the real ANYmal quadruped robot. We compare our approach
against the unconstrained RL method and show that guided constrained RL offers
faster convergence close to the desired optimum resulting in an optimal, yet
physically feasible, robotic control behavior without the need for precise
reward function tuning.
Comment: 8 pages, 8 figures, 5 tables, 1 algorithm, accepted to IEEE Robotics and Automation Letters (RA-L), January 2020, with presentation at the International Conference on Robotics and Automation (ICRA) 2020
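The abstract does not spell out the CPPO update itself; one common way to enforce such constraints, sketched below purely as an illustration (the function names and the dual-ascent rule are assumptions, not taken from the paper), is a PPO-style clipped surrogate combined with a Lagrangian penalty on the expected constraint cost:

```python
import numpy as np

def constrained_surrogate(ratio, reward_adv, cost_adv, lam, clip_eps=0.2):
    """PPO-style clipped surrogate with a Lagrangian cost penalty.

    ratio:      pi_new(a|s) / pi_old(a|s) for sampled actions
    reward_adv: advantage estimates for the task reward
    cost_adv:   advantage estimates for the constraint cost
    lam:        Lagrange multiplier weighting constraint violation
    """
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    reward_term = np.minimum(ratio * reward_adv, clipped * reward_adv)
    # subtract the cost surrogate, scaled by the current multiplier
    return np.mean(reward_term - lam * ratio * cost_adv)

def update_multiplier(lam, mean_cost, cost_limit, lr=0.05):
    """Dual ascent: raise lam while the constraint is violated, decay it otherwise."""
    return max(0.0, lam + lr * (mean_cost - cost_limit))
```

The multiplier grows while the estimated cost exceeds its limit and relaxes back toward zero once the policy is feasible, which is the usual mechanism behind Lagrangian-style constrained policy optimization variants.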
Roll-Drop: accounting for observation noise with a single parameter
This paper proposes a simple strategy for sim-to-real in Deep-Reinforcement
Learning (DRL) -- called Roll-Drop -- that uses dropout during simulation to
account for observation noise during deployment without explicitly modelling
its distribution for each state. DRL is a promising approach to control robots
for highly dynamic and feedback-based manoeuvres, and accurate simulators are
crucial to providing cheap and abundant data to learn the desired behaviour.
Nevertheless, the simulated data are noiseless and generally show a
distributional shift that challenges the deployment on real machines where
sensor readings are affected by noise. The standard solution is modelling the
latter and injecting it during training; while this requires a thorough system
identification, Roll-Drop enhances the robustness to sensor noise by tuning
only a single parameter. We demonstrate an 80% success rate when up to 25%
noise is injected into the observations, with twice the robustness of the
baselines. We deploy the controller trained in simulation on a Unitree A1
platform and assess this improved robustness on the physical system.
Comment: Accepted at the Learning for Dynamics & Control Conference 2023 (L4DC), 10 pages, 7 figures
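A minimal sketch of the core idea, under the assumption (one plausible reading of the abstract, not the paper's exact implementation) that Roll-Drop zeroes each observation component independently with a single tunable probability during simulated training:

```python
import numpy as np

def roll_drop(obs, drop_prob, rng):
    """Zero each observation component independently with probability drop_prob.

    drop_prob is the method's single tunable parameter; the simulator
    otherwise provides noiseless readings, and the random dropout stands
    in for unmodelled sensor noise at deployment time.
    """
    mask = (rng.random(obs.shape) >= drop_prob).astype(obs.dtype)
    return obs * mask

# Hypothetical usage on one observation vector
rng = np.random.default_rng(0)
obs = np.array([0.3, -1.2, 0.8, 0.0, 2.1])
noisy_obs = roll_drop(obs, drop_prob=0.25, rng=rng)
```

Because only `drop_prob` is tuned, no per-state noise distribution or system identification is required.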
RLOC: Terrain-Aware Legged Locomotion using Reinforcement Learning and Optimal Control
We present a unified model-based and data-driven approach for quadrupedal
planning and control to achieve dynamic locomotion over uneven terrain. We
utilize on-board proprioceptive and exteroceptive feedback to map sensory
information and desired base velocity commands into footstep plans using a
reinforcement learning (RL) policy trained in simulation over a wide range of
procedurally generated terrains. When run online, the system tracks the
generated footstep plans using a model-based controller. We evaluate the
robustness of our method over a wide variety of complex terrains. It exhibits
behaviors which prioritize stability over aggressive locomotion. Additionally,
we introduce two ancillary RL policies for corrective whole-body motion
tracking and recovery control. These policies account for changes in physical
parameters and external perturbations. We train and evaluate our framework on a
complex quadrupedal system, ANYmal version B, and demonstrate transferability
to a larger and heavier robot, ANYmal C, without requiring retraining.
Comment: 19 pages, 15 figures, 6 tables, 1 algorithm, submitted to T-RO; under review
Learning system adaptive legged robotic locomotion policies
The ability to form support contacts at discontinuous locations makes legged robots suitable for locomotion over highly unstructured terrains. While recent years have witnessed significant robotic developments, delivering extremely dynamic and robust hardware solutions, the control intelligence for legged robots to perform agile and sophisticated maneuvers remains an active area of research. This thesis, therefore, focuses on the control of legged systems, particularly, quadrupedal robots.
The research presented in this thesis is driven by the motivation that a controller governing the behavior of a system should thoroughly utilize its potential while also adapting to variations in system dynamics through the emergence of behaviors that still achieve the control objective.
Sampling-based search methods allow exploration over vast regions of the operating space, thereby enabling the discovery of near-optimal solutions. Similarly, data-driven reinforcement learning (RL) strategies allow the development of controllers which exploit system dynamics to achieve the control objective described by a high-level reward function. The problem of legged robot locomotion can thus be approached using reinforcement learning to obtain robust and dynamic control solutions. Additionally, the control policy describing the behavior of the system can be parameterized as a deep neural network to perform a complex non-linear mapping from robot state information to desired control actions. This approach of utilizing a deep neural network is referred to as deep reinforcement learning. In this dissertation, I focus on employing deep RL strategies for quadrupedal locomotion.
I show that encouraging the RL control policy to model system dynamics, even implicitly, allows adaptive control behaviors to emerge. I use this observation throughout this thesis to develop system-adaptive control strategies, leading up to the ambitious goal of obtaining an RL locomotion policy capable of zero-shot transfer to quadrupeds of varying kinematic and dynamic properties.
Although the main contributions relate to RL, I also recognize the benefits and drawbacks of different control approaches, and thus explore modular control architectures which utilize both model-based and data-driven model-free strategies for robot locomotion. In this regard, I present training and control architectures which exhibit dynamic locomotion behavior over terrains with varying steps and inclines, both in lab experiments and field trials.
In combination, the works presented in this thesis investigate and, I firmly believe, advance the state of legged robotic control, advocating for the development of artificial control intelligence that matches the level of complexity demonstrated by massively evolved biological counterparts.
Rapid stability margin estimation for contact-rich locomotion
The efficient evaluation of the dynamic stability of legged robots on non-coplanar terrains is important when developing motion planning and control policies. The inference time of this measure strongly influences how fast a robot can react to unexpected events or plan its future footsteps or body trajectory. Existing approaches suitable for real-time decision making are either limited to flat ground or to quasi-static locomotion. Furthermore, joint-space feasibility constraints are usually not considered in receding-horizon planning, as their high dimensionality prohibits this. In this paper we propose the use of a stability criterion for dynamic locomotion on rough terrain based on the Feasible Region (FR) and the Instantaneous Capture Point (ICP), and we leverage a Neural Network (NN) to quickly estimate it. We show that our network achieves satisfactory accuracy with respect to its analytical counterpart, with a speedup of three orders of magnitude, and that it also enables the evaluation of the stability margin's gradient. We demonstrate this learned stability margin in two diverse applications for legged robots: Reinforcement Learning (RL) and nonlinear Trajectory Optimization (TO). On a full-sized quadruped robot, we demonstrate that the network enables the computation of physically realizable Center of Mass (CoM) trajectories and foothold locations satisfying friction constraints and joint-torque limits, in a receding-horizon fashion and on non-coplanar terrains.
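As an illustration of why a learned margin makes its gradient cheap to obtain, the sketch below uses a hypothetical tiny tanh network (not the paper's architecture or features) to compute a scalar margin estimate together with its closed-form input gradient:

```python
import numpy as np

def margin_and_grad(x, W1, b1, W2, b2):
    """Forward pass of a small tanh MLP margin estimator and its input gradient.

    x: feature vector describing the robot/terrain state (illustrative).
    Returns the scalar margin m(x) and dm/dx; having dm/dx in closed form
    is what lets TO and RL treat the learned margin as a differentiable
    constraint rather than re-querying an analytical criterion.
    """
    h = np.tanh(W1 @ x + b1)            # hidden activations, shape (H,)
    m = float(W2 @ h + b2)              # scalar stability margin estimate
    grad = W1.T @ ((1.0 - h**2) * W2)   # chain rule through tanh, shape (D,)
    return m, grad
```

One forward pass plus one matrix-vector product replaces the expensive analytical evaluation, which is where the orders-of-magnitude speedup reported in the abstract comes from.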
VAE-Loco: Versatile Quadruped Locomotion by Learning a Disentangled Gait Representation
Quadruped locomotion is rapidly maturing to a degree where robots now
routinely traverse a variety of unstructured terrains. However, while gaits can
be varied typically by selecting from a range of pre-computed styles, current
planners are unable to vary key gait parameters continuously while the robot is
in motion. The synthesis, on-the-fly, of gaits with unexpected operational
characteristics or even the blending of dynamic manoeuvres lies beyond the
capabilities of the current state-of-the-art. In this work we address this
limitation by learning a latent space capturing the key stance phases
constituting a particular gait. This is achieved via a generative model trained
on a single trot style, which encourages disentanglement such that application
of a drive signal to a single dimension of the latent state induces holistic
plans synthesising a continuous variety of trot styles. We demonstrate that
specific properties of the drive signal map directly to gait parameters such as
cadence, footstep height and full stance duration. Due to the nature of our
approach these synthesised gaits are continuously variable online during robot
operation and robustly capture a richness of movement significantly exceeding
the relatively narrow behaviour seen during training. In addition, the use of a
generative model facilitates the detection and mitigation of disturbances to
provide a versatile and robust planning framework. We evaluate our approach on
two versions of the real ANYmal quadruped robot and demonstrate that our
method achieves a continuous blend of dynamic trot styles whilst being robust
and reactive to external perturbations.
Comment: 15 pages, 13 figures, 1 table, submitted to IEEE Transactions on Robotics (T-RO). arXiv admin note: substantial text overlap with arXiv:2112.0480
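The drive-signal mechanism can be illustrated with a minimal sketch; the latent code, dimension index, and sinusoidal drive here are placeholders for illustration, not the trained generative model from the paper:

```python
import numpy as np

def drive_latent(z_base, drive_dim, cadence_hz, amplitude, t):
    """Apply a periodic drive signal to one dimension of a gait latent code.

    In the paper, properties of the drive signal map to gait parameters
    such as cadence and footstep height; because the representation is
    disentangled, perturbing a single latent dimension yields a holistic,
    continuously variable trot plan when decoded.
    """
    z = np.array(z_base, dtype=float, copy=True)
    z[drive_dim] += amplitude * np.sin(2.0 * np.pi * cadence_hz * t)
    return z
```

Sweeping `cadence_hz` or `amplitude` online would then vary the synthesized gait continuously while the robot is in motion, without selecting from pre-computed styles.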