7 research outputs found
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion
Deep reinforcement learning (RL) uses model-free techniques to optimize
task-specific control policies. Despite having emerged as a promising approach
for complex problems, RL is still hard to use reliably for real-world
applications. Apart from challenges such as precise reward function tuning,
inaccurate sensing and actuation, and non-deterministic response, existing RL
methods do not guarantee behavior within required safety constraints that are
crucial for real robot scenarios. In this regard, we introduce guided
constrained policy optimization (GCPO), an RL framework based upon our
implementation of constrained proximal policy optimization (CPPO) for tracking
base velocity commands while following the defined constraints. We also
introduce schemes which encourage state recovery into constrained regions in
case of constraint violations. We present experimental results of our training
method and test it on the real ANYmal quadruped robot. We compare our approach
against the unconstrained RL method and show that guided constrained RL offers
faster convergence close to the desired optimum resulting in an optimal, yet
physically feasible, robotic control behavior without the need for precise
reward function tuning.
Comment: 8 pages, 8 figures, 5 tables, 1 algorithm, accepted to IEEE Robotics and Automation Letters (RA-L), January 2020, with presentation at the International Conference on Robotics and Automation (ICRA) 2020
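The abstract does not spell out the CPPO update itself; one common way to enforce such constraints, sketched below purely as an illustration (the function names and the dual-ascent rule are assumptions, not taken from the paper), is a PPO-style clipped surrogate combined with a Lagrangian penalty on the expected constraint cost:

```python
import numpy as np

def constrained_surrogate(ratio, reward_adv, cost_adv, lam, clip_eps=0.2):
    """PPO-style clipped surrogate with a Lagrangian cost penalty.

    ratio:      pi_new(a|s) / pi_old(a|s) for sampled actions
    reward_adv: advantage estimates for the task reward
    cost_adv:   advantage estimates for the constraint cost
    lam:        Lagrange multiplier weighting constraint violation
    """
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    reward_term = np.minimum(ratio * reward_adv, clipped * reward_adv)
    # subtract the cost surrogate, scaled by the current multiplier
    return np.mean(reward_term - lam * ratio * cost_adv)

def update_multiplier(lam, mean_cost, cost_limit, lr=0.05):
    """Dual ascent: raise lam while the constraint is violated, decay it otherwise."""
    return max(0.0, lam + lr * (mean_cost - cost_limit))
```

The multiplier grows while the estimated cost exceeds its limit and relaxes back toward zero once the policy is feasible, which is the usual mechanism behind Lagrangian-style constrained policy optimization variants.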
Roll-Drop: accounting for observation noise with a single parameter
This paper proposes a simple strategy for sim-to-real in Deep-Reinforcement
Learning (DRL) -- called Roll-Drop -- that uses dropout during simulation to
account for observation noise during deployment without explicitly modelling
its distribution for each state. DRL is a promising approach to control robots
for highly dynamic and feedback-based manoeuvres, and accurate simulators are
crucial to providing cheap and abundant data to learn the desired behaviour.
Nevertheless, the simulated data are noiseless and generally show a
distributional shift that challenges the deployment on real machines where
sensor readings are affected by noise. The standard solution is modelling the
latter and injecting it during training; while this requires a thorough system
identification, Roll-Drop enhances the robustness to sensor noise by tuning
only a single parameter. We demonstrate an 80% success rate when up to 25%
noise is injected into the observations, with twice the robustness of the
baselines. We deploy the controller trained in simulation on a Unitree A1
platform and assess this improved robustness on the physical system.
Comment: Accepted at the Learning for Dynamics & Control Conference 2023 (L4DC), 10 pages, 7 figures
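A minimal sketch of the core idea, under the assumption (one plausible reading of the abstract, not the paper's exact implementation) that Roll-Drop zeroes each observation component independently with a single tunable probability during simulated training:

```python
import numpy as np

def roll_drop(obs, drop_prob, rng):
    """Zero each observation component independently with probability drop_prob.

    drop_prob is the method's single tunable parameter; the simulator
    otherwise provides noiseless readings, and the random dropout stands
    in for unmodelled sensor noise at deployment time.
    """
    mask = (rng.random(obs.shape) >= drop_prob).astype(obs.dtype)
    return obs * mask

# Hypothetical usage on one observation vector
rng = np.random.default_rng(0)
obs = np.array([0.3, -1.2, 0.8, 0.0, 2.1])
noisy_obs = roll_drop(obs, drop_prob=0.25, rng=rng)
```

Because only `drop_prob` is tuned, no per-state noise distribution or system identification is required.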
RLOC: Terrain-Aware Legged Locomotion using Reinforcement Learning and Optimal Control
We present a unified model-based and data-driven approach for quadrupedal
planning and control to achieve dynamic locomotion over uneven terrain. We
utilize on-board proprioceptive and exteroceptive feedback to map sensory
information and desired base velocity commands into footstep plans using a
reinforcement learning (RL) policy trained in simulation over a wide range of
procedurally generated terrains. When run online, the system tracks the
generated footstep plans using a model-based controller. We evaluate the
robustness of our method over a wide variety of complex terrains. It exhibits
behaviors which prioritize stability over aggressive locomotion. Additionally,
we introduce two ancillary RL policies for corrective whole-body motion
tracking and recovery control. These policies account for changes in physical
parameters and external perturbations. We train and evaluate our framework on a
complex quadrupedal system, ANYmal version B, and demonstrate transferability
to a larger and heavier robot, ANYmal C, without requiring retraining.
Comment: 19 pages, 15 figures, 6 tables, 1 algorithm, submitted to T-RO; under review
Learning system adaptive legged robotic locomotion policies
The ability to form support contacts at discontinuous locations makes legged robots suitable for locomotion over highly unstructured terrains. While recent years have witnessed significant robotic developments, delivering extremely dynamic and robust hardware solutions, the control intelligence for legged robots to perform agile and sophisticated maneuvers remains an active area of research. This thesis, therefore, focuses on the control of legged systems, particularly, quadrupedal robots.
The research presented in this thesis is driven by the motivation that a controller governing the behavior of a system should thoroughly utilize its potential while also adapting to variations in system dynamics through the emergence of behaviors that still achieve the control objective.
Sampling-based search methods allow exploration over vast regions of the operating space, thereby enabling the discovery of near-optimal solutions. Similarly, data-driven reinforcement learning (RL) strategies allow the development of controllers which exploit system dynamics to achieve the control objective described by a high-level reward function. The problem of legged robot locomotion can thus be approached using reinforcement learning to obtain robust and dynamic control solutions. Additionally, the control policy describing the behavior of the system can be parameterized as a deep neural network to perform a complex non-linear mapping from robot state information to desired control actions. This approach of utilizing a deep neural network is referred to as deep reinforcement learning. In this dissertation, I focus on employing deep RL strategies for quadrupedal locomotion.
I show that encouraging the RL control policy to model system dynamics, even implicitly, allows adaptive control behaviors to emerge. I use this observation throughout this thesis to develop system-adaptive control strategies, leading up to the ambitious goal of obtaining an RL locomotion policy capable of zero-shot transfer to quadrupeds of varying kinematic and dynamic properties.
Although the main contributions relate to RL, I also recognize the benefits and drawbacks of different control approaches, and thus explore modular control architectures which utilize both model-based and data-driven model-free strategies for robot locomotion. In this regard, I present training and control architectures which exhibit dynamic locomotion behavior over terrains with varying steps and inclines, both in lab experiments and field trials.
In combination, the works presented in this thesis investigate and, I firmly believe, advance the state of legged robotic control, advocating for the development of artificial control intelligence that matches the level of complexity demonstrated by massively evolved biological counterparts.
Rapid stability margin estimation for contact-rich locomotion
The efficient evaluation of the dynamic stability of legged robots on non-coplanar terrains is important when developing motion planning and control policies. The inference time of this measure strongly influences how fast a robot can react to unexpected events or plan its future footsteps or body trajectory. Existing approaches suitable for real-time decision making are either limited to flat ground or to quasi-static locomotion. Furthermore, joint-space feasibility constraints are usually not considered in receding-horizon planning, as their high dimensionality prohibits this. In this paper we propose the use of a stability criterion for dynamic locomotion on rough terrain based on the Feasible Region (FR) and the Instantaneous Capture Point (ICP), and we leverage a Neural Network (NN) to quickly estimate it. We show that our network achieves satisfactory accuracy with respect to its analytical counterpart, with a speedup of three orders of magnitude, and that it also enables the evaluation of the stability margin's gradient. We demonstrate this learned stability margin in two diverse applications for legged robots: Reinforcement Learning (RL) and nonlinear Trajectory Optimization (TO). On a full-sized quadruped robot, we demonstrate that the network enables the computation of physically realizable Center of Mass (CoM) trajectories and foothold locations satisfying friction constraints and joint-torque limits, in a receding-horizon fashion and on non-coplanar terrains.
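As an illustration of why a learned margin makes its gradient cheap to obtain, the sketch below uses a hypothetical tiny tanh network (not the paper's architecture or features) to compute a scalar margin estimate together with its closed-form input gradient:

```python
import numpy as np

def margin_and_grad(x, W1, b1, W2, b2):
    """Forward pass of a small tanh MLP margin estimator and its input gradient.

    x: feature vector describing the robot/terrain state (illustrative).
    Returns the scalar margin m(x) and dm/dx; having dm/dx in closed form
    is what lets TO and RL treat the learned margin as a differentiable
    constraint rather than re-querying an analytical criterion.
    """
    h = np.tanh(W1 @ x + b1)            # hidden activations, shape (H,)
    m = float(W2 @ h + b2)              # scalar stability margin estimate
    grad = W1.T @ ((1.0 - h**2) * W2)   # chain rule through tanh, shape (D,)
    return m, grad
```

One forward pass plus one matrix-vector product replaces the expensive analytical evaluation, which is where the orders-of-magnitude speedup reported in the abstract comes from.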
VAE-Loco: Versatile Quadruped Locomotion by Learning a Disentangled Gait Representation
Quadruped locomotion is rapidly maturing to a degree where robots now
routinely traverse a variety of unstructured terrains. However, while gaits can
be varied typically by selecting from a range of pre-computed styles, current
planners are unable to vary key gait parameters continuously while the robot is
in motion. The synthesis, on-the-fly, of gaits with unexpected operational
characteristics or even the blending of dynamic manoeuvres lies beyond the
capabilities of the current state-of-the-art. In this work we address this
limitation by learning a latent space capturing the key stance phases
constituting a particular gait. This is achieved via a generative model trained
on a single trot style, which encourages disentanglement such that application
of a drive signal to a single dimension of the latent state induces holistic
plans synthesising a continuous variety of trot styles. We demonstrate that
specific properties of the drive signal map directly to gait parameters such as
cadence, footstep height and full stance duration. Due to the nature of our
approach these synthesised gaits are continuously variable online during robot
operation and robustly capture a richness of movement significantly exceeding
the relatively narrow behaviour seen during training. In addition, the use of a
generative model facilitates the detection and mitigation of disturbances to
provide a versatile and robust planning framework. We evaluate our approach on
two versions of the real ANYmal quadruped robot and demonstrate that our
method achieves a continuous blend of dynamic trot styles whilst being robust
and reactive to external perturbations.
Comment: 15 pages, 13 figures, 1 table, submitted to IEEE Transactions on Robotics (T-RO). arXiv admin note: substantial text overlap with arXiv:2112.0480
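The drive-signal mechanism can be illustrated with a minimal sketch; the latent code, dimension index, and sinusoidal drive here are placeholders for illustration, not the trained generative model from the paper:

```python
import numpy as np

def drive_latent(z_base, drive_dim, cadence_hz, amplitude, t):
    """Apply a periodic drive signal to one dimension of a gait latent code.

    In the paper, properties of the drive signal map to gait parameters
    such as cadence and footstep height; because the representation is
    disentangled, perturbing a single latent dimension yields a holistic,
    continuously variable trot plan when decoded.
    """
    z = np.array(z_base, dtype=float, copy=True)
    z[drive_dim] += amplitude * np.sin(2.0 * np.pi * cadence_hz * t)
    return z
```

Sweeping `cadence_hz` or `amplitude` online would then vary the synthesized gait continuously while the robot is in motion, without selecting from pre-computed styles.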