Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning
Training generally capable agents that perform well in unseen dynamic
environments is a long-term goal of robot learning. Quality Diversity
Reinforcement Learning (QD-RL) is an emerging class of reinforcement learning
(RL) algorithms that blend insights from Quality Diversity (QD) and RL to
produce a collection of high-performing and behaviorally diverse policies with
respect to a behavioral embedding. Existing QD-RL approaches have thus far
taken advantage of sample-efficient off-policy RL algorithms. However, recent
advances in high-throughput, massively parallelized robotic simulators have
opened the door for algorithms that can take advantage of such parallelism, and
it is unclear how to scale existing off-policy QD-RL methods to these new
data-rich regimes. In this work, we take the first steps to combine on-policy
RL methods, specifically Proximal Policy Optimization (PPO), that can leverage
massive parallelism, with QD, and propose a new QD-RL method with these
high-throughput simulators and on-policy training in mind. Our proposed
Proximal Policy Gradient Arborescence (PPGA) algorithm yields a 4x improvement
over baselines on the challenging humanoid domain.
Comment: Submitted to NeurIPS 202
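Since PPO is the on-policy component that PPGA builds on, the clipped surrogate objective at its core can be sketched as follows. This is a minimal illustrative NumPy version, not the PPGA implementation; the function name and toy values are assumptions for the example.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from PPO.

    ratio:     pi_new(a|s) / pi_old(a|s) per sample
    advantage: estimated advantage per sample
    eps:       clip range (0.2 is a common default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Pessimistic bound: element-wise minimum, averaged over the batch.
    return np.minimum(unclipped, clipped).mean()

# Toy batch: the third ratio (1.5) falls outside [0.8, 1.2] and is
# clipped, so overly large policy updates gain no extra objective value.
ratios = np.array([0.9, 1.1, 1.5])
advantages = np.array([1.0, 1.0, 1.0])
obj = ppo_clip_objective(ratios, advantages)
```

The clipping is what lets PPO take many gradient steps per batch of on-policy data, which is the property that pairs well with high-throughput parallel simulators.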
Model predictive control-based value estimation for efficient reinforcement learning
Reinforcement learning suffers from limitations in practice, primarily due to
the large number of interactions required with the environment. As a result, it
is implausible for many learning methods to obtain an optimal strategy within
only a few attempts. We therefore design an improved reinforcement learning
method based on model predictive control that models the environment through a
data-driven approach. Based on the learned environment model, it performs
multi-step prediction to estimate the value function and optimize the policy.
The method demonstrates higher learning efficiency, faster convergence of the
strategy toward the optimal value, and a smaller sample capacity required of
experience replay buffers.
Experimental results, both on classic benchmark environments and in a dynamic
obstacle avoidance scenario for an unmanned aerial vehicle, validate the
proposed approach.
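The multi-step value estimation described above can be sketched as a model rollout that accumulates predicted rewards and bootstraps with a value function. This is an illustrative sketch under the assumption of a Gym-style interface; `dynamics`, `reward_fn`, and `value_fn` stand in for the paper's learned, data-driven components, and the toy 1-D example is invented for demonstration.

```python
def mpc_value_estimate(state, policy, dynamics, reward_fn, value_fn,
                       horizon=5, gamma=0.99):
    """Estimate V(state) by rolling the learned model forward `horizon`
    steps under the policy, then bootstrapping with the value function."""
    total, discount, s = 0.0, 1.0, state
    for _ in range(horizon):
        a = policy(s)
        total += discount * reward_fn(s, a)
        s = dynamics(s, a)          # one-step model prediction
        discount *= gamma
    return total + discount * value_fn(s)  # bootstrapped tail estimate

# Toy 1-D example: a proportional controller drives the state toward the
# origin; reward is -|s|, and the tail value is taken as zero.
est = mpc_value_estimate(
    state=1.0,
    policy=lambda s: -0.5 * s,
    dynamics=lambda s, a: s + a,
    reward_fn=lambda s, a: -abs(s),
    value_fn=lambda s: 0.0,
    horizon=3, gamma=1.0)
```

Because the rollout happens in the learned model rather than the real environment, each value estimate costs no additional environment interactions, which is the source of the claimed sample efficiency.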
Virtual Robot Climbing using Reinforcement Learning
Reinforcement Learning (RL) is a field of Artificial Intelligence that has gained a lot of attention in recent years. In this project, RL research was used to design and train an agent to climb and navigate through an environment with slopes. We compared and evaluated the performance of two state-of-the-art reinforcement learning algorithms for locomotion-related tasks, Deep Deterministic Policy Gradients (DDPG) and Trust Region Policy Optimisation (TRPO). We observed that, on average, training with TRPO was three times faster than DDPG, and also much more stable for the locomotion control tasks that we experimented with. We conducted experiments and finally designed an environment using insights from transfer learning to successfully train an agent to climb slopes of up to 36°.
Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments
In the NIPS 2017 Learning to Run challenge, participants were tasked with
building a controller for a musculoskeletal model to make it run as fast as
possible through an obstacle course. Top participants were invited to describe
their algorithms. In this work, we present eight solutions that used deep
reinforcement learning approaches, based on algorithms such as Deep
Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region
Policy Optimization. Many solutions use similar relaxations and heuristics,
such as reward shaping, frame skipping, discretization of the action space,
symmetry, and policy blending. However, each of the eight teams implemented
different modifications of the known algorithms.
Comment: 27 pages, 17 figures
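Of the heuristics listed above, frame skipping is perhaps the simplest to illustrate: the controller repeats each action for several simulator steps, cutting policy-query cost and smoothing control. The wrapper below is a generic sketch, not any team's code; the `CountEnv` toy environment and the minimal `step(action) -> (obs, reward, done)` interface are assumptions for the example.

```python
class CountEnv:
    """Toy stand-in environment: each step increments a counter,
    yields reward 1.0, and terminates after `limit` steps."""
    def __init__(self, limit=10):
        self.t, self.limit = 0, limit

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.limit


class FrameSkip:
    """Repeat each action for `skip` underlying steps, summing rewards
    and stopping early if the episode terminates."""
    def __init__(self, env, skip=4):
        self.env, self.skip = env, skip

    def step(self, action):
        total_reward, obs, done = 0.0, None, False
        for _ in range(self.skip):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done


# One wrapped step advances the underlying environment four times.
obs, reward, done = FrameSkip(CountEnv(), skip=4).step(action=None)
```

Reward shaping and action discretization admit similar thin wrappers, which is why these relaxations composed so easily across the different teams' solutions.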
Learning Agility and Adaptive Legged Locomotion via Curricular Hindsight Reinforcement Learning
Agile and adaptive maneuvers such as fall recovery, high-speed turning, and
sprinting in the wild are challenging for legged systems. We propose a
Curricular Hindsight Reinforcement Learning (CHRL) that learns an end-to-end
tracking controller that achieves powerful agility and adaptation for the
legged robot. The two key components are (i) a novel automatic curriculum
strategy on task difficulty and (ii) a Hindsight Experience Replay strategy
adapted to legged locomotion tasks. We demonstrated successful agile and
adaptive locomotion on a real quadruped robot that performed fall recovery
autonomously, coherent trotting, sustained outdoor speeds up to 3.45 m/s, and
turning speeds up to 3.2 rad/s. This system produces adaptive behaviours
responding to changing situations and unexpected disturbances on natural
terrains like grass and dirt.
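The Hindsight Experience Replay component can be sketched by its core relabeling step: failed trajectories are reused by pretending the goal was the state actually reached. This is a generic "final-goal" HER sketch, not the paper's locomotion-adapted variant; the episode layout, sparse reward, and toy 1-D states are assumptions for illustration.

```python
def her_relabel(episode, reward_fn):
    """Relabel every transition's goal with the goal achieved at the end
    of the episode ('final' strategy), so that a failed trajectory still
    yields positive learning signal.

    episode: list of (state, action, next_state, goal) tuples
    returns: list of (state, action, next_state, new_goal, reward) tuples
    """
    achieved = episode[-1][2]  # state actually reached at episode end
    relabeled = []
    for state, action, next_state, _ in episode:
        reward = reward_fn(next_state, achieved)
        relabeled.append((state, action, next_state, achieved, reward))
    return relabeled

# Toy example: 1-D states, sparse reward of 0 at the goal, -1 elsewhere.
# The original goal 5 was never reached, but relabeling with the achieved
# state 2 makes the final transition a success.
episode = [(0, 1, 1, 5), (1, 1, 2, 5)]
out = her_relabel(episode, lambda s, g: 0.0 if s == g else -1.0)
```

Adapting such relabeling to locomotion, as the abstract describes, amounts to choosing task-appropriate achieved quantities (e.g. realized velocities) as the hindsight goals.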