Reinforcement Learning and Planning for Preference Balancing Tasks
Robots are often highly non-linear dynamical systems with many degrees of freedom, making motion problems computationally challenging to solve. One solution is reinforcement learning (RL), which learns through experimentation to automatically perform the near-optimal motions that complete a task. However, high-dimensional problems and task formulation often prove challenging for RL. We address these problems with PrEference Appraisal Reinforcement Learning (PEARL), which solves Preference Balancing Tasks (PBTs). PBTs define a problem as a set of preferences that the system must balance to achieve a goal. The method is appropriate for acceleration-controlled systems with continuous state spaces and either discrete or continuous action spaces with unknown system dynamics. We show that PEARL learns a sub-optimal policy on a subset of states and actions and transfers the policy to the expanded domain to produce a more refined plan on a class of robotic problems. We establish convergence to task goal conditions and, even when preconditions are not verifiable, show that this is a valuable method to use before other, more expensive approaches. We evaluate the method on several robotic problems, such as Aerial Cargo Delivery, Multi-Agent Pursuit, Rendezvous, and Inverted Flying Pendulum, both in simulation and experimentally. Additionally, PEARL is applied outside of robotics as an array sorting agent. The results demonstrate high accuracy and fast learning times on a large set of practical applications.
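As a rough illustration of the preference-balancing idea, the sketch below scores states by a weighted combination of preference features and picks the action whose predicted next state scores highest. The feature functions, weights, and one-step dynamics model are hypothetical stand-ins, not PEARL's exact formulation.

    import numpy as np

    # Hypothetical preference features over a state [x, y, swing_angle].
    def distance_to_goal(state, goal):
        return -np.linalg.norm(state[:2] - goal)   # prefer being near the goal

    def load_displacement(state):
        return -abs(state[2])                      # prefer a small swing angle

    def value(state, goal, weights):
        # Preference balance: a learned weighted sum of the features.
        features = np.array([distance_to_goal(state, goal),
                             load_displacement(state)])
        return weights @ features

    def greedy_action(state, goal, weights, actions, step):
        # Pick the discrete acceleration whose predicted next state scores
        # highest; `step` is an assumed one-step dynamics model.
        return max(actions, key=lambda a: value(step(state, a), goal, weights))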
PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning
We present PRM-RL, a hierarchical method for long-range navigation task
completion that combines sampling-based path planning with reinforcement
learning (RL). The RL agents learn short-range, point-to-point navigation
policies that capture robot dynamics and task constraints without knowledge of
the large-scale topology. Next, the sampling-based planner provides roadmaps
that connect robot configurations the RL agent can successfully navigate
between. The same RL agents are then used to control the robot under the
direction of the planner, enabling long-range navigation. We use Probabilistic
Roadmaps (PRMs) as the sampling-based planner. The RL agents are constructed using
feature-based and deep neural net policies in continuous state and action
spaces. We evaluate PRM-RL, both in simulation and on-robot, on two navigation
tasks with non-trivial robot dynamics: end-to-end differential drive indoor
navigation in office environments, and aerial cargo delivery in urban
environments with load displacement constraints. Our results show improved
task completion over both the RL agents on their own and traditional
sampling-based planners. In the indoor navigation task, PRM-RL successfully
completes trajectories up to 215 m long under noisy sensor conditions, and the
aerial cargo delivery task completes flights of over 1000 m without violating
the task constraints, in an environment 63 million times larger than the one
used in training.
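A minimal sketch of the core construction, under assumed helper functions (a free-configuration sampler and a rollout success check for the pretrained RL policy), could look like the following; it is illustrative, not the authors' implementation.

    import math

    # Build a PRM whose edges are validated by the RL agent itself:
    # an edge is kept only if rollouts of the pretrained policy succeed
    # often enough between the two configurations. sample_free and
    # rollout are assumed, hypothetical helpers.
    def build_prm_rl_roadmap(sample_free, rollout, n_nodes=200, k=10,
                             trials=5, min_success=0.9):
        nodes = [sample_free() for _ in range(n_nodes)]
        edges = set()
        for i, q in enumerate(nodes):
            # Candidate edges to the k nearest other nodes.
            neighbors = sorted((j for j in range(n_nodes) if j != i),
                               key=lambda j: math.dist(q, nodes[j]))[:k]
            for j in neighbors:
                successes = sum(rollout(q, nodes[j]) for _ in range(trials))
                if successes / trials >= min_success:
                    edges.add((min(i, j), max(i, j)))
        return nodes, edges

At query time, the same policy that validated an edge executes it, so the roadmap contains only transitions the agent can actually perform.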
Air Learning: An AI Research Platform for Algorithm-Hardware Benchmarking of Autonomous Aerial Robots
We introduce Air Learning, an open-source simulator and gym environment
for deep reinforcement learning research on resource-constrained aerial robots.
Equipped with domain randomization, Air Learning exposes a UAV agent to a
diverse set of challenging scenarios. We seed the toolset with point-to-point
obstacle avoidance tasks in three different environments, together with Deep Q
Networks (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses
the policies' performance under various quality-of-flight (QoF) metrics, such
as the energy consumed, endurance, and the average trajectory length, on
resource-constrained embedded platforms like a Raspberry Pi. We find that the
trajectories flown on an embedded Raspberry Pi differ substantially from those
predicted on a high-end desktop system, resulting in longer trajectories in one
of the environments. To understand the source of such discrepancies, we use Air
Learning to artificially degrade high-end desktop performance to mimic what
happens on a low-end embedded system. We then propose a mitigation technique
that uses hardware-in-the-loop measurements to determine the latency
distribution of running the policy on the target platform (the onboard compute
on the aerial robot). A
randomly sampled latency from the latency distribution is then added as an
artificial delay within the training loop. Training the policy with artificial
delays allows us to minimize the hardware gap (discrepancy in the flight time
metric reduced from 37.73% to 0.5%). Thus, Air Learning with
hardware-in-the-loop characterizes those differences and exposes how the
onboard compute's choice affects the aerial robot's performance. We also
conduct reliability studies to assess the effect of sensor failures on the
learned policies. All put together, Air Learning enables a broad class of deep RL
research on UAVs. The source code is available
at http://bit.ly/2JNAVb6.
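The latency-injection mitigation described above might look roughly like the sketch below, where the measured latency values and the environment's advance_time hook are hypothetical placeholders rather than Air Learning's actual interface.

    import random

    # Latencies (in seconds) obtained by profiling the policy on the
    # target platform with hardware-in-the-loop; the values here are
    # invented placeholders for illustration.
    measured_latencies = [0.021, 0.034, 0.046, 0.052]

    def step_with_artificial_delay(env, policy, obs):
        # Sample an action-computation latency from the measured
        # distribution and let the simulator advance by that much before
        # the new action takes effect, mimicking slow onboard compute.
        # env.advance_time is an assumed simulator hook.
        action = policy(obs)
        env.advance_time(random.choice(measured_latencies))
        return env.step(action)

Training against such sampled delays is what narrows the desktop-versus-onboard gap reflected in the reported flight-time discrepancy reduction.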