Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal
Model-free reinforcement learning has recently been shown to be effective at
learning navigation policies from complex image input. However, these
algorithms tend to require large amounts of interaction with the environment,
which can be prohibitively costly to obtain on robots in the real world. We
present an approach for efficiently learning goal-directed navigation policies
on a mobile robot, from only a single coverage traversal of recorded data. The
navigation agent learns an effective policy over a diverse action space in a
large heterogeneous environment consisting of more than 2km of travel, through
buildings and outdoor regions that collectively exhibit large variations in
visual appearance, self-similarity, and connectivity. We compare pretrained
visual encoders that enable precomputation of visual embeddings to achieve a
throughput of tens of thousands of transitions per second at training time on a
commodity desktop computer, allowing agents to learn from millions of
trajectories of experience in a matter of hours. We propose multiple forms of
computationally efficient stochastic augmentation to enable the learned policy
to generalise beyond these precomputed embeddings, and demonstrate successful
deployment of the learned policy on the real robot without fine tuning, despite
environmental appearance differences at test time. The dataset and code
required to reproduce these results and apply the technique to other datasets
and robots are made publicly available at rl-navigation.github.io/deployable.
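The idea of training from precomputed embeddings with stochastic augmentation can be illustrated with a minimal sketch. The abstract does not say which augmentations are used, so the Gaussian noise and random feature dropout below are assumptions chosen only for illustration, and the names `augment_embedding` and `sample_batch` are hypothetical rather than taken from the released code.

```python
import random


def augment_embedding(embedding, noise_std=0.05, drop_prob=0.1, rng=random):
    """Return a perturbed copy of one precomputed embedding.

    Assumed augmentations (not from the abstract): each feature is zeroed
    with probability drop_prob, otherwise Gaussian noise is added.
    """
    augmented = []
    for value in embedding:
        if rng.random() < drop_prob:
            augmented.append(0.0)
        else:
            augmented.append(value + rng.gauss(0.0, noise_std))
    return augmented


def sample_batch(embeddings, batch_size, rng=random, **augment_kwargs):
    """Sample indices uniformly and augment the stored embeddings on the fly.

    The visual encoder is never run here: only cached vectors are read,
    which is what makes a high training throughput plausible.
    """
    indices = [rng.randrange(len(embeddings)) for _ in range(batch_size)]
    batch = [
        augment_embedding(embeddings[i], rng=rng, **augment_kwargs)
        for i in indices
    ]
    return indices, batch
```

Because augmentation operates on cached vectors rather than on images, its cost per sample is small compared with a forward pass through the encoder.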
Beauty and the Beast: Optimal Methods Meet Learning for Drone Racing
Autonomous micro aerial vehicles still struggle with fast and agile
maneuvers, dynamic environments, imperfect sensing, and state estimation drift.
Autonomous drone racing brings these challenges to the fore. Human pilots can
fly a previously unseen track after a handful of practice runs. In contrast,
state-of-the-art autonomous navigation algorithms require either a precise
metric map of the environment or a large amount of training data collected in
the track of interest. To bridge this gap, we propose an approach that can fly
a new track in a previously unseen environment without a precise map or
expensive data collection. Our approach represents the global track layout with
coarse gate locations, which can be easily estimated from a single
demonstration flight. At test time, a convolutional network predicts the poses
of the closest gates along with their uncertainty. These predictions are
incorporated by an extended Kalman filter to maintain optimal
maximum-a-posteriori estimates of gate locations. This allows the framework to
cope with misleading high-variance estimates that could stem from poor
observability or lack of visible gates. Given the estimated gate poses, we use
model predictive control to quickly and accurately navigate through the track.
We conduct extensive experiments in the physical world, demonstrating agile and
robust flight through complex and diverse previously-unseen race tracks. The
presented approach was used to win the IROS 2018 Autonomous Drone Race
Competition, outracing the second-placing team by a factor of two.
Comment: 6 pages (+1 references
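How a filter can discount misleading high-variance predictions is easiest to see in one dimension. The sketch below is a simplified scalar Kalman measurement update, assuming a static gate coordinate observed directly; the paper's extended Kalman filter operates on full gate poses, and the function name `fuse_gate_estimate` is hypothetical.

```python
def fuse_gate_estimate(mean, var, meas, meas_var):
    """Fuse a prior gate coordinate with one network prediction.

    mean, var: current estimate and its variance.
    meas, meas_var: predicted coordinate and the variance the network
    reports for it. Assumes a direct (identity) measurement model.
    """
    gain = var / (var + meas_var)           # Kalman gain in [0, 1]
    new_mean = mean + gain * (meas - mean)  # move toward the measurement
    new_var = (1.0 - gain) * var            # uncertainty never increases
    return new_mean, new_var


# A confident prediction pulls the estimate halfway when variances match.
print(fuse_gate_estimate(0.0, 1.0, 2.0, 1.0))   # (1.0, 0.5)
# A very uncertain prediction barely changes the estimate.
print(fuse_gate_estimate(0.0, 1.0, 10.0, 1e6))
```

The gain shrinks as the reported measurement variance grows, so a prediction made with poor observability contributes little to the gate estimate.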
Danger-aware Adaptive Composition of DRL Agents for Self-navigation
Self-navigation, defined as the capability of automatically reaching the
goal while avoiding collisions with obstacles, is a fundamental skill required
for mobile robots. Recently, deep reinforcement learning (DRL) has shown great
potential in the development of robot navigation algorithms. However, it is
still difficult to train the robot to learn goal-reaching and
obstacle-avoidance skills simultaneously. On the other hand, although many
DRL-based obstacle-avoidance algorithms have been proposed, few of them are reused
for more complex navigation tasks. In this paper, a novel danger-aware adaptive
composition (DAAC) framework is proposed to combine two individually
DRL-trained agents, obstacle-avoidance and goal-reaching, to construct a
navigation agent without any redesign or retraining. The key to this
adaptive composition approach is that the value function output by the
obstacle-avoidance agent serves as an indicator for evaluating the risk level
of the current situation, which in turn determines the contribution of these
two agents to the next move. Simulation and real-world testing results show
that the composed navigation network can control the robot to accomplish
difficult navigation tasks, e.g., reaching a series of successive goals in an
unknown and complex environment safely and quickly.
Comment: 7 pages, 9 figure
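The composition rule described in this abstract can be sketched as a value-dependent blend of two actions. The abstract does not give the mapping from value to contribution, so the sigmoid weighting, the `threshold` and `temperature` parameters, and the linear blend of continuous actions are all assumptions made for illustration; `danger_weight` and `compose_action` are hypothetical names.

```python
import math


def danger_weight(avoid_value, threshold=0.0, temperature=1.0):
    """Map the obstacle-avoidance agent's value estimate to a weight in (0, 1).

    A low value is read as a dangerous state and yields a weight near 1.
    The sigmoid form is an assumption, not the paper's formula.
    """
    z = (avoid_value - threshold) / temperature
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(z))


def compose_action(avoid_action, goal_action, avoid_value, **weight_kwargs):
    """Blend the two agents' actions according to the estimated danger."""
    w = danger_weight(avoid_value, **weight_kwargs)
    return [
        w * a + (1.0 - w) * g
        for a, g in zip(avoid_action, goal_action)
    ]


# At the threshold both agents contribute equally.
print(compose_action([1.0, 0.0], [0.0, 1.0], avoid_value=0.0))  # [0.5, 0.5]
```

Neither agent is modified or retrained in this scheme: only their outputs are combined, which matches the abstract's claim that the composition requires no redesign.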