We present a target-driven navigation system to improve mapless visual
navigation in indoor scenes. At each time step, our method takes the robot's
multi-view observation and a target as inputs and outputs a sequence of actions
that move the robot to the target without relying on odometry or GPS at
runtime. The system is trained by optimizing a combined objective
encompassing three key designs. First, we propose that the agent predicts the
next observation before making an action decision. This is achieved by learning
a variational generative module from expert demonstrations. Second, we propose
predicting static collisions in advance as an auxiliary task to improve safety
during navigation. Third, to alleviate the training data imbalance problem
of termination action prediction, we introduce a separate target checking
module instead of augmenting the navigation policy with a termination action. The
three proposed designs all contribute to improved training data efficiency,
static collision avoidance, and navigation generalization performance,
resulting in a novel target-driven mapless navigation system. Through
experiments on a TurtleBot, we provide evidence that our model can be
integrated into a robotic system and navigate in the real world. Videos and
models can be found in the supplementary material.

Comment: 11 pages, accepted by IEEE Robotics and Automation Letters
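
The abstract does not state the form of the combined objective. As a minimal sketch only, one plausible reading is a weighted sum of an imitation (action) loss, a variational term for next-observation generation, a collision-prediction loss, and a target-checking loss. Every name below (`gaussian_kl`, `binary_cross_entropy`, `combined_loss`), the weights `w_gen`, `w_col`, `w_tgt`, and the choice of a standard-normal prior are assumptions made for illustration, not the authors' formulation.

```python
import math


def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, exp(log_var)) to N(0, 1), summed over dims.

    Assumption: a diagonal Gaussian posterior and a standard-normal prior.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def binary_cross_entropy(p, label, eps=1e-7):
    """Cross-entropy of a predicted probability p against a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def combined_loss(action_nll, recon_error, mu, log_var,
                  p_collision, collided, p_target, at_target,
                  w_gen=1.0, w_col=1.0, w_tgt=1.0):
    """Hypothetical weighted sum of the four training terms.

    action_nll  : imitation loss on the expert action
    recon_error : error of the generated next observation
    mu, log_var : parameters of the latent posterior
    p_collision : predicted probability of a static collision
    p_target    : predicted probability that the target is reached
    """
    generative = recon_error + gaussian_kl(mu, log_var)
    collision = binary_cross_entropy(p_collision, collided)
    target = binary_cross_entropy(p_target, at_target)
    return (action_nll
            + w_gen * generative
            + w_col * collision
            + w_tgt * target)
```

In this sketch the target-checking term is a separate binary classifier rather than an extra "stop" action in the policy output, which mirrors the separation described above; how the terms are actually weighted or scheduled is not specified in the abstract.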