We consider the problem of navigating a mobile robot towards a target in an
unknown environment that is endowed with visual sensors, where neither the
robot nor the sensors have access to global positioning information and only
use first-person-view images. In order to overcome the need for positioning, we
train the sensors to encode and communicate relevant viewpoint information to
the mobile robot, whose objective is to use this information to navigate as
efficiently as possible to the target. We overcome the challenge of enabling
all the sensors (even those that cannot directly see the target) to predict the
direction along the shortest path to the target by implementing a
neighborhood-based feature aggregation module using a Graph Neural Network
(GNN) architecture. In our experiments, we first demonstrate generalizability
to previously unseen environments with various sensor layouts. Our results show
that by using communication between the sensors and the robot, we achieve up to
2.0x improvement in SPL (Success weighted by Path Length) when compared to a
communication-free baseline. This is done without requiring a global map,
positioning data, or pre-calibration of the sensor network. Second, we perform
a zero-shot transfer of our model from simulation to the real world. Laboratory
experiments demonstrate the feasibility of our approach in various cluttered
environments. Finally, we showcase examples of successful navigation to the
target while the sensor network layout is dynamically reconfigured.
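
Below is a minimal, illustrative sketch of a neighborhood-based feature aggregation layer of the kind described above, written in PyTorch. The class name, feature dimensions, mean aggregation, and the two-linear-layer update are assumptions for illustration only, not the paper's exact architecture.

```python
# Illustrative sketch only: one round of message passing over the camera-sensor
# graph. Each sensor holds a feature vector encoding its first-person view;
# after aggregation it also carries information from neighboring sensors, so
# even sensors that cannot see the target can refine their direction estimate.
# All names, shapes, and the mean-aggregation choice are assumptions.
import torch
import torch.nn as nn


class NeighborhoodAggregation(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.message = nn.Linear(feat_dim, feat_dim)      # encode neighbor features
        self.update = nn.Linear(2 * feat_dim, feat_dim)   # fuse self + aggregated

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (num_sensors, feat_dim) per-sensor visual features
        # adj: (num_sensors, num_sensors) binary adjacency of the sensor network
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        agg = (adj @ self.message(x)) / deg               # mean over neighbors
        return torch.relu(self.update(torch.cat([x, agg], dim=-1)))


if __name__ == "__main__":
    # Example: 5 sensors with 64-d features on a small communication graph.
    layer = NeighborhoodAggregation(feat_dim=64)
    feats = torch.randn(5, 64)
    adj = torch.tensor([[0, 1, 0, 0, 1],
                        [1, 0, 1, 0, 0],
                        [0, 1, 0, 1, 0],
                        [0, 0, 1, 0, 1],
                        [1, 0, 0, 1, 0]], dtype=torch.float32)
    out = layer(feats, adj)
    print(out.shape)  # torch.Size([5, 64])
```

Stacking several such rounds would let viewpoint information propagate across multiple hops of the sensor graph, which is how sensors without direct line of sight to the target could still contribute a direction estimate.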