See What the Robot Can't See: Learning Cooperative Perception for Visual Navigation

Abstract

We consider the problem of navigating a mobile robot towards a target in an unknown environment that is endowed with visual sensors, where neither the robot nor the sensors have access to global positioning information and all rely only on first-person-view images. To overcome the need for positioning, we train the sensors to encode and communicate relevant viewpoint information to the mobile robot, whose objective is to use this information to navigate to the target as efficiently as possible. We address the challenge of enabling all sensors (even those that cannot directly see the target) to predict the direction along the shortest path to the target by implementing a neighborhood-based feature aggregation module using a Graph Neural Network (GNN) architecture. In our experiments, we first demonstrate generalizability to previously unseen environments with various sensor layouts. Our results show that communication between the sensors and the robot yields up to a 2.0x improvement in SPL (Success weighted by Path Length) compared to a communication-free baseline, without requiring a global map, positioning data, or pre-calibration of the sensor network. Second, we perform a zero-shot transfer of our model from simulation to the real world. Laboratory experiments demonstrate the feasibility of our approach in various cluttered environments. Finally, we showcase examples of successful navigation to the target while the sensor network layout is dynamically reconfigured.

Comment: Reformatting for IROS with updated results.
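To illustrate the kind of neighborhood-based feature aggregation the abstract refers to, the sketch below shows a single GNN message-passing step over a sensor network in PyTorch. This is a minimal, assumption-laden illustration, not the paper's architecture: the class name SensorGNNLayer, the mean aggregation rule, the feature dimensions, and the dense adjacency representation are all hypothetical choices made for clarity.

```python
# Minimal sketch of neighborhood-based feature aggregation with one
# graph-convolution step. Assumes each sensor i holds a feature vector x_i
# encoded from its first-person-view image and a binary adjacency matrix
# describing which sensors can communicate. Names and dimensions are
# illustrative only, not taken from the paper.
import torch
import torch.nn as nn


class SensorGNNLayer(nn.Module):
    """One round of message passing over the sensor network."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.self_proj = nn.Linear(feat_dim, hidden_dim)   # transform a sensor's own feature
        self.neigh_proj = nn.Linear(feat_dim, hidden_dim)  # transform aggregated neighbor features

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (num_sensors, feat_dim)      per-sensor image encodings
        # adj: (num_sensors, num_sensors)   binary adjacency (1 = can communicate)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid division by zero for isolated nodes
        neigh_mean = (adj @ x) / deg                     # mean of neighbors' features
        return torch.relu(self.self_proj(x) + self.neigh_proj(neigh_mean))


if __name__ == "__main__":
    num_sensors, feat_dim, hidden_dim = 5, 32, 64
    x = torch.randn(num_sensors, feat_dim)                      # dummy image encodings
    adj = (torch.rand(num_sensors, num_sensors) > 0.5).float()  # dummy communication graph
    out = SensorGNNLayer(feat_dim, hidden_dim)(x, adj)
    print(out.shape)  # torch.Size([5, 64]) aggregated per-sensor features
```

Stacking several such layers lets information propagate multiple hops, which is how sensors that cannot directly see the target could still receive target-relevant features from their neighbors; the paper's actual aggregation and readout details may differ.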
