Detecting objects of interest, such as human survivors, safety equipment, and
structure access points, is critical to any search-and-rescue operation. Robots
deployed for such time-sensitive efforts rely on their onboard sensors to
perform their designated tasks. However, because disaster response operations are
predominantly conducted under perceptually degraded conditions, commonly
used sensors such as visual cameras and LiDARs suffer degraded
performance. In response, this work presents a method that exploits
the complementary nature of vision and depth sensors, using multi-modal
information to aid object detection at longer distances. In particular, depth
and intensity values from sparse LiDAR returns are used to generate proposals
for objects present in the environment. These proposals are then used by a
Pan-Tilt-Zoom (PTZ) camera system to perform a directed search, adjusting its
pose and zoom level to detect and classify objects in
difficult environments. The proposed method has been thoroughly verified using an
ANYmal quadruped robot in underground settings and on datasets collected during
the DARPA Subterranean Challenge finals.

Comment: 6 pages, 5 figures, 2 tables, conference: IEEE International
Symposium on Safety, Security and Rescue Robotics (SSRR-2022), Seville, Spain