Driven by the growing need for autonomous robotic perception systems, sensor
fusion has attracted considerable attention among researchers and engineers
seeking to make the best use of cross-modality information. However, building
robotic platforms at scale also requires attention to the platform's bring-up
cost. Cameras and radars, which inherently provide complementary perception
information, hold promise for developing autonomous robotic platforms at
scale. Yet work on fusing radar with vision remains limited compared to work
on fusing LiDAR with vision.
In this paper, we address this gap with a survey of vision-radar fusion
approaches for BEV (bird's-eye-view) object detection. First, we review the
background: object detection tasks, choice of sensors, sensor setup, benchmark
datasets, and evaluation metrics for a robotic perception system.
Next, we cover per-modality (camera and radar) data representations, and then
detail sensor fusion techniques in three sub-groups, viz., early fusion, deep
fusion, and late fusion, so that the pros and cons of each method are easy to
compare.
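To make the distinction between the three sub-groups concrete, the minimal
sketch below contrasts where each fusion stage combines the modalities on toy
BEV grids. The grid shapes, the encoders, the detection head, and the
score-averaging merge are hypothetical placeholders for illustration only,
not the method of any work covered in this survey.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy BEV grids: camera features lifted to BEV and radar points
    # rasterized onto the same H x W x C grid (8 x 8 x 4, illustrative).
    cam_bev = rng.standard_normal((8, 8, 4))
    radar_bev = rng.standard_normal((8, 8, 4))

    def feature_net(x):
        """Stand-in for a learned per-modality encoder (hypothetical)."""
        return np.tanh(x)

    def detection_head(x):
        """Stand-in for a BEV detection head; returns per-cell scores."""
        return x.mean(axis=-1)

    # Early fusion: concatenate the projected raw modalities first,
    # then run a single shared network.
    early = detection_head(
        feature_net(np.concatenate([cam_bev, radar_bev], axis=-1)))

    # Deep (feature-level) fusion: encode each modality separately,
    # fuse the intermediate features, then share one head.
    deep = detection_head(feature_net(cam_bev) + feature_net(radar_bev))

    # Late fusion: run fully independent pipelines and merge outputs
    # (a simple score average here; real systems merge detected boxes).
    late = 0.5 * (detection_head(feature_net(cam_bev)) +
                  detection_head(feature_net(radar_bev)))

    print(early.shape, deep.shape, late.shape)  # (8, 8) each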
Finally, we propose possible future trends for vision-radar fusion to inform
future research. A regularly updated summary can be found at:
https://github.com/ApoorvRoboticist/Vision-RADAR-Fusion-BEV-Survey