Towards Generalization Across Depth for Monocular 3D Object Detection
While expensive LiDAR and stereo camera rigs have enabled the development of
successful 3D object detection methods, monocular RGB-only approaches lag far
behind. This work advances the state of the art by introducing MoVi-3D, a
novel, single-stage deep architecture for monocular 3D object detection.
MoVi-3D builds upon a novel approach that leverages geometric information to
generate, both at training and test time, virtual views where the object
appearance is normalized with respect to distance. These virtually generated
views facilitate the detection task as they significantly reduce the visual
appearance variability associated with objects placed at different distances from
the camera. As a consequence, the deep model is relieved of learning
depth-specific representations and its complexity can be significantly reduced.
In particular, in this work we show that, thanks to our virtual views
generation process, a lightweight, single-stage architecture suffices to set
new state-of-the-art results on the popular KITTI3D benchmark.
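The idea of normalizing object appearance with respect to distance can be illustrated with the ideal pinhole camera model, under which an object of metric height H at depth Z projects to roughly f * H / Z pixels. Rescaling the image region around an object at depth Z by Z / Z_ref therefore makes it appear as if it were at a fixed reference depth Z_ref. This is a minimal sketch under that pinhole assumption, not the MoVi-3D virtual-view procedure, which the abstract does not detail. The function names, the 720 px focal length, the 1.5 m object height and the 10 m reference depth are hypothetical values chosen for illustration.

```python
from typing import List


def projected_height(focal_px: float, height_m: float, depth_m: float) -> float:
    """Ideal pinhole projection: image height (pixels) of an object of
    metric height `height_m` located at depth `depth_m` (meters)."""
    return focal_px * height_m / depth_m


def virtual_view_scale(depth_m: float, ref_depth_m: float) -> float:
    """Hypothetical zoom factor that makes content at `depth_m` appear
    as if it were observed at the reference depth `ref_depth_m`."""
    return depth_m / ref_depth_m


def normalized_heights(
    focal_px: float, height_m: float, depths_m: List[float], ref_depth_m: float
) -> List[float]:
    """Apparent object height after rescaling each depth to the reference depth.
    Under the pinhole model the result is the same for every input depth."""
    return [
        projected_height(focal_px, height_m, z) * virtual_view_scale(z, ref_depth_m)
        for z in depths_m
    ]


if __name__ == "__main__":
    depths = [10.0, 20.0, 40.0]
    # Raw apparent heights shrink with distance: 108, 54 and 27 pixels.
    print([projected_height(720.0, 1.5, z) for z in depths])
    # After distance normalization every object appears 108 pixels tall.
    print(normalized_heights(720.0, 1.5, depths, 10.0))
```

Without normalization, a detector must recognize the same car at 108, 54 and 27 pixels. After rescaling, it sees a single apparent size, which is the variability reduction the abstract credits with allowing a lighter model.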