Learning Geocentric Object Pose in Oblique Monocular Images
An object's geocentric pose, defined as the height above ground and
orientation with respect to gravity, is a powerful representation of real-world
structure for object detection, segmentation, and localization tasks using RGBD
images. For close-range vision tasks, height and orientation have been derived
directly from stereo-computed depth and more recently from monocular depth
predicted by deep networks. For long-range vision tasks such as Earth
observation, depth cannot be reliably estimated with monocular images. Inspired
by recent work in monocular height above ground prediction and optical flow
prediction from static images, we develop an encoding of geocentric pose to
address this challenge and train a deep network to compute the representation
densely, supervised by publicly available airborne lidar. We exploit these
attributes to rectify oblique images and remove observed object parallax to
dramatically improve the accuracy of localization and to enable accurate
alignment of multiple images taken from very different oblique viewpoints. We
demonstrate the value of our approach by extending two large-scale public
datasets for semantic segmentation in oblique satellite images. All of our data
and code are publicly available.

Comment: CVPR 2020
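The core rectification idea, removing observed parallax by shifting each pixel toward its ground footprint along the image-space projection of gravity, can be sketched as follows. This is a minimal illustration of that step under simplifying assumptions, not the authors' released code: it assumes the network outputs a dense height-above-ground map together with a single image-level orientation angle and a pixels-per-meter scale, and the function name rectify_parallax and all parameter names are hypothetical.

    # Hedged sketch: remove object parallax in an oblique image given a
    # predicted per-pixel height above ground and the image-space direction
    # of gravity. Assumes one global orientation angle and scale; the
    # paper's actual encoding may differ.
    import numpy as np
    from scipy.ndimage import map_coordinates

    def rectify_parallax(image, height_m, theta, scale_px_per_m):
        """Warp `image` so above-ground pixels move to their ground footprint.

        image          -- (H, W) or (H, W, C) array, the oblique view
        height_m       -- (H, W) predicted height above ground in meters
        theta          -- image-space orientation of gravity, in radians
        scale_px_per_m -- parallax magnitude in pixels per meter of height
        """
        h, w = height_m.shape
        # Per-pixel parallax vector: magnitude proportional to height,
        # direction given by the projected gravity vector.
        mag = height_m * scale_px_per_m
        dx = mag * np.cos(theta)
        dy = mag * np.sin(theta)

        # Backward warp: each output pixel samples the source image at its
        # displaced location. Using the height map at the output location is
        # an approximation that degrades near tall, thin structures.
        yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        coords = np.stack([yy + dy, xx + dx])

        if image.ndim == 2:
            return map_coordinates(image, coords, order=1, mode="nearest")
        return np.stack(
            [map_coordinates(image[..., c], coords, order=1, mode="nearest")
             for c in range(image.shape[-1])], axis=-1)

Applying the same warp to two views of a scene (each with its own predicted height, angle, and scale) moves both toward a common ground-level geometry, which is what enables the cross-view alignment the abstract describes.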