6D object pose estimation aims to infer the relative pose between the object
and the camera using a single image or multiple images. Most works have focused
on predicting the object pose without associated uncertainty under occlusion
and structural ambiguity (symmetry). However, these works demand prior
information about shape attributes, a condition that is rarely satisfied in
practice; even asymmetric objects may appear symmetric under viewpoint changes.
In addition, acquiring and fusing diverse sensor data is challenging when
extending these methods to robotics applications. To tackle these limitations, we present
an ambiguity-aware 6D object pose estimation network, PrimA6D++, as a generic
uncertainty prediction method. The major challenges in pose estimation, such as
occlusion and symmetry, can be handled in a generic manner based on the
measured ambiguity of the prediction. Specifically, we devise a network to
reconstruct the three rotation-axis primitive images of a target object and
predict the underlying uncertainty along each primitive axis. Leveraging the
estimated uncertainty, we then optimize multi-object poses using visual
measurements and camera poses, treating the task as an object SLAM problem. The
proposed method shows a significant performance improvement on the T-LESS and
YCB-Video datasets. We further demonstrate real-time scene recognition
capability for visually assisted robot manipulation. Our code and supplementary
materials are available at https://github.com/rpmsnu/PrimA6D.
Comment: IEEE Robotics and Automation Letters