Object pose estimation is a critical task in robotics for precise object
manipulation. However, current techniques rely heavily on a reference 3D
object, limiting their generalizability and making it expensive to expand to
new object categories. Direct pose predictions also provide limited information
for robotic grasping without referencing the 3D model. Keypoint-based methods
offer intrinsic descriptiveness without relying on an exact 3D model, but they
may lack consistency and accuracy. To address these challenges, this paper
proposes ShapeShift, a superquadric-based framework for object pose estimation
that predicts the object's pose relative to a primitive shape fitted
to the object. The proposed framework offers intrinsic descriptiveness and the
ability to generalize to arbitrary geometric shapes beyond the training set.