I Like to Move It: 6D Pose Estimation as an Action Decision Process
Object pose estimation is an integral part of robot vision and AR. Previous
6D pose retrieval pipelines treat the problem either as a regression task or as a classification over a discretized pose space. We change this paradigm and reformulate
the problem as an action decision process where an initial pose is updated in
incremental discrete steps that sequentially move a virtual 3D rendering
towards the correct solution. A neural network estimates likely moves from a
single RGB image iteratively and determines so an acceptable final pose. In
comparison to other approaches that train object-specific pose models, we learn
a decision process. This allows for a lightweight architecture while it
naturally generalizes to unseen objects. A coherent stop action for process
termination enables dynamic reduction of the computation cost if there are
insignificant changes in a video sequence. Instead of a static inference time,
we thereby automatically increase the runtime depending on the object motion.
Robustness and accuracy of our action decision network are evaluated on Laval
and YCB video scenes, where we significantly improve on the state of the art.
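The iterative refinement described above can be illustrated with a minimal sketch. All names, the action set, and the step sizes below are assumptions for illustration, not the authors' implementation; the network is stood in for by an arbitrary `policy` callable that maps an image and the current pose hypothesis to one of the discrete actions, including the stop action:

```python
import numpy as np

# Hypothetical discrete action set (assumed for illustration): move the
# rendered pose by a small fixed step along each axis, plus a "stop" action.
ACTIONS = ["+tx", "-tx", "+ty", "-ty", "+tz", "-tz",
           "+rx", "-rx", "+ry", "-ry", "+rz", "-rz", "stop"]
TRANS_STEP = 0.01             # assumed translation step in metres
ROT_STEP = np.deg2rad(5.0)    # assumed rotation step in radians

def apply_action(pose, action):
    """Apply one discrete step to a pose given as {'t': (3,), 'r': (3,)}."""
    t, r = pose["t"].copy(), pose["r"].copy()
    sign = 1.0 if action[0] == "+" else -1.0
    axis = "xyz".index(action[2])
    if action[1] == "t":
        t[axis] += sign * TRANS_STEP
    else:
        r[axis] += sign * ROT_STEP
    return {"t": t, "r": r}

def refine_pose(image, init_pose, policy, max_steps=100):
    """Sequentially move the pose hypothesis until the policy emits 'stop'.

    `policy(image, pose)` returns an index into ACTIONS; in the paper this
    role is played by the action decision network.
    """
    pose = init_pose
    for _ in range(max_steps):
        action = ACTIONS[policy(image, pose)]
        if action == "stop":
            break
        pose = apply_action(pose, action)
    return pose
```

Because the loop terminates as soon as `stop` is selected, the number of network evaluations per frame shrinks automatically when the object barely moves, which is the source of the dynamic runtime noted in the abstract.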