State-of-the-art computer vision algorithms often achieve efficiency by
making discrete choices about which hypotheses to explore next. This allows
allocation of computational resources to promising candidates; such
decisions, however, are non-differentiable. As a result, these algorithms are hard to
train in an end-to-end fashion. In this work we propose to learn an efficient
algorithm for the task of 6D object pose estimation. Our system optimizes the
parameters of an existing state-of-the-art pose estimation system using
reinforcement learning, where the pose estimation system becomes a
stochastic policy, parametrized by a CNN. Additionally, we present an efficient
training algorithm that dramatically reduces computation time. We show
empirically that our learned pose estimation procedure makes better use of
limited resources and improves upon the state-of-the-art on a challenging
dataset. Our approach enables differentiable end-to-end training of complex
algorithmic pipelines and learns to make optimal use of a given computational
budget.
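
For intuition, a standard way to make such discrete, non-differentiable choices trainable is the score-function (REINFORCE) policy-gradient identity; the symbols below ($\pi_\theta$ for the CNN-parametrized policy, $a$ for a sequence of discrete choices, $\ell$ for the task loss) are illustrative placeholders, not notation taken from this work:
\[
\nabla_\theta \, \mathbb{E}_{a \sim \pi_\theta}\!\left[\ell(a)\right]
= \mathbb{E}_{a \sim \pi_\theta}\!\left[\ell(a)\, \nabla_\theta \log \pi_\theta(a)\right].
\]
The right-hand side can be approximated by sampling runs of the pipeline, so the expected loss is amenable to gradient descent even though each individual choice remains discrete.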