We present a method to infer 3D pose and shape of vehicles from a single
image. To tackle this ill-posed problem, we optimize two-scale projection
consistency between the generated 3D hypotheses and their 2D
pseudo-measurements. Specifically, we use a morphable wireframe model to
generate a fine-scaled representation of vehicle shape and pose. To reduce its
sensitivity to 2D landmarks, we jointly model the 3D bounding box as a coarse
representation, which improves robustness. We also integrate three task priors
(unsupervised monocular depth, a ground plane constraint, and vehicle shape
priors) with the forward projection errors into an overall energy function.

Comment: Proc. of the AAAI, September 201
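
To make the idea of a two-scale projection energy concrete, the following is a minimal sketch and not the authors' implementation. It assumes a pinhole camera with intrinsics `K`, squared reprojection residuals, and scalar weights `w_fine` and `w_coarse`; the function names and the `prior_terms` argument (a stand-in for the depth, ground plane, and shape priors, whose exact form the abstract does not give) are hypothetical.

```python
import numpy as np


def project(points_3d, K):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    uvw = points_3d @ K.T              # each row becomes K @ p
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide by depth


def energy(landmarks_3d, box_corners_3d, landmarks_2d, box_corners_2d, K,
           prior_terms=(), w_fine=1.0, w_coarse=1.0):
    """Weighted sum of fine and coarse reprojection errors plus prior costs.

    The weighted-sum form and the squared residuals are assumptions made
    for illustration only.
    """
    # Fine scale: wireframe landmarks against 2D landmark pseudo-measurements.
    fine = np.sum((project(landmarks_3d, K) - landmarks_2d) ** 2)
    # Coarse scale: 3D bounding box corners against their 2D counterparts.
    coarse = np.sum((project(box_corners_3d, K) - box_corners_2d) ** 2)
    # Priors enter as precomputed scalar costs (hypothetical placeholder).
    return w_fine * fine + w_coarse * coarse + sum(prior_terms)
```

In such a setup, the pose and shape parameters that generate `landmarks_3d` and `box_corners_3d` would be the variables adjusted by an optimizer to minimize the returned value.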