2 research outputs found
Robust 3D-aware Object Classification via Discriminative Render-and-Compare
In real-world applications, it is essential to jointly estimate the 3D object
pose and class label of objects, i.e., to perform 3D-aware classification.While
current approaches for either image classification or pose estimation can be
extended to 3D-aware classification, we observe that they are inherently
limited: 1) Their performance is much lower compared to the respective
single-task models, and 2) they are not robust in out-of-distribution (OOD)
scenarios. Our main contribution is a novel architecture for 3D-aware
classification, which builds upon a recent work and performs comparably to
single-task models while being highly robust. In our method, an object category
is represented as a 3D cuboid mesh composed of feature vectors at each mesh
vertex. Using differentiable rendering, we estimate the 3D object pose by
minimizing the reconstruction error between the mesh and the feature
representation of the target image. Object classification is then performed by
comparing the reconstruction losses across object categories. Notably, the
neural texture of the mesh is trained in a discriminative manner to enhance the
classification performance while also avoiding local optima in the
reconstruction loss. Furthermore, we show how our method and feed-forward
neural networks can be combined to scale the render-and-compare approach to
larger numbers of categories. Our experiments on PASCAL3D+, occluded-PASCAL3D+,
and OOD-CV show that our method outperforms all baselines at 3D-aware
classification by a wide margin in terms of performance and robustness
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images
Enhancing the robustness of vision algorithms in real-world scenarios is
challenging. One reason is that existing robustness benchmarks are limited, as
they either rely on synthetic data or ignore the effects of individual nuisance
factors. We introduce OOD-CV-v2, a benchmark dataset that includes
out-of-distribution examples of 10 object categories in terms of pose, shape,
texture, context and the weather conditions, and enables benchmarking of models
for image classification, object detection, and 3D pose estimation. In addition
to this novel dataset, we contribute extensive experiments using popular
baseline methods, which reveal that: 1) Some nuisance factors have a much
stronger negative effect on the performance compared to others, also depending
on the vision task. 2) Current approaches to enhance robustness have only
marginal effects, and can even reduce robustness. 3) We do not observe
significant differences between convolutional and transformer architectures. We
believe our dataset provides a rich test bed to study robustness and will help
push forward research in this area.
Our dataset can be accessed from https://bzhao.me/OOD-CV/Comment: arXiv admin note: substantial text overlap with arXiv:2111.1434