LocPoseNet: Robust Location Prior for Unseen Object Pose Estimation
Object location priors have been shown to be critical for the standard 6D
object pose estimation setting, where the training and testing objects are the
same. Specifically, they can be used to initialize the 3D object translation
and facilitate 3D object rotation estimation. Unfortunately, the object
detectors that are used for this purpose do not generalize to unseen objects,
i.e., objects from new categories at test time. Therefore, existing 6D pose
estimation methods for previously-unseen objects either assume the ground-truth
object location to be known, or yield inaccurate results when it is
unavailable. In this paper, we address this problem by developing a method,
LocPoseNet, able to robustly learn a location prior for unseen objects. Our
method builds upon a template matching strategy, where we propose to distribute
the reference kernels and convolve them with a query to efficiently compute
multi-scale correlations. We then introduce a novel translation estimator,
which decouples scale-aware and scale-robust features to predict different
object location parameters. Our method outperforms existing works by a large
margin on LINEMOD and GenMOP. We further construct a challenging synthetic
dataset, which allows us to highlight the better robustness of our method to
various noise sources.
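The abstract's core idea of convolving distributed reference kernels with a query to obtain multi-scale correlations can be illustrated with a minimal numpy sketch. This is not the paper's implementation; the function names, the normalization, and the min-crop fusion of the maps are illustrative assumptions.

```python
import numpy as np

def correlate(query, kernel):
    # naive "valid" cross-correlation of a query feature map (C, H, W)
    # with one reference kernel (C, k, k); returns an (H-k+1, W-k+1) map
    C, H, W = query.shape
    _, k, _ = kernel.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(query[:, i:i + k, j:j + k] * kernel)
    return out

def multi_scale_correlation(query, kernels):
    # distribute several reference kernels (e.g. template features at
    # different scales) over the query and fuse their correlation maps
    maps = []
    for ker in kernels:
        c = correlate(query, ker)
        # normalise each map so responses at different scales are comparable
        maps.append((c - c.mean()) / (c.std() + 1e-8))
    # crop to a common size, then average (a simplified fusion scheme)
    h = min(m.shape[0] for m in maps)
    w = min(m.shape[1] for m in maps)
    return np.mean([m[:h, :w] for m in maps], axis=0)
```

The correlation peak then serves as the 2D location prior from which the 3D translation can be initialized.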
Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation
Most modern image-based 6D object pose estimation methods learn to predict
2D-3D correspondences, from which the pose can be obtained using a PnP solver.
Because of the non-differentiable nature of common PnP solvers, these methods
are supervised via the individual correspondences. To address this, several
methods have designed differentiable PnP strategies, thus imposing supervision
on the pose obtained after the PnP step. Here, we argue that this conflicts
with the averaging nature of the PnP problem, leading to gradients that may
encourage the network to degrade the accuracy of individual correspondences. To
address this, we derive a loss function that exploits the ground truth pose
before solving the PnP problem. Specifically, we linearize the PnP solver
around the ground-truth pose and compute the covariance of the resulting pose
distribution. We then define our loss based on the diagonal covariance
elements, which entails considering the final pose estimate yet not suffering
from the PnP averaging issue. Our experiments show that our loss consistently
improves the pose estimation accuracy for both dense and sparse correspondence
based methods, achieving state-of-the-art results on both Linemod-Occluded and
YCB-Video.
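The linearization step described above can be sketched as follows. Under a first-order (Gauss-Newton) expansion of the PnP solver at the ground-truth pose, the pose update is dp = (JᵀJ)⁻¹Jᵀr, so the pose covariance under a correspondence-noise model Σ is AΣAᵀ with A = (JᵀJ)⁻¹Jᵀ. The diagonal noise model built from the residuals is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

def linear_covariance_loss(J, residuals):
    # J: (2N, 6) Jacobian of the reprojection residuals w.r.t. the 6D
    # pose, linearised at the ground-truth pose.
    # residuals: (2N,) errors of the predicted 2D-3D correspondences
    # evaluated at that pose.
    A = np.linalg.solve(J.T @ J, J.T)       # (6, 2N) Gauss-Newton map
    Sigma = np.diag(residuals ** 2)         # simplified per-point noise
    cov = A @ Sigma @ A.T                   # (6, 6) pose covariance
    return np.trace(cov)                    # sum of diagonal elements
```

Minimizing this quantity penalizes correspondence errors through their effect on the final pose, without back-propagating through an averaging PnP solution.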
Shape-Constraint Recurrent Flow for 6D Object Pose Estimation
Most recent 6D object pose estimation methods use 2D optical flow to refine
their results. However, general optical flow methods typically do not consider
the target's 3D shape information during matching, making them less effective
for 6D object pose estimation. In this work, we propose a shape-constraint
recurrent matching framework for 6D object pose estimation. We first compute a
pose-induced flow based on the displacement of 2D reprojection between the
initial pose and the currently estimated pose, which embeds the target's 3D
shape implicitly. Then we use this pose-induced flow to construct the
correlation map for the following matching iterations, which reduces the
matching space significantly and is much easier to learn. Furthermore, we use
networks to learn the object pose based on the currently estimated flow, which
facilitates the computation of the pose-induced flow for the next iteration and
yields an end-to-end system for object pose. Finally, we optimize the optical
flow and object pose simultaneously in a recurrent manner. We evaluate our
method on three challenging 6D object pose datasets and show that it
outperforms the state of the art significantly in both accuracy and efficiency.
Comment: CVPR 202
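The pose-induced flow described in this abstract is simply the displacement of each model point's 2D reprojection between two poses. A minimal numpy sketch, assuming a standard pinhole camera (all names are illustrative, not the paper's code):

```python
import numpy as np

def project(points, R, t, K):
    # pinhole projection of (N, 3) model points under pose (R, t)
    cam = points @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def pose_induced_flow(points, pose_init, pose_cur, K):
    # 2D displacement of each model point's reprojection between the
    # initial pose and the currently estimated pose; because it is
    # derived from the 3D model, it embeds the target's shape implicitly
    R0, t0 = pose_init
    R1, t1 = pose_cur
    return project(points, R1, t1, K) - project(points, R0, t0, K)
```

In the paper's framework this flow constrains the correlation map used in the subsequent recurrent matching iterations, shrinking the matching space.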
Knowledge Distillation for 6D Pose Estimation by Aligning Distributions of Local Predictions
Knowledge distillation facilitates the training of a compact student network
by using a deeper teacher network. While this has achieved great success in many
tasks, it remains completely unstudied for image-based 6D object pose
estimation. In this work, we introduce the first knowledge distillation method
driven by the 6D pose estimation task. To this end, we observe that most modern
6D pose estimation frameworks output local predictions, such as sparse 2D
keypoints or dense representations, and that the compact student network
typically struggles to predict such local quantities precisely. Therefore,
instead of imposing prediction-to-prediction supervision from the teacher to
the student, we propose to distill the teacher's \emph{distribution} of local
predictions into the student network, facilitating its training. Our
experiments on several benchmarks show that our distillation method yields
state-of-the-art results with different compact student models and for both
keypoint-based and dense prediction-based architectures.
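The contrast between prediction-to-prediction supervision and distribution-level distillation can be sketched with a simple proxy: soften the teacher's and student's local-prediction heatmaps and align them with a KL divergence. This is only an illustrative stand-in for the paper's alignment objective; the temperature softmax and KL choice are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def distribution_distill_loss(student_map, teacher_map, T=2.0):
    # Rather than forcing the student to match each local prediction
    # exactly, align the *distributions* of local predictions: soften
    # both heatmaps with temperature T, then take KL(teacher || student).
    p = softmax(teacher_map.ravel() / T)
    q = softmax(student_map.ravel() / T)
    return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))
```

A compact student that cannot reproduce the teacher's exact keypoint locations can still match the overall shape of its prediction distribution, which is the easier target.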
Modular Quantization-Aware Training: Increasing Accuracy by Decreasing Precision in 6D Object Pose Estimation
Edge applications, such as collaborative robotics and spacecraft rendezvous,
demand efficient 6D object pose estimation on resource-constrained embedded
platforms. Existing 6D pose estimation networks are often too large for such
deployments, necessitating compression while maintaining reliable performance.
To address this challenge, we introduce Modular Quantization-Aware Training
(MQAT), an adaptive and mixed-precision quantization-aware training strategy
that exploits the modular structure of modern 6D pose estimation architectures.
MQAT guides a systematic gradated modular quantization sequence and determines
module-specific bit precisions, leading to quantized models that outperform
those produced by state-of-the-art uniform and mixed-precision quantization
techniques. Our experiments showcase the generality of MQAT across datasets,
architectures, and quantization algorithms. Remarkably, MQAT-trained quantized
models achieve a significant accuracy boost (>7%) over the baseline
full-precision network while reducing model size by 4x or more.
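The idea of module-specific bit precisions can be illustrated with a standard symmetric fake quantizer applied module by module. The bit plan, function names, and quantizer details below are illustrative assumptions, not MQAT's actual gradated schedule.

```python
import numpy as np

def fake_quantize(w, bits):
    # symmetric uniform "fake" quantization as used in QAT: round
    # weights onto 2^bits levels but keep them in float for training
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax if qmax > 0 else 1.0
    return np.round(w / scale).clip(-qmax - 1, qmax) * scale

def modular_quantize(modules, bit_plan):
    # quantize the network module by module, each with its own bit
    # precision (e.g. a hypothetical plan: backbone 8b, pose head 4b)
    return {name: fake_quantize(w, bit_plan[name])
            for name, w in modules.items()}
```

Assigning fewer bits to tolerant modules and more to sensitive ones is what lets a mixed-precision model shrink aggressively while preserving, or even improving, accuracy.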
Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation
Most self-supervised 6D object pose estimation methods can only work with
additional depth information or rely on the accurate annotation of 2D
segmentation masks, limiting their application range. In this paper, we propose
a 6D object pose estimation method that can be trained with pure RGB images
without any auxiliary information. We first obtain a rough pose initialization
from networks trained on synthetic images rendered from the target's 3D mesh.
Then, we introduce a refinement strategy leveraging the geometry constraint in
synthetic-to-real image pairs from multiple different views. We formulate this
geometry constraint as pixel-level flow consistency between the training images
with dynamically generated pseudo labels. We evaluate our method on three
challenging datasets and demonstrate that it outperforms state-of-the-art
self-supervised methods significantly, with neither 2D annotations nor
additional depth images.
Comment: Accepted by ICCV 202
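The pixel-level flow consistency used as pseudo supervision can be sketched as a cycle check: flow from view A to B, composed with flow from B to C, should agree with the direct flow from A to C. The nearest-neighbour composition and L1 penalty below are simplifying assumptions for illustration.

```python
import numpy as np

def compose_flow(flow_ab, flow_bc):
    # compose two dense flows (H, W, 2) on an integer grid using
    # nearest-neighbour lookup: follow flow_ab, then read flow_bc at
    # the displaced location
    H, W, _ = flow_ab.shape
    out = np.zeros_like(flow_ab)
    for y in range(H):
        for x in range(W):
            dx, dy = flow_ab[y, x]
            xb = int(np.clip(round(x + dx), 0, W - 1))
            yb = int(np.clip(round(y + dy), 0, H - 1))
            out[y, x] = flow_ab[y, x] + flow_bc[yb, xb]
    return out

def flow_consistency_loss(flow_ab, flow_bc, flow_ac):
    # pseudo-label supervision: penalise disagreement between the
    # composed flow and the direct flow across the multi-view pairs
    return float(np.abs(compose_flow(flow_ab, flow_bc) - flow_ac).mean())
```

Because the constraint holds for any correct geometry, it can be enforced on synthetic-to-real pairs with dynamically generated pseudo labels, with no depth maps or 2D masks required.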