iNeRF: Inverting Neural Radiance Fields for Pose Estimation
We present iNeRF, a framework that performs mesh-free pose estimation by
"inverting" a Neural Radiance Field (NeRF). NeRFs have been shown to be
remarkably effective for the task of view synthesis - synthesizing
photorealistic novel views of real-world scenes or objects. In this work, we
investigate whether we can apply analysis-by-synthesis via NeRF for mesh-free,
RGB-only 6DoF pose estimation - given an image, find the translation and
rotation of a camera relative to a 3D object or scene. Our method assumes that
no object mesh models are available during either training or test time.
Starting from an initial pose estimate, we use gradient descent to minimize the
residual between pixels rendered from a NeRF and pixels in an observed image.
In our experiments, we first study 1) how to sample rays during pose refinement
for iNeRF to collect informative gradients and 2) how different batch sizes of
rays affect iNeRF on a synthetic dataset. We then show that for complex
real-world scenes from the LLFF dataset, iNeRF can improve NeRF by estimating
the camera poses of novel images and using these images as additional training
data for NeRF. Finally, we show iNeRF can perform category-level object pose
estimation, including object instances not seen during training, with RGB
images by inverting a NeRF model inferred from a single view.
Comment: Website: http://yenchenlin.me/inerf
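The refinement loop described in the abstract can be sketched in toy form: gradient descent on a pose to minimize the residual between rendered and observed pixels. Below, `render` is an illustrative 2D stand-in for a NeRF renderer, and finite-difference gradients replace backpropagation through the network; all names and parameters here are our own assumptions, not from the paper.

```python
import numpy as np

def render(pose, points):
    """Toy stand-in for a NeRF renderer: transform 2D points by a pose
    (tx, ty, theta). A real NeRF renders pixels by volume rendering."""
    tx, ty, theta = pose
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return points @ R.T + np.array([tx, ty])

def inerf_style_refine(initial_pose, observed, points, lr=0.1, steps=300):
    """Gradient descent on the pose to minimize the pixel residual,
    mirroring iNeRF's analysis-by-synthesis loop (finite-difference
    gradients here; iNeRF differentiates through the NeRF itself)."""
    pose = np.array(initial_pose, dtype=float)
    eps = 1e-5
    for _ in range(steps):
        base = np.mean((render(pose, points) - observed) ** 2)
        grad = np.zeros(3)
        for i in range(3):
            p = pose.copy()
            p[i] += eps
            grad[i] = (np.mean((render(p, points) - observed) ** 2) - base) / eps
        pose -= lr * grad
    return pose
```

Starting from a rough initial pose, the loop recovers the pose that generated the observed pixels, which is the core idea behind using the estimated poses of novel images as extra NeRF training data.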
Ensemble of 6 DoF Pose estimation from state-of-the-art deep methods.
Deep learning methods have revolutionized computer vision since the appearance of AlexNet in 2012. Nevertheless, 6 degrees of freedom pose estimation is still a difficult task to perform precisely. Therefore, we propose two ensemble techniques to refine poses from different deep learning 6DoF pose estimation models. The first technique, merge ensemble, combines the outputs of the base models geometrically. In the second, stacked generalization, a machine learning model is trained on the outputs of the base models and outputs the refined pose. The merge method improves the performance of the base models on the LMO and YCB-V datasets and performs better on the pose estimation task than the stacking strategy. This paper has been supported by the project PROFLOW under the Basque program ELKARTEK, grant agreement No. KK-2022/00024.
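A geometric "merge" of several pose hypotheses can be realized as follows. This is a hedged sketch under our own assumptions, not necessarily the paper's exact merging rule: translations are averaged arithmetically, and rotations are combined via the chordal-L2 quaternion mean (the leading eigenvector of the sum of quaternion outer products).

```python
import numpy as np

def merge_poses(rotations_quat, translations):
    """Combine base-model pose outputs geometrically (illustrative).
    Translations: arithmetic mean. Rotations: chordal-L2 mean quaternion,
    i.e. the eigenvector of sum(q q^T) with the largest eigenvalue."""
    t = np.mean(np.asarray(translations, dtype=float), axis=0)
    Q = np.asarray(rotations_quat, dtype=float)
    M = sum(np.outer(q, q) for q in Q)
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    q_mean = eigvecs[:, -1]                # leading eigenvector
    q_mean /= np.linalg.norm(q_mean)
    return q_mean, t
```

The eigenvector formulation sidesteps the sign ambiguity of quaternions (q and -q encode the same rotation), which a naive component-wise average does not handle.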
CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network
Estimating the 6-DoF pose of a rigid object from a single RGB image is a
crucial yet challenging task. Recent studies have shown the great potential of
dense correspondence-based solutions, yet improvements are still needed to
reach practical deployment. In this paper, we propose a novel pose estimation
algorithm named CheckerPose, which improves on three main aspects. Firstly,
CheckerPose densely samples 3D keypoints from the surface of the 3D object and
finds their 2D correspondences progressively in the 2D image. Compared to
previous solutions that conduct dense sampling in the image space, our strategy
enables the correspondence search to proceed in a 2D grid (i.e., pixel coordinates).
Secondly, for our 3D-to-2D correspondence, we design a compact binary code
representation for 2D image locations. This representation not only allows for
progressive correspondence refinement but also converts the correspondence
regression to a more efficient classification problem. Thirdly, we adopt a
graph neural network to explicitly model the interactions among the sampled 3D
keypoints, further boosting the reliability and accuracy of the
correspondences. Together, these novel components make our CheckerPose a strong
pose estimation algorithm. When evaluated on the popular Linemod, Linemod-O,
and YCB-V object pose estimation benchmarks, CheckerPose clearly boosts the
accuracy of correspondence-based methods and achieves state-of-the-art
performance.
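The binary code representation can be illustrated as follows: each bit halves the current grid cell, so localizing a keypoint's 2D correspondence becomes a coarse-to-fine sequence of binary classifications instead of a direct coordinate regression. The encoding below is a minimal sketch under our own conventions, not the paper's exact scheme.

```python
def encode_location(x, y, bits=7, size=128):
    """Encode a 2D image location as per-axis binary codes by
    recursive halving of the grid (illustrative, coarse-to-fine)."""
    code = []
    lo_x, hi_x, lo_y, hi_y = 0, size, 0, size
    for _ in range(bits):
        mid_x = (lo_x + hi_x) // 2
        bx = int(x >= mid_x)                      # which half holds x?
        lo_x, hi_x = (mid_x, hi_x) if bx else (lo_x, mid_x)
        mid_y = (lo_y + hi_y) // 2
        by = int(y >= mid_y)                      # which half holds y?
        lo_y, hi_y = (mid_y, hi_y) if by else (lo_y, mid_y)
        code.append((bx, by))
    return code

def decode_location(code, size=128):
    """Invert the binary code back to the cell's top-left coordinate."""
    lo_x = lo_y = 0
    half = size
    for bx, by in code:
        half //= 2
        lo_x += bx * half
        lo_y += by * half
    return lo_x, lo_y
```

Predicting the early bits first lets a network refine correspondences progressively: coarse bits localize the region, later bits sharpen it to pixel precision.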
Pose Proposal Critic: Robust Pose Refinement by Learning Reprojection Errors
In recent years, considerable progress has been made on the task of rigid object pose estimation from a single RGB image, but achieving robustness to partial occlusions remains a challenging problem. Pose refinement via rendering has shown promise for achieving improved results, in particular when data is scarce. In this paper we focus our attention on pose refinement, and show how to push the state of the art further in the case of partial occlusions. The proposed pose refinement method leverages a simplified learning task, where a CNN is trained to estimate the reprojection error between an observed and a rendered image. We experiment by training on purely synthetic data as well as a mixture of synthetic and real data. Current state-of-the-art results are outperformed for two out of three metrics on the Occlusion LINEMOD benchmark, while performing on par for the final metric.
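The quantity such a critic learns to predict, the reprojection error between a ground-truth pose and a pose proposal, can be sketched as below. The function names and the pinhole projection setup are illustrative assumptions, not the paper's code; the paper's CNN estimates this value from image pairs rather than computing it from known poses.

```python
import numpy as np

def reprojection_error(points3d, pose_gt, pose_prop, K):
    """Mean 2D distance between model points projected under the
    ground-truth pose and under a proposed pose (pinhole camera K).
    Serves as a training target a critic network could regress."""
    def project(points, R, t):
        cam = points @ R.T + t            # transform into camera frame
        uv = cam @ K.T                    # apply intrinsics
        return uv[:, :2] / uv[:, 2:3]     # perspective divide
    R_gt, t_gt = pose_gt
    R_p, t_p = pose_prop
    return np.mean(np.linalg.norm(project(points3d, R_gt, t_gt)
                                  - project(points3d, R_p, t_p), axis=1))
```

Given such a learned error estimate, refinement amounts to searching for the proposal pose that minimizes the predicted error, which is what makes the simplified learning task useful for the end goal.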