Pose-free neural radiance fields (NeRF) aim to train NeRF with unposed
multi-view images and it has achieved very impressive success in recent years.
Most existing works share the pipeline of training a coarse pose estimator with
rendered images at first, followed by a joint optimization of estimated poses
and neural radiance field. However, as the pose estimator is trained with only
rendered images, the pose estimation is usually biased or inaccurate for real
images due to the domain gap between real images and rendered images, leading
to poor robustness for the pose estimation of real images and further local
minima in joint optimization. We design IR-NeRF, an innovative pose-free NeRF
that introduces implicit pose regularization to refine pose estimator with
unposed real images and improve the robustness of the pose estimation for real
images. With a collection of 2D images of a specific scene, IR-NeRF constructs
a scene codebook that stores scene features and captures the scene-specific
pose distribution implicitly as priors. Thus, the robustness of pose estimation
can be promoted with the scene priors according to the rationale that a 2D real
image can be well reconstructed from the scene codebook only when its estimated
pose lies within the pose distribution. Extensive experiments show that IR-NeRF
achieves superior novel view synthesis and outperforms the state-of-the-art
consistently across multiple synthetic and real datasets.Comment: Accepted by ICCV202