Neural Radiance Fields (NeRF) recently emerged as a new paradigm for object
representation from multi-view (MV) images. Yet, it cannot handle multi-scale
(MS) images or camera pose estimation errors, both of which are common when
multi-view images are captured with an everyday commodity camera. Although the
recently proposed Mip-NeRF addresses the multi-scale imaging problem within the
NeRF framework, it cannot handle camera pose estimation errors. Conversely, the
newly proposed BARF can solve the camera pose problem for NeRF but fails when
the images are multi-scale in nature.
This paper presents a robust multi-scale neural radiance field representation
that simultaneously overcomes both of these real-world imaging issues. Our
method handles multi-scale imaging effects and camera-pose estimation problems
within a NeRF-inspired framework by leveraging the fundamentals of scene
rigidity. To reduce the unpleasant aliasing artifacts that multi-scale images
induce in ray space, we leverage Mip-NeRF's multi-scale representation.
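For concreteness, the anti-aliasing idea is that each pixel's ray is widened to
a cone, each sampled conical frustum is approximated by a Gaussian, and the
network receives an integrated positional encoding (IPE) rather than a point
encoding. Below is a minimal NumPy sketch of this encoding under a
diagonal-covariance assumption; the function name and interface are ours, not
Mip-NeRF's official code.

```python
import numpy as np

def integrated_pos_enc(mean, var, num_freqs=16):
    """Integrated positional encoding (IPE) in the style of Mip-NeRF.

    `mean` and `var` with shape (..., 3) describe the Gaussian that
    approximates a conical frustum. Frequencies whose period is small
    relative to the frustum's extent are smoothly attenuated, which
    suppresses aliasing for coarse (distant or low-resolution) views.
    """
    scales = 2.0 ** np.arange(num_freqs)                 # 2^0 ... 2^(L-1)
    scaled_mean = mean[..., None, :] * scales[:, None]   # (..., L, 3)
    scaled_var = var[..., None, :] * scales[:, None] ** 2
    damping = np.exp(-0.5 * scaled_var)                  # E[sin]/E[cos] decay
    feats = np.concatenate([np.sin(scaled_mean) * damping,
                            np.cos(scaled_mean) * damping], axis=-1)
    return feats.reshape(*mean.shape[:-1], -1)
```

In words: the expectation of sin(2^l x) under a Gaussian with mean mu and
variance sigma^2 is sin(2^l mu) * exp(-4^l sigma^2 / 2), so high-frequency
components fade out exactly where they would otherwise alias.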
For robust joint camera-pose estimation, we propose graph neural network-based
multiple motion averaging within the neural volume rendering framework.
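As a rough illustration of what such a pose-refinement module could look like,
here is a hypothetical PyTorch sketch: nodes are cameras, edges carry pairwise
relative-rotation measurements (as quaternions), and message passing predicts a
consistent absolute rotation per camera while learning to discount outlier
edges. The class name, layer sizes, and quaternion parameterization are our
assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PoseGraphGNN(nn.Module):
    """Hypothetical message-passing network for rotation averaging on a
    view graph. Robustness comes from learning to down-weight edges whose
    relative-rotation measurements are inconsistent with their neighbors.
    """
    def __init__(self, feat_dim=32, num_layers=3):
        super().__init__()
        self.embed = nn.Linear(4, feat_dim)           # lift edge quaternions
        self.msg = nn.ModuleList(
            [nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
             for _ in range(num_layers)])
        self.decode = nn.Linear(feat_dim, 4)          # per-camera quaternion

    def forward(self, node_feat, edge_index, edge_quat):
        # node_feat: (N, F); edge_index: (2, E), src -> dst; edge_quat: (E, 4)
        e = self.embed(edge_quat)
        src, dst = edge_index
        for layer in self.msg:
            m = layer(torch.cat([node_feat[src], e], dim=-1))    # (E, F)
            agg = torch.zeros_like(node_feat).index_add_(0, dst, m)
            node_feat = node_feat + agg               # residual node update
        q = self.decode(node_feat)
        return q / q.norm(dim=-1, keepdim=True)       # unit quaternions
```

In our framework, such refined poses are then estimated jointly with the
radiance field through the volume rendering loss.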
We demonstrate, with examples, that an accurate neural representation of an
object from casually acquired multi-view images crucially depends on precise
camera-pose estimates. Without robustness measures in the camera-pose
estimation, modeling multi-scale aliasing artifacts via conical frustums can be
counterproductive. We present extensive experiments on benchmark datasets
demonstrating that our approach outperforms recent NeRF-inspired approaches in
such realistic settings.

Comment: Accepted for publication at the British Machine Vision Conference
(BMVC) 2022. Draft info: 13 pages, 3 figures, and 4 tables.