Recently, Neural Radiance Field (NeRF) has shown great success in rendering
novel-view images of a given scene by learning an implicit representation with
only posed RGB images. NeRF and related neural field methods (e.g., neural
surface representation) typically optimize a point-wise loss and make
point-wise predictions, where one data point corresponds to one pixel.
Unfortunately, this line of research fails to use the collective supervision
of distant pixels, even though pixels in an image or scene are known to
provide rich structural information. To the best of our knowledge, we are the
first to design a nonlocal multiplex training paradigm for NeRF and related
neural field methods via a novel Stochastic Structural SIMilarity (S3IM) loss
that processes multiple data points as a whole set instead of processing
multiple inputs independently. Our extensive experiments demonstrate the
unreasonable effectiveness of S3IM in improving NeRF and neural surface
representation nearly for free. The improvements in quality metrics are
particularly significant for relatively difficult tasks: e.g., the test MSE
loss unexpectedly drops by more than 90% for TensoRF and DVGO over eight novel
view synthesis tasks, and NeuS achieves a 198% F-score gain and a 64% Chamfer
L1 distance reduction over eight surface reconstruction tasks. Moreover, S3IM is
consistently robust even with sparse inputs, corrupted images, and dynamic
scenes.

Comment: ICCV 2023 main conference. Code: https://github.com/Madaoer/S3IM. 14 pages, 5 figures, 17 tables.
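The abstract only names the S3IM idea (treating a stochastic group of possibly distant pixels as a whole set and supervising it with a structural-similarity term), so the PyTorch sketch below is one illustrative reading of that description rather than the authors' implementation; the function and parameter names (s3im_loss, patch_h, patch_w, repeat_m) are placeholders, and the official code at https://github.com/Madaoer/S3IM is the authoritative reference.

```python
# Minimal sketch of an S3IM-style loss, assuming PyTorch and that the training
# batch contains at least patch_h * patch_w randomly sampled rays.
import torch
import torch.nn.functional as F


def ssim(x: torch.Tensor, y: torch.Tensor, window: int = 4,
         c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """Mean SSIM between two image batches of shape (B, C, H, W) with values in [0, 1]."""
    # Uniform averaging window (a Gaussian window is also common in SSIM variants).
    c = x.shape[1]
    kernel = torch.ones(c, 1, window, window, device=x.device) / (window * window)
    mu_x = F.conv2d(x, kernel, groups=c)
    mu_y = F.conv2d(y, kernel, groups=c)
    sigma_x = F.conv2d(x * x, kernel, groups=c) - mu_x ** 2
    sigma_y = F.conv2d(y * y, kernel, groups=c) - mu_y ** 2
    sigma_xy = F.conv2d(x * y, kernel, groups=c) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).mean()


def s3im_loss(pred_rgb: torch.Tensor, gt_rgb: torch.Tensor,
              patch_h: int = 64, patch_w: int = 64, repeat_m: int = 10) -> torch.Tensor:
    """pred_rgb, gt_rgb: (N, 3) rendered and ground-truth colors of N sampled rays."""
    n = patch_h * patch_w
    losses = []
    for _ in range(repeat_m):
        # Stochastically group rays from (possibly) distant pixels into one
        # "virtual patch" so the SSIM term sees nonlocal structure instead of
        # independent per-pixel errors.
        idx = torch.randperm(pred_rgb.shape[0], device=pred_rgb.device)[:n]
        p = pred_rgb[idx].t().reshape(1, 3, patch_h, patch_w)
        g = gt_rgb[idx].t().reshape(1, 3, patch_h, patch_w)
        losses.append(1.0 - ssim(p, g))
    return torch.stack(losses).mean()
```

In practice such a term would presumably be added to the usual per-pixel MSE loss with a weighting coefficient; the exact patch size, repetition count, and weighting used in the paper should be taken from the official repository.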