Generalizable NeRF methods can directly synthesize novel views in unseen scenes,
eliminating the scene-specific retraining required by vanilla NeRF. A critical
enabling factor in these approaches is the extraction of a generalizable 3D
representation by aggregating source-view features. In this paper, we propose
an Entangled View-Epipolar Information Aggregation method dubbed EVE-NeRF.
Different from existing methods that consider cross-view and along-epipolar
information independently, EVE-NeRF conducts the view-epipolar feature
aggregation in an entangled manner, injecting scene-invariant appearance-continuity
and geometry-consistency priors into the aggregation process. Our
approach effectively mitigates the potential lack of inherent geometric and
appearance constraints resulting from one-dimensional interactions, further
boosting the generalizability of the 3D representation.
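To make the entangled aggregation concrete, the following minimal PyTorch sketch
interleaves cross-view and along-epipolar attention within each block, so that
epipolar aggregation operates on view-conditioned features and vice versa. This
is an illustrative assumption, not the actual EVE-NeRF implementation: the
module names, tensor shapes, and interleaving schedule are hypothetical.

    import torch
    import torch.nn as nn

    class EntangledAggregator(nn.Module):
        # Hypothetical sketch: each block applies cross-view attention
        # followed by along-epipolar attention, so the two axes condition
        # each other instead of being aggregated independently.
        def __init__(self, dim=64, heads=4, blocks=2):
            super().__init__()
            self.view_attn = nn.ModuleList(
                [nn.MultiheadAttention(dim, heads, batch_first=True)
                 for _ in range(blocks)])
            self.epi_attn = nn.ModuleList(
                [nn.MultiheadAttention(dim, heads, batch_first=True)
                 for _ in range(blocks)])

        def forward(self, feats):
            # feats: (rays, epipolar_samples, views, dim)
            r, s, v, d = feats.shape
            for va, ea in zip(self.view_attn, self.epi_attn):
                # cross-view attention: tokens are the source views
                x = feats.reshape(r * s, v, d)
                x = x + va(x, x, x)[0]
                feats = x.reshape(r, s, v, d)
                # along-epipolar attention on the view-conditioned
                # features produced above (the "entanglement")
                x = feats.permute(0, 2, 1, 3).reshape(r * v, s, d)
                x = x + ea(x, x, x)[0]
                feats = x.reshape(r, v, s, d).permute(0, 2, 1, 3)
            return feats

    # Usage: 512 rays, 32 epipolar samples, 8 source views, 64-dim features
    agg = EntangledAggregator()
    out = agg(torch.randn(512, 32, 8, 64))
    print(out.shape)  # torch.Size([512, 32, 8, 64])

The key design choice the sketch illustrates is that the two attention axes are
interleaved per block rather than applied as two separate one-dimensional
passes, so errors along one axis can be corrected by the other.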
EVE-NeRF attains state-of-the-art performance across various evaluation
scenarios. Extensive experiments demonstrate that, compared to prevailing
single-dimensional aggregation schemes,
aggregation, the entangled network excels in the accuracy of 3D scene geometry
and appearance reconstruction. Our code is publicly available at
https://github.com/tatakai1/EVENeRF.