This paper introduces a novel representation of volumetric videos for
real-time view synthesis of dynamic scenes. Recent advances in neural scene
representations demonstrate their remarkable capability to model and render
complex static scenes, but extending them to represent dynamic scenes is not
straightforward due to their slow rendering speed or high storage cost. To
solve this problem, our key idea is to represent the radiance field of each
frame as a set of shallow MLP networks whose parameters are stored in 2D grids,
called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all
frames. Representing 3D scenes with shallow MLPs significantly improves the
rendering speed, while dynamically predicting MLP parameters with a shared 2D
CNN instead of explicitly storing them leads to low storage cost. Experiments
show that the proposed approach achieves state-of-the-art rendering quality on
the NHR and ZJU-MoCap datasets, while being efficient for real-time rendering
with a speed of 41.7 fps for 512×512 images on an RTX 3090 GPU. The
code is available at https://zju3dv.github.io/mlp_maps/.Comment: Accepted to CVPR 2023. The first two authors contributed equally to
this paper. Project page: https://zju3dv.github.io/mlp_maps