Meeting online is becoming the new normal. Creating an immersive experience
for online meetings is a necessity towards more diverse and seamless
environments. Efficient photorealistic rendering of human 3D dynamics is the
core of immersive meetings. Current popular applications achieve real-time
conferencing but fall short in delivering photorealistic human dynamics, either
due to limited 2D space or the use of avatars that lack realistic interactions
between participants. Recent advances in neural rendering, such as the Neural
Radiance Field (NeRF), offer the potential for greater realism in metaverse
meetings. However, the slow rendering speed of NeRF poses challenges for
real-time conferencing. We envision a pipeline for a future extended reality
metaverse conferencing system that leverages monocular video acquisition and
free-viewpoint synthesis to enhance data and hardware efficiency. Towards an
immersive conferencing experience, we explore an accelerated NeRF-based
free-viewpoint synthesis algorithm for rendering photorealistic human dynamics
more efficiently. We show that our algorithm achieves comparable rendering
quality while performing training and inference 44.5% and 213% faster than
state-of-the-art methods, respectively. Our exploration provides a design basis
for constructing metaverse conferencing systems that can handle complex
application scenarios, including dynamic scene relighting with customized
themes and multi-user conferencing that harmonizes real-world people into an
extended world.Comment: Accepted to CVPR 2023 ECV Worksho