5,930 research outputs found
Estimation of signal distortion using effective sampling density for light field-based free viewpoint video
In a light field-based free viewpoint video (LF-based FVV) system, effective sampling density (ESD) is defined as the number of rays per unit area of the scene that has been acquired and is selected in the rendering process for reconstructing an unknown ray. This paper extends the concept of ESD and shows that ESD is a tractable metric that quantifies the joint impact of the imperfections of LF acquisition and rendering. By deriving and analyzing ESD for the commonly used LF acquisition and rendering methods, it is shown that ESD is an effective indicator determined by system parameters and can be used to directly estimate output video distortion without access to the ground truth. This claim is verified by extensive numerical simulations and comparison to PSNR. Furthermore, an empirical relationship between the output distortion (in PSNR) and the calculated ESD is established to allow direct assessment of the overall video distortion without an actual implementation of the system. A small scale subjective user study is also conducted which indicates a correlation of 0.91 between ESD and perceived quality
Neural 3D Video Synthesis
We propose a novel approach for 3D video synthesis that is able to represent
multi-view video recordings of a dynamic real-world scene in a compact, yet
expressive representation that enables high-quality view synthesis and motion
interpolation. Our approach takes the high quality and compactness of static
neural radiance fields in a new direction: to a model-free, dynamic setting. At
the core of our approach is a novel time-conditioned neural radiance fields
that represents scene dynamics using a set of compact latent codes. To exploit
the fact that changes between adjacent frames of a video are typically small
and locally consistent, we propose two novel strategies for efficient training
of our neural network: 1) An efficient hierarchical training scheme, and 2) an
importance sampling strategy that selects the next rays for training based on
the temporal variation of the input videos. In combination, these two
strategies significantly boost the training speed, lead to fast convergence of
the training process, and enable high quality results. Our learned
representation is highly compact and able to represent a 10 second 30 FPS
multi-view video recording by 18 cameras with a model size of just 28MB. We
demonstrate that our method can render high-fidelity wide-angle novel views at
over 1K resolution, even for highly complex and dynamic scenes. We perform an
extensive qualitative and quantitative evaluation that shows that our approach
outperforms the current state of the art. We include additional video and
information at: https://neural-3d-video.github.io/Comment: Project website: https://neural-3d-video.github.io
Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control
We propose Neural Actor (NA), a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses. Our method is built upon recent neural scene representation and rendering works which learn representations of geometry and appearance from only 2D images. While existing works demonstrated compelling rendering of static scenes and playback of dynamic scenes, photo-realistic reconstruction and rendering of humans with neural implicit methods, in particular under user-controlled novel poses, is still difficult. To address this problem, we utilize a coarse body model as the proxy to unwarp the surrounding 3D space into a canonical pose. A neural radiance field learns pose-dependent geometric deformations and pose- and view-dependent appearance effects in the canonical space from multi-view video input. To synthesize novel views of high fidelity dynamic geometry and appearance, we leverage 2D texture maps defined on the body model as latent variables for predicting residual deformations and the dynamic appearance. Experiments demonstrate that our method achieves better quality than the state-of-the-arts on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses. Furthermore, our method also supports body shape control of the synthesized results
3D-TV Production from Conventional Cameras for Sports Broadcast
3DTV production of live sports events presents a challenging problem involving conflicting requirements of main- taining broadcast stereo picture quality with practical problems in developing robust systems for cost effective deployment. In this paper we propose an alternative approach to stereo production in sports events using the conventional monocular broadcast cameras for 3D reconstruction of the event and subsequent stereo rendering. This approach has the potential advantage over stereo camera rigs of recovering full scene depth, allowing inter-ocular distance and convergence to be adapted according to the requirements of the target display and enabling stereo coverage from both existing and ‘virtual’ camera positions without additional cameras. A prototype system is presented with results of sports TV production trials for rendering of stereo and free-viewpoint video sequences of soccer and rugby
- …