Data augmentation for NeRF: a geometric consistent solution based on view morphing
NeRF aims to learn a continuous neural scene representation from a finite
set of input images taken from different viewpoints. The fewer the
viewpoints, the higher the likelihood of overfitting to them. This paper
mitigates this limitation by presenting a novel data augmentation approach
that generates geometrically consistent image transitions between viewpoints
using view morphing. View morphing is a highly versatile technique that does
not require any prior knowledge about the 3D scene because it is based on
general principles of projective geometry. A key novelty of our method is to
use the very same depths predicted by NeRF to generate the image transitions
that are then added to NeRF training. We experimentally show that this
procedure enables NeRF to improve the quality of its synthesised novel views
in the case of datasets with few training viewpoints. We improve PSNR by up
to 1.8dB and 10.5dB when eight and four views are used for training,
respectively. To the best of our knowledge, this is the first data
augmentation strategy for NeRF that explicitly synthesises additional new
input images to improve the model's generalisation.
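
As an illustration of the kind of warp such depth-driven augmentation relies on, the following is a minimal sketch assuming pinhole intrinsics, known camera-to-world poses and a NeRF-predicted depth map. It uses a simple depth-based forward warp with a z-buffer rather than the full three-stage view-morphing pipeline (prewarp, morph, postwarp); all names and conventions are illustrative, not from the paper.

```python
import numpy as np

def warp_to_intermediate_view(img, depth, K, T_src, T_mid):
    """Forward-warp a source image into an intermediate viewpoint using
    a per-pixel depth map (here, the depth predicted by NeRF).
    img:   (H, W, 3) source image
    depth: (H, W)    per-pixel depth in the source camera
    K:     (3, 3)    pinhole intrinsics
    T_src, T_mid: (4, 4) camera-to-world poses of source and target
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, HW)

    # Back-project every pixel to 3D in the source frame, then to world.
    cam_pts = np.linalg.inv(K) @ pix * depth.reshape(-1)
    world = T_src @ np.vstack([cam_pts, np.ones((1, cam_pts.shape[1]))])

    # Re-project the world points into the intermediate camera.
    cam_mid = np.linalg.inv(T_mid) @ world
    proj = K @ cam_mid[:3]
    uv = (proj[:2] / proj[2]).T.round().astype(int)                    # (HW, 2)

    # Splat colours; a z-buffer resolves occlusions.
    out = np.zeros_like(img)
    zbuf = np.full((H, W), np.inf)
    for (x, y), z, c in zip(uv, cam_mid[2], img.reshape(-1, 3)):
        if 0 <= x < W and 0 <= y < H and z < zbuf[y, x]:
            zbuf[y, x] = z
            out[y, x] = c
    return out  # holes remain where no source pixel lands
```

In practice the holes left where no source pixel projects would be filled from the second viewpoint of the morph, which is what makes interpolating between two real views attractive.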
No-reference depth map quality evaluation model based on depth map edge confidence measurement in immersive video applications
When evaluating the perceptual quality of digital media for overall quality-of-experience
assessment in immersive video applications, two main approaches stand out:
subjective and objective quality evaluation. On the one hand, subjective quality evaluation
offers the best representation of perceived video quality, as it is assessed by real viewers.
On the other hand, it consumes a significant amount of time and effort because it involves
real users in lengthy and laborious assessment procedures. It is therefore essential to
develop an objective quality evaluation model. An objective model that can predict the
quality of rendered virtual views from the depth maps used in the rendering process allows
much faster quality assessment for immersive video applications. This is particularly
important given the lack of a suitable reference or ground truth against which to compare
the available depth maps, especially when these applications offer live content services.
This paper presents a no-reference depth map quality evaluation model based on a proposed
depth map edge confidence measurement technique that assists in accurately estimating the
quality of rendered (virtual) views in immersive multi-view video content. The model is
applied to depth image-based rendering in the multi-view video format, providing evaluation
results comparable to those in the literature, and often exceeding their performance.
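
The paper's exact confidence measure is not reproduced here, but the general idea of scoring depth edges by whether a colocated texture edge corroborates them can be sketched as follows; the threshold and dilation radius are assumptions of ours.

```python
import numpy as np
from scipy import ndimage

def depth_edge_confidence(depth, texture_gray, grad_thresh=0.1):
    """Illustrative edge-confidence score for a depth map: depth
    discontinuities that coincide with texture edges are treated as
    reliable, isolated depth edges as suspect. Not the paper's exact
    metric, only the general idea.
    """
    depth = np.asarray(depth, dtype=float)
    texture_gray = np.asarray(texture_gray, dtype=float)

    # Gradient magnitudes of depth and luminance.
    d_grad = np.hypot(ndimage.sobel(depth, 0), ndimage.sobel(depth, 1))
    t_grad = np.hypot(ndimage.sobel(texture_gray, 0), ndimage.sobel(texture_gray, 1))

    # Normalise to [0, 1] so the threshold is scale-free.
    d_grad /= d_grad.max() + 1e-8
    t_grad /= t_grad.max() + 1e-8

    depth_edges = d_grad > grad_thresh
    # Dilate texture edges slightly to tolerate small misalignment.
    texture_support = ndimage.binary_dilation(t_grad > grad_thresh, iterations=2)

    confident = np.logical_and(depth_edges, texture_support)
    # Fraction of depth edges corroborated by texture: one scalar score.
    return confident.sum() / (depth_edges.sum() + 1e-8)
```

A score near 1 indicates that depth discontinuities line up with object boundaries in the texture, which is exactly the no-reference cue such a model can exploit when no ground-truth depth exists.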
A Novel Inpainting Framework for Virtual View Synthesis
Multi-view imaging has stimulated significant research to enhance the user experience of free viewpoint video, allowing interactive navigation between views and the freedom to select a desired view to watch. This usually involves transmitting both textural and depth information captured from different viewpoints to the receiver, to enable the synthesis of an arbitrary view. In rendering these virtual views, perceptual holes can appear where regions hidden in the original view by a closer object become visible in the virtual view. To provide a high-quality experience, these holes must be filled in a visually plausible way, in a process known as inpainting. This is challenging because the missing information is generally unknown and the hole regions can be large. Recently, depth-based inpainting techniques have been proposed to address this challenge, and while these generally perform better than non-depth-assisted methods, they are not very robust and can produce perceptual artefacts.
This thesis presents a new inpainting framework that innovatively exploits depth and textural self-similarity characteristics to construct subjectively enhanced virtual viewpoints. The framework makes three significant contributions to the field: i) the exploitation of view information to jointly inpaint textural and depth hole regions; ii) the introduction of the novel concept of self-similarity characterisation which is combined with relevant depth information; and iii) an advanced self-similarity characterising scheme that automatically determines key spatial transform parameters for effective and flexible inpainting.
The presented inpainting framework has been critically analysed and shown to provide superior performance, both perceptually and numerically, compared to existing techniques, especially in producing fewer visual artefacts. It provides a flexible, robust framework for developing new inpainting strategies for the next generation of interactive multi-view technologies.
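
As a rough illustration of how depth can steer exemplar-based inpainting of the kind this framework builds on, the sketch below computes a Criminisi-style fill priority with an added depth term that favours background patches. It is a simplified stand-in, not the thesis's actual scheme; patch size and term definitions are our assumptions.

```python
import numpy as np
from scipy import ndimage

def fill_priority(confidence, depth, hole_mask, patch=9):
    """Illustrative patch-priority map for depth-assisted exemplar
    inpainting: prefer hole-boundary patches that are well observed
    (high confidence) and whose known surroundings are deep, so that
    background rather than foreground is propagated into the hole.
    confidence: (H, W) float, 1 for original pixels, decaying as filled
    depth:      (H, W) float depth map
    hole_mask:  (H, W) bool, True inside the hole
    """
    known = ~hole_mask
    # Hole boundary: hole pixels adjacent to known pixels.
    boundary = hole_mask & ~ndimage.binary_erosion(hole_mask)

    # Confidence term: mean confidence of the known part of each patch.
    conf_term = ndimage.uniform_filter(confidence * known, size=patch)

    # Depth term: mean known depth around each pixel, normalised, so
    # patches on the background side of the depth edge win.
    d_sum = ndimage.uniform_filter(depth * known, size=patch)
    d_cnt = ndimage.uniform_filter(known.astype(float), size=patch)
    d_local = d_sum / (d_cnt + 1e-8)
    depth_term = d_local / (d_local.max() + 1e-8)

    return np.where(boundary, conf_term * depth_term, 0.0)
    # Fill the argmax-priority patch from its best-matching source
    # patch, update the confidence map and masks, and repeat until full.
```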
NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
Training a Neural Radiance Field (NeRF) without pre-computed camera poses is
challenging. Recent advances in this direction demonstrate the possibility of
jointly optimising a NeRF and camera poses in forward-facing scenes. However,
these methods still face difficulties during dramatic camera movement. We
tackle this challenging problem by incorporating undistorted monocular depth
priors. These priors are generated by correcting scale and shift parameters
during training, with which we are then able to constrain the relative poses
between consecutive frames. This constraint is enforced by our novel loss
functions. Experiments on real-world indoor and outdoor scenes show
that our method can handle challenging camera trajectories and outperforms
existing methods in terms of novel view rendering quality and pose estimation
accuracy. Our project page is https://nope-nerf.active.vision
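
A minimal sketch of the two ingredients the abstract names, under our own naming and simplifications: a per-frame scale-and-shift correction of the monocular depth prior, and a point-cloud alignment loss constraining the relative pose between consecutive frames. The paper's losses are richer; a one-sided Chamfer distance stands in here.

```python
import torch

def undistort_depth(mono_depth, scale, shift):
    """Per-frame learnable scale and shift turn a relative monocular
    depth map into a multi-view-consistent one (the 'undistorted'
    prior); scale and shift are optimised jointly with the NeRF and
    the camera poses.
    """
    return scale * mono_depth + shift

def relative_pose_loss(pts_i, pts_j, T_i, T_j):
    """Illustrative inter-frame constraint (names are ours, not the
    paper's): points back-projected from frame i's depth, mapped
    through the relative pose, should land on the surface seen in
    frame j.
    pts_i, pts_j: (N, 3) back-projected points in each camera's frame
    T_i, T_j:     (4, 4) camera-to-world poses being optimised
    """
    T_rel = torch.linalg.inv(T_j) @ T_i            # frame i -> frame j
    pts_i_h = torch.cat([pts_i, torch.ones_like(pts_i[:, :1])], dim=1)
    pts_i_in_j = (T_rel @ pts_i_h.T).T[:, :3]
    # One-sided Chamfer distance as a simple surface-alignment loss.
    d = torch.cdist(pts_i_in_j, pts_j)             # pairwise distances
    return d.min(dim=1).values.mean()
```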
D-NeRF: Neural Radiance Fields for Dynamic Scenes
Paper presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), held virtually from Nashville, TN (United States), 20-25 June 2021.
Neural rendering techniques combining machine learning with geometric reasoning have arisen as one of the most promising approaches for synthesizing novel views of a scene from a sparse set of images. Among these, Neural Radiance Fields (NeRF) stands out, training a deep network to map 5D input coordinates (representing spatial location and viewing direction) into a volume density and view-dependent emitted radiance. However, despite achieving an unprecedented level of photorealism in the generated images, NeRF is only applicable to static scenes, where the same spatial location can be queried from different images. In this paper we introduce D-NeRF, a method that extends neural radiance fields to the dynamic domain, allowing the reconstruction and rendering of novel images of objects under rigid and non-rigid motion. For this purpose we consider time as an additional input to the system and split the learning process into two main stages: one that encodes the scene into a canonical space and another that maps this canonical representation into the deformed scene at a particular time. Both mappings are learned using fully-connected networks. Once the networks are trained, D-NeRF can render novel images, controlling both the camera view and the time variable, and thus the object movement. We demonstrate the effectiveness of our approach on scenes with objects under rigid, articulated and non-rigid motion.
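
A minimal sketch of the two-stage design described above, assuming PyTorch and omitting positional encoding and the volume-rendering step; layer sizes and names are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DeformationField(nn.Module):
    """Maps a point x at time t to its displacement into the canonical
    space, delta_x = Psi(x, t); at t = 0 the displacement is trained
    to be zero.
    """
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, t):
        # x: (N, 3) sample positions, t: (N, 1) time stamps
        return self.net(torch.cat([x, t], dim=-1))

class CanonicalNeRF(nn.Module):
    """Standard NeRF MLP over the canonical scene: 5D input (position
    plus viewing direction) to density and view-dependent colour.
    """
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (sigma, r, g, b)
        )

    def forward(self, x, d):
        out = self.net(torch.cat([x, d], dim=-1))
        # Raw density (activated during rendering) and colour in [0, 1].
        return out[:, :1], torch.sigmoid(out[:, 1:])

def query_dynamic_scene(deform, canonical, x, d, t):
    """D-NeRF's two-stage query: deform the point into canonical
    space, then evaluate the canonical radiance field there."""
    x_canonical = x + deform(x, t)
    return canonical(x_canonical, d)
```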
Volumetric performance capture from minimal camera viewpoints
We present a convolutional autoencoder that enables high fidelity volumetric
reconstructions of human performance to be captured from multi-view video
comprising only a small set of camera views. Our method yields similar
end-to-end reconstruction error to that of a probabilistic visual hull computed
using significantly more (double or more) viewpoints. We use a deep prior
implicitly learned by the autoencoder trained over a dataset of view-ablated
multi-view video footage of a wide range of subjects and actions. This opens up
the possibility of high-end volumetric performance capture in on-set and
prosumer scenarios where time or cost prohibit a high witness camera count.
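
A minimal sketch of the general idea, assuming PyTorch: a 3D convolutional autoencoder that takes a coarse occupancy volume (e.g. a probabilistic visual hull computed from few views) and outputs a refined volume. The architecture, resolution and training pairing are our assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class VolumeAutoencoder(nn.Module):
    """Illustrative 3D convolutional autoencoder: a coarse visual-hull
    occupancy volume in, a refined occupancy volume out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv3d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, vol):  # vol: (B, 1, 64, 64, 64) occupancy
        return torch.sigmoid(self.decoder(self.encoder(vol)))

# Training pairs would come from view ablation, as the abstract
# describes: the input hull is computed from a few cameras, the
# target from the full camera rig, so the network learns the prior
# that compensates for the missing viewpoints.
```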