16 research outputs found
Learning to Predict Image-based Rendering Artifacts with Respect to a Hidden Reference Image
Image metrics predict the perceived per-pixel difference between a reference
image and its degraded (e.g., re-rendered) version. In several important
applications, the reference image is not available and image metrics cannot be
applied. We devise a neural network architecture and training procedure that
allows predicting the MSE, SSIM or VGG16 image difference from the distorted
image alone while the reference is not observed. This is enabled by two
insights: The first is to inject sufficiently many un-distorted natural image
patches, which can be found in arbitrary amounts and are known to have no
perceivable difference to themselves. This avoids false positives. The second
is to balance the learning, taking care that all image
errors are equally likely, which avoids false negatives. Surprisingly, we observe
that the resulting no-reference metric can, subjectively, even perform better
than the reference-based one, as it had to become robust against
misalignments. We evaluate the effectiveness of our approach in an image-based
rendering context, both quantitatively and qualitatively. Finally, we
demonstrate two applications which reduce light field capture time and provide
guidance for interactive depth adjustment.
Comment: 13 pages, 11 figures
SILVR: A Synthetic Immersive Large-Volume Plenoptic Dataset
In six-degrees-of-freedom light-field (LF) experiences, the viewer's freedom
is limited by the extent to which the plenoptic function was sampled. Existing
LF datasets represent only small portions of the plenoptic function, such that
they either cover a small volume, or they have limited field of view.
Therefore, we propose a new LF image dataset "SILVR" that allows for
six-degrees-of-freedom navigation in much larger volumes while maintaining full
panoramic field of view. We rendered three different virtual scenes in various
configurations, where the number of views ranges from 642 to 2226. One of these
scenes (called Zen Garden) is a novel scene, and is made publicly available. We
chose to position the virtual cameras closely together in large cuboid and
spherical organisations ( to ), equipped with 180° fish-eye
lenses. Every view is rendered to a color image and depth map of 2048px ×
2048px. Additionally, we present the software used to automate the
multi-view rendering process, as well as a lens-reprojection tool that converts
images with panoramic or fish-eye projection to a standard rectilinear
(i.e., perspective) projection. Finally, we demonstrate how the proposed
dataset and software can be used to evaluate LF coding/rendering techniques (in
this case for training NeRFs with instant-ngp). As such, we provide the first
publicly-available LF dataset for large volumes of light with full panoramic
field of view.
Comment: In 13th ACM Multimedia Systems Conference (MMSys '22), June 14-17,
2022, Athlone, Ireland. ACM, New York, NY, USA, 6 pages
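The fish-eye to rectilinear conversion mentioned in the abstract can be illustrated with a minimal sketch. It assumes an equidistant fish-eye model (image radius proportional to the angle from the optical axis) and square images; the abstract does not state which fish-eye model the SILVR tool uses, and the function name and parameters below are hypothetical.

```python
import numpy as np

def rectilinear_to_fisheye_coords(out_size, out_fov_deg, fish_size, fish_fov_deg=180.0):
    """For each pixel of a square rectilinear output image, return the
    (x, y) position to sample in a square equidistant fish-eye image.
    Assumption: equidistant model, optical axis at the image centre."""
    # Focal length (in pixels) of the pinhole output camera.
    f_out = (out_size / 2) / np.tan(np.radians(out_fov_deg) / 2)
    c_out = (out_size - 1) / 2
    cols, rows = np.meshgrid(np.arange(out_size), np.arange(out_size))
    # Normalised ray direction components on the plane z = 1.
    x = (cols - c_out) / f_out
    y = (rows - c_out) / f_out
    theta = np.arctan(np.hypot(x, y))   # angle from the optical axis
    phi = np.arctan2(y, x)              # angle around the optical axis
    # Equidistant mapping: theta = fov/2 lands on the fish-eye image edge.
    r = theta / (np.radians(fish_fov_deg) / 2) * (fish_size / 2)
    c_fish = (fish_size - 1) / 2
    return c_fish + r * np.cos(phi), c_fish + r * np.sin(phi)
```

The returned coordinate maps would then be passed to any interpolating resampler to produce the perspective image; the same maps can be reused for the depth channel, although fish-eye depth may need converting from ray distance to z-depth depending on how it was rendered.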
An Efficient Refocusing Scheme for Camera-Array Captured Light Field Video for Improved Visual Immersiveness
Light field video technology attempts to acquire human-like visual data, offering unprecedented immersiveness and a viable path for producing high-quality VR content. Refocusing, which is one of the key properties of light fields and a must for mixed reality applications, has been shown to work well for microlens-based cameras, but because light field videos acquired by camera arrays have a low angular resolution, the refocused quality suffers. In this paper, we present an approach to improve the visual quality of refocused content captured by a camera array-based setup. Increasing the angular resolution with an existing deep learning-based view synthesis method and then refocusing the video with the shift-and-sum refocusing algorithm over-blurs the in-focus region. Our enhancement method targets these blurry pixels and improves their quality by similarity detection and blending. Experimental results show that the proposed approach achieves better refocusing quality compared to traditional methods.
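As an illustration of the shift-and-sum refocusing step named in the abstract, the sketch below shifts each sub-aperture view in proportion to its camera offset and averages the results. It is a generic textbook form, not the authors' implementation: the function name, the integer-pixel shifts, and the wrap-around border handling of `np.roll` are simplifying assumptions.

```python
import numpy as np

def shift_and_sum_refocus(views, positions, slope):
    """Refocus a light field by shifting and averaging its views.

    views:     array of shape (N, H, W), one grayscale image per camera
    positions: N (u, v) camera offsets relative to the central view
    slope:     pixels of shift per unit of camera offset; selects the focal plane
    """
    views = np.asarray(views, dtype=np.float64)
    acc = np.zeros_like(views[0])
    for img, (u, v) in zip(views, positions):
        dy = int(round(slope * v))
        dx = int(round(slope * u))
        # np.roll wraps around at the borders (a simplification).
        acc += np.roll(img, shift=(dy, dx), axis=(0, 1))
    return acc / len(views)
```

Scene points whose disparity matches `slope` line up across views and stay sharp, while everything else is averaged over misaligned positions and blurs. With few cameras (low angular resolution) that averaging shows up as ghosting rather than smooth blur, which is why view synthesis is applied first.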
NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads
We focus on reconstructing high-fidelity radiance fields of human heads,
capturing their animations over time, and synthesizing re-renderings from novel
viewpoints at arbitrary time steps. To this end, we propose a new multi-view
capture setup composed of 16 calibrated machine vision cameras that record
time-synchronized images at 7.1 MP resolution and 73 frames per second. With
our setup, we collect a new dataset of over 4700 high-resolution,
high-framerate sequences of more than 220 human heads, from which we introduce
a new human head reconstruction benchmark. The recorded sequences cover a wide
range of facial dynamics, including head motions, natural expressions,
emotions, and spoken language. In order to reconstruct high-fidelity human
heads, we propose Dynamic Neural Radiance Fields using Hash Ensembles
(NeRSemble). We represent scene dynamics by combining a deformation field and
an ensemble of 3D multi-resolution hash encodings. The deformation field allows
for precise modeling of simple scene movements, while the ensemble of hash
encodings helps to represent complex dynamics. As a result, we obtain radiance
field representations of human heads that capture motion over time and
facilitate re-rendering of arbitrary novel viewpoints. In a series of
experiments, we explore the design choices of our method and demonstrate that
our approach outperforms state-of-the-art dynamic radiance field approaches by
a significant margin.
Comment: Siggraph 2023, Project Page:
https://tobias-kirschstein.github.io/nersemble/ , Video:
https://youtu.be/a-OAWqBzld
X-Fields: Implicit Neural View-, Light- and Time-Image Interpolation
We suggest representing an X-Field (a set of 2D images taken across different view, time or illumination conditions, i.e., video, light field, reflectance fields or combinations thereof) by learning a neural network (NN) to map their view, time or light coordinates to 2D images. Executing this NN at new coordinates results in joint view, time or light interpolation. The key idea to make this workable is a NN that already knows the "basic tricks" of graphics (lighting, 3D projection, occlusion) in a hard-coded and differentiable form. The NN represents the input to that rendering as an implicit map, which for any view, time, or light coordinate and for any pixel can quantify how it will move if view, time or light coordinates change (Jacobian of pixel position with respect to view, time, illumination, etc.). Our X-Field representation is trained for one scene within minutes, leading to a compact set of trainable parameters and hence real-time navigation in view, time and illumination.
3D Scene Modeling from Dense Video Light Fields
Light field imaging offers unprecedented opportunities for advanced scene analysis and modelling, with potential applications in various domains such as augmented reality, 3D robotics, and microscopy. This paper illustrates the potential of dense video light fields for 3D scene modeling. We first recall the principles of plenoptic cameras and present a downloadable test dataset captured with a Raytrix 2.0 plenoptic camera. Methods to estimate the scene depth and to construct a 3D point cloud representation of the scene from the captured light field are then described.
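The last step described above, going from an estimated depth map to a 3D point cloud, can be sketched as follows. This assumes a simple pinhole model with known intrinsics, which is an illustrative simplification and not the calibration model of the plenoptic camera used in the paper; the function name and the parameters `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) into an (M, 3) array of 3D points.

    Assumption: pinhole camera with focal lengths fx, fy and principal
    point (cx, cy), all in pixels; depth is measured along the z axis.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column, row
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Drop pixels with missing or invalid depth.
    valid = np.isfinite(points[:, 2]) & (points[:, 2] > 0)
    return points[valid]
```

Applied per frame of a video light field, this yields one point cloud per time step; colours can be attached by indexing the image with the same validity mask.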