16 research outputs found

    Learning to Predict Image-based Rendering Artifacts with Respect to a Hidden Reference Image

    Image metrics predict the perceived per-pixel difference between a reference image and its degraded (e.g., re-rendered) version. In several important applications, the reference image is not available, and image metrics cannot be applied. We devise a neural network architecture and training procedure that allow predicting the MSE, SSIM or VGG16 image difference from the distorted image alone, without observing the reference. This is enabled by two insights: the first is to inject sufficiently many undistorted natural image patches, which can be found in arbitrary amounts and are known to have no perceivable difference to themselves; this avoids false positives. The second is to balance the learning, carefully ensuring that all image errors are equally likely, which avoids false negatives. Surprisingly, we observe that the resulting no-reference metric can, subjectively, even perform better than the reference-based one, as it had to become robust against misalignments. We evaluate the effectiveness of our approach in an image-based rendering context, both quantitatively and qualitatively. Finally, we demonstrate two applications which reduce light field capture time and provide guidance for interactive depth adjustment. (Comment: 13 pages, 11 figures)
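
    The two training insights can be sketched as follows, assuming a simple fully convolutional predictor and batches that pair distorted patches with their references plus a pool of pristine natural patches; the names predictor and train_step and the network layout are illustrative assumptions, not the paper's implementation.

    # Sketch of the no-reference error-map training idea (assumed setup, not the paper's code).
    import torch
    import torch.nn as nn

    predictor = nn.Sequential(                     # hypothetical per-pixel error predictor
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 3, padding=1),
    )
    opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

    def train_step(distorted, reference, natural):
        """distorted/reference: aligned patch pairs (B,3,H,W); natural: pristine patches (B,3,H,W)."""
        target = (distorted - reference).pow(2).mean(dim=1, keepdim=True)   # per-pixel MSE map
        # Insight 1: pristine patches have zero difference to themselves (avoids false positives).
        inputs = torch.cat([distorted, natural], dim=0)
        zeros = torch.zeros(natural.shape[0], 1, *natural.shape[2:], device=natural.device)
        targets = torch.cat([target, zeros], dim=0)
        # Insight 2 (balancing error magnitudes) would live in the batch sampler; omitted here.
        loss = nn.functional.l1_loss(predictor(inputs), targets)
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()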

    SILVR: A Synthetic Immersive Large-Volume Plenoptic Dataset

    In six-degrees-of-freedom light-field (LF) experiences, the viewer's freedom is limited by the extent to which the plenoptic function was sampled. Existing LF datasets represent only small portions of the plenoptic function, such that they either cover a small volume or have a limited field of view. Therefore, we propose a new LF image dataset, "SILVR", that allows for six-degrees-of-freedom navigation in much larger volumes while maintaining a full panoramic field of view. We rendered three different virtual scenes in various configurations, where the number of views ranges from 642 to 2226. One of these scenes (called Zen Garden) is a novel scene and is made publicly available. We chose to position the virtual cameras closely together in large cuboid and spherical organisations (2.2 m³ to 48 m³), equipped with 180° fish-eye lenses. Every view is rendered to a color image and depth map of 2048 px × 2048 px. Additionally, we present the software used to automate the multi-view rendering process, as well as a lens-reprojection tool that converts images with panoramic or fish-eye projection to a standard rectilinear (i.e., perspective) projection. Finally, we demonstrate how the proposed dataset and software can be used to evaluate LF coding/rendering techniques (in this case, for training NeRFs with instant-ngp). As such, we provide the first publicly available LF dataset for large volumes of light with a full panoramic field of view. (Comment: In 13th ACM Multimedia Systems Conference (MMSys '22), June 14-17, 2022, Athlone, Ireland. ACM, New York, NY, USA, 6 pages)
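
    The lens-reprojection step can be illustrated with a short sketch that resamples an ideal equidistant 180° fish-eye image into a rectilinear view; the equidistant model and the nearest-neighbour lookup are simplifying assumptions, and this is not the released SILVR tool.

    # Resample an ideal equidistant 180-degree fish-eye image to a rectilinear (perspective) view.
    import numpy as np

    def fisheye_to_rectilinear(fish, out_size=512, fov_deg=90.0):
        h, w = fish.shape[:2]
        f_rect = (out_size / 2) / np.tan(np.radians(fov_deg) / 2)   # rectilinear focal length
        f_fish = (w / 2) / (np.pi / 2)                              # equidistant model: r = f_fish * theta
        ys, xs = np.mgrid[0:out_size, 0:out_size]
        x, y = xs - out_size / 2, ys - out_size / 2
        theta = np.arctan2(np.sqrt(x**2 + y**2), f_rect)            # angle from the optical axis
        phi = np.arctan2(y, x)
        r = f_fish * theta
        u = np.clip(w / 2 + r * np.cos(phi), 0, w - 1).astype(int)  # nearest-neighbour source pixel
        v = np.clip(h / 2 + r * np.sin(phi), 0, h - 1).astype(int)
        return fish[v, u]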

    An Efficient Refocusing Scheme for Camera-Array Captured Light Field Video for Improved Visual Immersiveness

    Light field video technology attempts to acquire human-like visual data, offering unprecedented immersiveness and a viable path for producing high-quality VR content. Refocusing, one of the key properties of light fields and a necessity for mixed-reality applications, has been shown to work well for microlens-based cameras, but since light field videos acquired by camera arrays have a low angular resolution, the refocused quality suffers. In this paper, we present an approach to improve the visual quality of refocused content captured by a camera-array-based setup. Increasing the angular resolution with an existing deep-learning-based view synthesis method and refocusing the video with the shift-and-sum refocusing algorithm produces over-blurring of the in-focus region. Our enhancement method targets these blurry pixels and improves their quality by similarity detection and blending. Experimental results show that the proposed approach achieves better refocusing quality compared to traditional methods.
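
    For reference, the baseline shift-and-sum refocusing over a camera array (the step whose over-blurring the paper addresses) can be sketched as below; the integer-pixel shifts and the inputs views, positions and slope are illustrative assumptions, not the proposed enhancement.

    # Baseline shift-and-sum refocusing over a camera array (not the proposed enhancement).
    import numpy as np

    def shift_and_sum(views, positions, slope):
        """views: list of HxWx3 images; positions: (N,2) camera offsets on the array grid;
        slope: disparity per unit camera offset, which selects the in-focus depth."""
        acc = np.zeros_like(views[0], dtype=np.float64)
        for img, (du, dv) in zip(views, positions):
            shift = (int(round(slope * dv)), int(round(slope * du)))   # (rows, cols)
            acc += np.roll(img, shift, axis=(0, 1))                    # integer-pixel shift
        return acc / len(views)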

    NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads

    We focus on reconstructing high-fidelity radiance fields of human heads, capturing their animations over time, and synthesizing re-renderings from novel viewpoints at arbitrary time steps. To this end, we propose a new multi-view capture setup composed of 16 calibrated machine vision cameras that record time-synchronized images at 7.1 MP resolution and 73 frames per second. With our setup, we collect a new dataset of over 4700 high-resolution, high-framerate sequences of more than 220 human heads, from which we introduce a new human head reconstruction benchmark. The recorded sequences cover a wide range of facial dynamics, including head motions, natural expressions, emotions, and spoken language. In order to reconstruct high-fidelity human heads, we propose Dynamic Neural Radiance Fields using Hash Ensembles (NeRSemble). We represent scene dynamics by combining a deformation field and an ensemble of 3D multi-resolution hash encodings. The deformation field allows for precise modeling of simple scene movements, while the ensemble of hash encodings helps to represent complex dynamics. As a result, we obtain radiance field representations of human heads that capture motion over time and facilitate re-rendering of arbitrary novel viewpoints. In a series of experiments, we explore the design choices of our method and demonstrate that our approach outperforms state-of-the-art dynamic radiance field approaches by a significant margin. (Comment: SIGGRAPH 2023, Project Page: https://tobias-kirschstein.github.io/nersemble/ , Video: https://youtu.be/a-OAWqBzld)
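
    The core representation, a deformation field feeding an ensemble of encodings blended with per-frame weights, can be sketched roughly as follows; the placeholder linear encoders stand in for 3D multi-resolution hash grids, and all module names are assumptions rather than the released NeRSemble code.

    # Rough sketch of a deformation field plus a blended encoding ensemble (assumed components).
    import torch
    import torch.nn as nn

    class HashEnsembleField(nn.Module):
        def __init__(self, num_frames, num_grids=4, feat_dim=16):
            super().__init__()
            # Placeholder encoders standing in for 3D multi-resolution hash encodings.
            self.encodings = nn.ModuleList(nn.Linear(3, feat_dim) for _ in range(num_grids))
            self.deform_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
            self.blend_weights = nn.Parameter(torch.zeros(num_frames, num_grids))
            self.decoder = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 4))

        def forward(self, x, frame_idx):
            t = torch.full_like(x[:, :1], float(frame_idx))
            x_canon = x + self.deform_net(torch.cat([x, t], dim=-1))   # deformation models simple motion
            w = torch.softmax(self.blend_weights[frame_idx], dim=0)    # per-frame blend over the ensemble
            feat = sum(w[i] * enc(x_canon) for i, enc in enumerate(self.encodings))
            return self.decoder(feat)                                  # e.g. density + RGB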

    X-Fields: Implicit Neural View-, Light- and Time-Image Interpolation

    We suggest representing an X-Field (a set of 2D images taken across different view, time or illumination conditions, i.e., video, light field, reflectance fields or combinations thereof) by learning a neural network (NN) to map their view, time or light coordinates to 2D images. Executing this NN at new coordinates results in joint view, time and light interpolation. The key idea to make this workable is a NN that already knows the "basic tricks" of graphics (lighting, 3D projection, occlusion) in a hard-coded and differentiable form. The NN represents the input to that rendering as an implicit map that, for any view, time or light coordinate and for any pixel, can quantify how the pixel will move if the view, time or light coordinates change (the Jacobian of pixel position with respect to view, time, illumination, etc.). Our X-Field representation is trained for one scene within minutes, leading to a compact set of trainable parameters and hence real-time navigation in view, time and illumination.
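
    The interpolation step can be illustrated schematically: warp the nearby captured images to the query coordinate with the learned map and blend them by coordinate distance. The function flow_net, the inverse-distance weights and the use of grid_sample are assumptions for illustration, not the authors' implementation.

    # Schematic X-Field-style interpolation: warp captured images to a query coordinate, then blend.
    import torch
    import torch.nn.functional as F

    def interpolate_xfield(flow_net, images, coords, query):
        """images: (N,3,H,W) captured views; coords: (N,D) their view/time/light coordinates;
        query: (D,) target coordinate; flow_net(query, c) -> (1,H,W,2) sampling grid into that view."""
        warped, weights = [], []
        for img, c in zip(images, coords):
            grid = flow_net(query, c)                             # where each output pixel reads from img
            warped.append(F.grid_sample(img[None], grid, align_corners=True))
            weights.append(1.0 / (1e-6 + torch.norm(query - c)))  # closer coordinates weigh more
        w = torch.stack(weights)
        w = w / w.sum()
        return sum(wi * im for wi, im in zip(w, warped))[0]       # (3,H,W) interpolated image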

    3D Scene Modeling from Dense Video Light Fields

    Light field imaging offers unprecedented opportunities for advanced scene analysis and modelling, with potential applications in various domains such as augmented reality, 3D robotics, and microscopy. This paper illustrates the potential of dense video light fields for 3D scene modeling. We first recall the principles of plenoptic cameras and present a downloadable test dataset captured with a Raytrix 2.0 plenoptic camera. Methods to estimate the scene depth and to construct a 3D point cloud representation of the scene from the captured light field are then described.
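
    The point-cloud construction reduces to back-projecting the estimated depth map through the camera intrinsics; a minimal pinhole-model sketch is given below, with depth, fx, fy, cx and cy assumed to come from the depth-estimation step and the plenoptic camera calibration.

    # Back-project an estimated depth map into a 3D point cloud (pinhole intrinsics assumed known).
    import numpy as np

    def depth_to_point_cloud(depth, fx, fy, cx, cy):
        h, w = depth.shape
        us, vs = np.meshgrid(np.arange(w), np.arange(h))
        x = (us - cx) * depth / fx
        y = (vs - cy) * depth / fy
        return np.stack([x, y, depth], axis=-1).reshape(-1, 3)   # (H*W, 3) points in camera space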