580 research outputs found

    Neural View-Interpolation for Sparse Light Field Video

    No full text
    We suggest representing light field (LF) videos as "one-off" neural networks (NN), i.e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views. Initially, this sounds like a bad idea for three main reasons: First, a NN LF will likely have less quality than a same-sized pixel basis representation. Second, only few training data, e.g., 9 exemplars per frame are available for sparse LF videos. Third, there is no generalization across LFs, but across view and time instead. Consequently, a network needs to be trained for each LF video. Surprisingly, these problems can turn into substantial advantages: Other than the linear pixel basis, a NN has to come up with a compact, non-linear i.e., more intelligent, explanation of color, conditioned on the sparse view and time coordinates. As observed for many NN however, this representation now is interpolatable: if the image output for sparse view coordinates is plausible, it is for all intermediate, continuous coordinates as well. Our specific network architecture involves a differentiable occlusion-aware warping step, which leads to a compact set of trainable parameters and consequently fast learning and fast execution

    Distortion-free Displacement Mapping

    Get PDF
    Displacement mapping is routinely used to add geometric details in a fast and easy‐to‐control way, both in offline rendering as well as recently in interactive applications such as games. However, it went largely unnoticed (with the exception of McGuire and Whitson [MW08]) that, when applying displacement mapping to a surface with a low‐distortion parametrization, this parametrization is distorted as the geometry was changed by the displacement mapping. Typical resulting artifacts are “rubber band”‐like distortion patterns in areas of strong displacement change where a small isotropic area in texture space is mapped to a large anisotropic area in world space. We describe a fast, fully GPU‐based two‐step procedure to resolve this problem. First, a correction deformation is computed from the displacement map. Second, two variants to apply this correction when computing displacement mapping are proposed. The first variant is backward‐compatible and can resolve the artifact in any rendering pipeline without modifying it and without requiring additional computation at render time, but only works for bijective parametrizations. The second variant works for more general parametrizations, but requires to modify the rendering code and incurs a very small computational overhead

    Non-Markovian Quantum State Diffusion for Temperature-Dependent Linear Spectra of Light Harvesting Aggregates

    Full text link
    Non-Markovian Quantum State Diffusion (NMQSD) has turned out to be an efficient method to calculate excitonic properties of aggregates composed of organic chromophores, taking into account the coupling of electronic transitions to vibrational modes of the chromophores. NMQSD is an open quantum system approach that incorporates environmental degrees of freedom (the vibrations in our case) in a stochastic way. We show in this paper that for linear optical spectra (absorption, circular dichroism) no stochastics is needed, even for finite temperatures. Thus, the spectra can be obtained by propagating a single trajectory. To this end we map a finite temperature environment to the zero temperature case using the so-called thermofield method. The resulting equations can then be solved efficiently by standard integrators.Comment: 14 pages, 4 figure

    Total Denoising: Unsupervised Learning of 3D Point Cloud Cleaning

    Get PDF
    We show that denoising of 3D point clouds can be learned unsupervised, directly from noisy 3D point cloud data only. This is achieved by extending recent ideas from learning of unsupervised image denoisers to unstructured 3D point clouds. Unsupervised image denoisers operate under the assumption that a noisy pixel observation is a random realization of a distribution around a clean pixel value, which allows appropriate learning on this distribution to eventually converge to the correct value. Regrettably, this assumption is not valid for unstructured points: 3D point clouds are subject to total noise, i.e. deviations in all coordinates, with no reliable pixel grid. Thus, an observation can be the realization of an entire manifold of clean 3D points, which makes the quality of a naive extension of unsupervised image denoisers to 3D point clouds unfortunately only little better than mean filtering. To overcome this, and to enable effective and unsupervised 3D point cloud denoising, we introduce a spatial prior term, that steers converges to the unique closest out of the many possible modes on the manifold. Our results demonstrate unsupervised denoising performance similar to that of supervised learning with clean data when given enough training examples - whereby we do not need any pairs of noisy and clean training data

    Deep Appearance Maps

    No full text
    We propose a deep representation of appearance, i. e. the relation of color, surface orientation, viewer position, material and illumination. Previous approaches have used deep learning to extract classic appearance representations relating to reflectance model parameters (e. g. Phong) or illumination (e. g. HDR environment maps). We suggest to directly represent appearance itself as a network we call a deep appearance map (DAM). This is a 4D generalization over 2D reflectance maps, which held the view direction fixed. First, we show how a DAM can be learned from images or video frames and later be used to synthesize appearance, given new surface orientations and viewer positions. Second, we demonstrate how another network can be used to map from an image or video frames to a DAM network to reproduce this appearance, without using a lengthy optimization such as stochastic gradient descent (learning-to-learn). Finally, we generalize this to an appearance estimation-and-segmentation task, where we map from an image showing multiple materials to multiple networks reproducing their appearance, as well as per-pixel segmentation

    Plausible Shading Decomposition For Layered Photo Retouching

    Get PDF
    Photographers routinely compose multiple manipulated photos of the same scene (layers) into a single image, which is better than any individual photo could be alone. Similarly, 3D artists set up rendering systems to produce layered images to contain only individual aspects of the light transport, which are composed into the final result in post-production. Regrettably, both approaches either take considerable time to capture, or remain limited to synthetic scenes. In this paper, we suggest a system to allow decomposing a single image into a plausible shading decomposition (PSD) that approximates effects such as shadow, diffuse illumination, albedo, and specular shading. This decomposition can then be manipulated in any off-the-shelf image manipulation software and recomposited back. We perform such a decomposition by learning a convolutional neural network trained using synthetic data. We demonstrate the effectiveness of our decomposition on synthetic (i.e., rendered) and real data (i.e., photographs), and use them for common photo manipulation, which are nearly impossible to perform otherwise from single images

    Finding Your (3D) Center: 3D Object Detection Using a Learned Loss

    Get PDF
    Massive semantically labeled datasets are readily available for 2D images, however, are much harder to achieve for 3D scenes. Objects in 3D repositories like ShapeNet are labeled, but regrettably only in isolation, so without context. 3D scenes can be acquired by range scanners on city-level scale, but much fewer with semantic labels. Addressing this disparity, we introduce a new optimization procedure, which allows training for 3D detection with raw 3D scans while using as little as 5% of the object labels and still achieve comparable performance. Our optimization uses two networks. A scene network maps an entire 3D scene to a set of 3D object centers. As we assume the scene not to be labeled by centers, no classic loss, such as Chamfer can be used to train it. Instead, we use another network to emulate the loss. This loss network is trained on a small labeled subset and maps a non-centered 3D object in the presence of distractions to its own center. This function is very similar – and hence can be used instead of – the gradient the supervised loss would provide. Our evaluation documents competitive fidelity at a much lower level of supervision, respectively higher quality at comparable supervision. Supplementary material can be found at: dgriffiths3.github.io

    Escaping Plato's Cave using Adversarial Training: 3D Shape From Unstructured 2D Image Collections

    Get PDF
    We introduce PLATONICGAN to discover the 3D structure of an object class from an unstructured collection of 2D images, i. e., neither any relation between the images is available nor additional information about the images is known. The key idea is to train a deep neural network to generate 3D shapes which rendered to images are indistinguishable from ground truth images (for a discriminator) under various camera models (i. e., rendering layers) and camera poses. Discriminating 2D images instead of 3D shapes allows tapping into unstructured 2D photo collections instead of relying on curated (e.g., aligned, annotated, etc.) 3D data sets. To establish constraints between 2D image observation and their 3D interpretation, we suggest a family of rendering layers that are effectively differentiable. This family includes visual hull, absorption-only (akin to x-ray), and emissionabsorption. We can successfully reconstruct 3D shapes from unstructured 2D images and extensively evaluate PLATONICGAN on a range of synthetic and real data sets achieving consistent improvements over baseline methods. We can also show that our method with additional 3D supervision further improves result quality and even surpasses the performance of 3D supervised methods

    Deep-learning the Latent Space of Light Transport

    Get PDF
    We suggest a method to directly deep‐learn light transport, i. e., the mapping from a 3D geometry‐illumination‐material configuration to a shaded 2D image. While many previous learning methods have employed 2D convolutional neural networks applied to images, we show for the first time that light transport can be learned directly in 3D. The benefit of 3D over 2D is, that the former can also correctly capture illumination effects related to occluded and/or semi‐transparent geometry. To learn 3D light transport, we represent the 3D scene as an unstructured 3D point cloud, which is later, during rendering, projected to the 2D output image. Thus, we suggest a two‐stage operator comprising a 3D network that first transforms the point cloud into a latent representation, which is later on projected to the 2D output image using a dedicated 3D‐2D network in a second step. We will show that our approach results in improved quality in terms of temporal coherence while retaining most of the computational efficiency of common 2D methods. As a consequence, the proposed two stage‐operator serves as a valuable extension to modern deferred shading approaches

    Learning on the Edge: Investigating Boundary Filters in CNNs

    Get PDF
    Convolutional neural networks (CNNs) handle the case where filters extend beyond the image boundary using several heuristics, such as zero, repeat or mean padding. These schemes are applied in an ad-hoc fashion and, being weakly related to the image content and oblivious of the target task, result in low output quality at the boundary. In this paper, we propose a simple and effective improvement that learns the boundary handling itself. At training-time, the network is provided with a separate set of explicit boundary filters. At testing-time, we use these filters which have learned to extrapolate features at the boundary in an optimal way for the specific task. Our extensive evaluation, over a wide range of architectural changes (variations of layers, feature channels, or both), shows how the explicit filters result in improved boundary handling. Furthermore, we investigate the efficacy of variations of such boundary filters with respect to convergence speed and accuracy. Finally, we demonstrate an improvement of 5–20% across the board of typical CNN applications (colorization, de-Bayering, optical flow, disparity estimation, and super-resolution). Supplementary material and code can be downloaded from the project page: http://geometry.cs.ucl.ac.uk/projects/2019/investigating-edge/