25,408 research outputs found
DeepVoxels: Learning Persistent 3D Feature Embeddings
In this work, we address the lack of 3D understanding of generative neural
networks by introducing a persistent 3D feature embedding for view synthesis.
To this end, we propose DeepVoxels, a learned representation that encodes the
view-dependent appearance of a 3D scene without having to explicitly model its
geometry. At its core, our approach is based on a Cartesian 3D grid of
persistent embedded features that learn to make use of the underlying 3D scene
structure. Our approach combines insights from 3D geometric computer vision
with recent advances in learning image-to-image mappings based on adversarial
loss functions. DeepVoxels is supervised, without requiring a 3D reconstruction
of the scene, using a 2D re-rendering loss and enforces perspective and
multi-view geometry in a principled manner. We apply our persistent 3D scene
representation to the problem of novel view synthesis demonstrating
high-quality results for a variety of challenging scenes.Comment: Video: https://www.youtube.com/watch?v=HM_WsZhoGXw Supplemental
material:
https://drive.google.com/file/d/1BnZRyNcVUty6-LxAstN83H79ktUq8Cjp/view?usp=sharing
Code: https://github.com/vsitzmann/deepvoxels Project page:
https://vsitzmann.github.io/deepvoxels
GRASS: Generative Recursive Autoencoders for Shape Structures
We introduce a novel neural network architecture for encoding and synthesis
of 3D shapes, particularly their structures. Our key insight is that 3D shapes
are effectively characterized by their hierarchical organization of parts,
which reflects fundamental intra-shape relationships such as adjacency and
symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a
flat, unlabeled, arbitrary part layout to a compact code. The code effectively
captures hierarchical structures of man-made 3D objects of varying structural
complexities despite being fixed-dimensional: an associated decoder maps a code
back to a full hierarchy. The learned bidirectional mapping is further tuned
using an adversarial setup to yield a generative model of plausible structures,
from which novel structures can be sampled. Finally, our structure synthesis
framework is augmented by a second trained module that produces fine-grained
part geometry, conditioned on global and local structural context, leading to a
full generative pipeline for 3D shapes. We demonstrate that without
supervision, our network learns meaningful structural hierarchies adhering to
perceptual grouping principles, produces compact codes which enable
applications such as shape classification and partial matching, and supports
shape synthesis and interpolation with significant variations in topology and
geometry.Comment: Corresponding author: Kai Xu ([email protected]
Neural View-Interpolation for Sparse Light Field Video
We suggest representing light field (LF) videos as "one-off" neural networks (NN), i.e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views. Initially, this sounds like a bad idea for three main reasons: First, a NN LF will likely have less quality than a same-sized pixel basis representation. Second, only few training data, e.g., 9 exemplars per frame are available for sparse LF videos. Third, there is no generalization across LFs, but across view and time instead. Consequently, a network needs to be trained for each LF video. Surprisingly, these problems can turn into substantial advantages: Other than the linear pixel basis, a NN has to come up with a compact, non-linear i.e., more intelligent, explanation of color, conditioned on the sparse view and time coordinates. As observed for many NN however, this representation now is interpolatable: if the image output for sparse view coordinates is plausible, it is for all intermediate, continuous coordinates as well. Our specific network architecture involves a differentiable occlusion-aware warping step, which leads to a compact set of trainable parameters and consequently fast learning and fast execution
- …