Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation
We present a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis. While NeRF-based approaches are effective for novel view synthesis, such models memorize the radiance for every point in a scene within a neural network. Since these models are scene-specific and lack a 3D scene representation, classical editing, such as shape manipulation or combining scenes, is not possible. Hence, editing and combining NeRF-based scenes has not been demonstrated. With the aim of obtaining interpretable and controllable scene representations, our model couples learnt scene-specific feature volumes with a scene-agnostic neural rendering network. With this hybrid representation, we decouple neural rendering from scene-specific geometry and appearance. We can generalize to novel scenes by optimizing only the scene-specific 3D feature representation, while keeping the parameters of the rendering network fixed. The rendering function learnt during the initial training stage can thus be easily applied to new scenes, making our approach more flexible. More importantly, since the feature volumes are independent of the rendering model, we can manipulate and combine scenes by editing their corresponding feature volumes. The edited volume can then be plugged into the rendering model to synthesize high-quality novel views. We demonstrate various scene manipulations, including mixing scenes, deforming objects and inserting objects into scenes, while still producing photo-realistic results.
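To make the decoupling concrete, here is a minimal PyTorch sketch of the idea: a scene-specific learnable feature grid queried by trilinear interpolation, feeding a shared rendering MLP whose weights are frozen when adapting to a new scene. The class names, grid resolution, and MLP sizes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureVolume(nn.Module):
    """Scene-specific learnable 3D feature grid (illustrative size and channels)."""
    def __init__(self, channels=16, resolution=64):
        super().__init__()
        self.grid = nn.Parameter(
            torch.randn(1, channels, resolution, resolution, resolution) * 0.01)

    def forward(self, xyz):
        # xyz: (N, 3) points in [-1, 1]; trilinear interpolation into the grid
        coords = xyz.view(1, -1, 1, 1, 3)
        feats = F.grid_sample(self.grid, coords, align_corners=True)  # (1, C, N, 1, 1)
        return feats.view(self.grid.shape[1], -1).t()                 # (N, C)

class SharedRenderer(nn.Module):
    """Scene-agnostic MLP mapping (feature, view direction) -> (density, rgb)."""
    def __init__(self, channels=16, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels + 3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 4))

    def forward(self, feats, dirs):
        out = self.mlp(torch.cat([feats, dirs], dim=-1))
        return F.softplus(out[..., :1]), torch.sigmoid(out[..., 1:])  # sigma, rgb

# Adapting to a new scene: optimize only the feature volume, keep the renderer frozen
volume, renderer = FeatureVolume(), SharedRenderer()
for p in renderer.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.Adam(volume.parameters(), lr=1e-2)

xyz = torch.rand(1024, 3) * 2 - 1
dirs = F.normalize(torch.randn(1024, 3), dim=-1)
sigma, rgb = renderer(volume(xyz), dirs)
```

Because scene content lives entirely in the feature grid, editing operations such as mixing two scenes reduce to editing or combining these grids before passing them to the frozen renderer.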
Sphere-Guided Training of Neural Implicit Surfaces
In recent years, surface modeling via neural implicit functions has become
one of the main techniques for multi-view 3D reconstruction. However, the
state-of-the-art methods rely on the implicit functions to model an entire
volume of the scene, leading to reduced reconstruction fidelity in the areas
with thin objects or high-frequency details. To address that, we present a
method for jointly training neural implicit surfaces alongside an auxiliary
explicit shape representation, which acts as a surface guide. In our approach,
this representation encapsulates the surface region of the scene and enables us
to boost the efficiency of the implicit function training by only modeling the
volume in that region. We propose using a set of learnable spherical primitives
as the surface guide, since they can be efficiently trained alongside
the neural surface function using its gradients. Our training pipeline consists
of iterative updates of the spheres' centers using the gradients of the
implicit function and then fine-tuning the latter to the updated surface region
of the scene. We show that such modification to the training procedure can be
plugged into several popular implicit reconstruction methods, improving the
quality of the results over multiple 3D reconstruction benchmarks.
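A rough sketch of the sphere-guidance idea, assuming a PyTorch SDF: sphere centers are first pulled toward the zero level set using the implicit function's gradients, and subsequent training samples are restricted to the union of the spheres. The analytic placeholder SDF, the sphere count, and the fixed radius are illustrative assumptions.

```python
import torch

def sdf(x):
    # Placeholder implicit function: unit-sphere SDF (stands in for the neural SDF)
    return x.norm(dim=-1, keepdim=True) - 1.0

# Learnable sphere centers that bound the sampling region (radius kept fixed here)
centers = torch.randn(256, 3, requires_grad=True)
radius = 0.1
opt = torch.optim.Adam([centers], lr=1e-2)

# Step 1: pull sphere centers toward the zero level set using the SDF's gradients
for _ in range(100):
    loss = sdf(centers).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Step 2: restrict implicit-surface training samples to the union of spheres
def inside_spheres(points):
    # points: (N, 3); boolean mask of samples near the current surface estimate
    d = torch.cdist(points, centers.detach())        # (N, num_spheres)
    return d.min(dim=-1).values < radius

samples = torch.rand(8192, 3) * 2 - 1                # candidate samples in [-1, 1]^3
guided = samples[inside_spheres(samples)]            # only these feed the SDF fine-tuning
```

In the actual pipeline the two steps alternate, so the spheres track the surface as the implicit function improves.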
PAPR: Proximity Attention Point Rendering
Learning accurate and parsimonious point cloud representations of scene
surfaces from scratch remains a challenge in 3D representation learning.
Existing point-based methods often suffer from the vanishing gradient problem
or require a large number of points to accurately model scene geometry and
texture. To address these limitations, we propose Proximity Attention Point
Rendering (PAPR), a novel method that consists of a point-based scene
representation and a differentiable renderer. Our scene representation uses a
point cloud where each point is characterized by its spatial position,
foreground score, and view-independent feature vector. The renderer selects the
relevant points for each ray and produces accurate colours using their
associated features. PAPR effectively learns point cloud positions to represent
the correct scene geometry, even when the initialization drastically differs
from the target geometry. Notably, our method captures fine texture details
while using only a parsimonious set of points. We also demonstrate four
practical applications of our method: geometry editing, object manipulation,
texture transfer, and exposure control. More results and code are available on
our project website at https://zvict.github.io/papr/
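The following sketch illustrates one plausible reading of proximity attention in PyTorch: each ray attends over its k nearest points using point-to-ray distance and a foreground score, then blends their view-independent features into a colour. The class name, temperature, and distance-based logits are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProximityAttention(nn.Module):
    """Illustrative ray-to-point attention: weight nearby points, blend their features."""
    def __init__(self, num_points=1024, feat_dim=32, temperature=0.05):
        super().__init__()
        self.points = nn.Parameter(torch.rand(num_points, 3) * 2 - 1)   # spatial positions
        self.fg_score = nn.Parameter(torch.zeros(num_points))           # foreground scores
        self.features = nn.Parameter(torch.randn(num_points, feat_dim) * 0.01)
        self.to_rgb = nn.Linear(feat_dim, 3)
        self.temperature = temperature

    def forward(self, origins, dirs, k=16):
        # Point-to-ray distance for every (ray, point) pair
        rel = self.points[None] - origins[:, None]                      # (R, P, 3)
        proj = (rel * dirs[:, None]).sum(-1, keepdim=True) * dirs[:, None]
        dist = (rel - proj).norm(dim=-1)                                # (R, P)
        # Keep the k closest points per ray and attend over them
        d_k, idx = dist.topk(k, dim=-1, largest=False)
        logits = -d_k / self.temperature + self.fg_score[idx]
        attn = torch.softmax(logits, dim=-1)                            # (R, k)
        feats = (attn[..., None] * self.features[idx]).sum(dim=1)       # (R, feat_dim)
        return torch.sigmoid(self.to_rgb(feats))                        # (R, 3) colours

# Example: render a few rays cast from the origin
model = ProximityAttention()
origins = torch.zeros(4, 3)
dirs = F.normalize(torch.randn(4, 3), dim=-1)
rgb = model(origins, dirs)
```

Since the attention weights are differentiable in the point positions, gradients can move points toward the surface even from a poor initialization, which is the behaviour the abstract highlights.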
LiveNVS: Neural View Synthesis on Live RGB-D Streams
Existing real-time RGB-D reconstruction approaches, like Kinect Fusion, lack
real-time photo-realistic visualization. This is due to noisy, oversmoothed or
incomplete geometry and blurry textures which are fused from imperfect depth
maps and camera poses. Recent neural rendering methods can overcome many of
such artifacts but are mostly optimized for offline usage, hindering the
integration into a live reconstruction pipeline.
In this paper, we present LiveNVS, a system that allows for neural novel view
synthesis on a live RGB-D input stream with very low latency and real-time
rendering. Based on the RGB-D input stream, novel views are rendered by
projecting neural features into the target view via a densely fused depth map
and aggregating the features in image-space to a target feature map. A
generalizable neural network then translates the target feature map into a
high-quality RGB image. LiveNVS achieves state-of-the-art neural rendering
quality of unknown scenes during capturing, allowing users to virtually explore
the scene and assess reconstruction quality in real-time.
Comment: main paper: 8 pages, total number of pages: 15, 13 figures; to be published in SIGGRAPH Asia 2023 Conference Papers.
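A simplified sketch of the image-space warping step, assuming known pinhole intrinsics and target-to-source poses: target pixels are back-projected with the fused depth map, reprojected into a source view, and the source feature map is sampled at those locations before a small decoder produces RGB. The helper names, the mean aggregation, and the toy decoder are illustrative assumptions, not the LiveNVS network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_source_features(src_feat, depth_t, K, T_t2s):
    """Gather source-view features for each target pixel via the fused depth map.

    src_feat: (1, C, H, W) source features; depth_t: (1, 1, H, W) target depth
    K: (3, 3) pinhole intrinsics; T_t2s: (4, 4) target-to-source pose
    """
    _, _, H, W = depth_t.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().view(3, -1)
    cam = torch.linalg.inv(K) @ pix * depth_t.view(1, -1)            # back-project
    cam_h = torch.cat([cam, torch.ones(1, cam.shape[1])], dim=0)     # homogeneous coords
    src = (T_t2s @ cam_h)[:3]                                        # move to source frame
    uv = K @ src
    uv = uv[:2] / uv[2:].clamp(min=1e-6)                             # perspective divide
    # Normalize to [-1, 1] for grid_sample and fetch source features
    grid = torch.stack([uv[0] / (W - 1) * 2 - 1, uv[1] / (H - 1) * 2 - 1], dim=-1)
    return F.grid_sample(src_feat, grid.view(1, H, W, 2), align_corners=True)

decoder = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())

# Toy inputs: three source views, a constant depth map, and identity poses
H, W, C = 60, 80, 32
K = torch.tensor([[100.0, 0.0, W / 2], [0.0, 100.0, H / 2], [0.0, 0.0, 1.0]])
depth = torch.full((1, 1, H, W), 2.0)
src_feats = [torch.randn(1, C, H, W) for _ in range(3)]
poses = [torch.eye(4) for _ in range(3)]

# Aggregate warped features from the source views (mean here) and decode to RGB
warped = [warp_source_features(f, depth, K, T) for f, T in zip(src_feats, poses)]
rgb = decoder(torch.stack(warped).mean(dim=0))
```

Because all heavy lifting happens in image space on feature maps, this style of pipeline keeps per-frame latency low enough for a live stream.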
Neural Rays for Occlusion-aware Image-based Rendering
We present a new neural representation, called Neural Ray (NeuRay), for the
novel view synthesis task. Recent works construct radiance fields from image
features of input views to render novel view images, which enables the
generalization to new scenes. However, due to occlusions, a 3D point may be
invisible to some input views. On such a 3D point, these generalization methods
will include inconsistent image features from invisible views, which interfere
with the radiance field construction. To solve this problem, we predict the
visibility of 3D points to input views within our NeuRay representation. This
visibility enables the radiance field construction to focus on visible image
features, which significantly improves its rendering quality. Meanwhile, a
novel consistency loss is proposed to refine the visibility in NeuRay when
finetuning on a specific scene. Experiments demonstrate that our approach
achieves state-of-the-art performance on the novel view synthesis task when
generalizing to unseen scenes and outperforms per-scene optimization methods
after finetuning.
Comment: CVPR 2022. Project page: https://liuyuan-pal.github.io/NeuRay/ Codes available at https://github.com/liuyuan-pal/NeuRa
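The visibility idea can be sketched as a weighting over per-view image features: each input view's feature for a 3D point is scored by a predicted visibility, and occluded views receive low weight in the fused feature. This is a minimal stand-in, not the NeuRay architecture; the module layout and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class VisibilityAggregator(nn.Module):
    """Illustrative visibility-aware fusion of image features across input views."""
    def __init__(self, feat_dim=32):
        super().__init__()
        # Predicts per-view visibility of a 3D point from its per-view image feature
        self.visibility = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(),
                                        nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, view_feats):
        # view_feats: (num_points, num_views, feat_dim) features gathered per input view
        vis = self.visibility(view_feats)                          # (P, V, 1) in [0, 1]
        weights = vis / vis.sum(dim=1, keepdim=True).clamp(min=1e-6)
        fused = (weights * view_feats).sum(dim=1)                  # occluded views downweighted
        return fused, vis

agg = VisibilityAggregator()
fused, vis = agg(torch.randn(4096, 8, 32))   # 4096 sample points, 8 input views
```

The consistency loss mentioned in the abstract would then push these predicted visibilities to agree with the densities found during per-scene fine-tuning.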
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
In contrast to numerous NLP and 2D computer vision foundational models, the
learning of a robust and highly generalized 3D foundational model poses
considerably greater challenges. This is primarily due to the inherent data
variability and the diversity of downstream tasks. In this paper, we introduce
a comprehensive 3D pre-training framework designed to facilitate the
acquisition of efficient 3D representations, thereby establishing a pathway to
3D foundational models. Motivated by the fact that informative 3D features
should be able to encode rich geometry and appearance cues that can be utilized
to render realistic images, we propose a novel universal paradigm to learn
point cloud representations by differentiable neural rendering, serving as a
bridge between 3D and 2D worlds. We train a point cloud encoder within a
devised volumetric neural renderer by comparing the rendered images with the
real images. Notably, our approach demonstrates the seamless integration of the
learned 3D encoder into diverse downstream tasks. These tasks encompass not
only high-level challenges such as 3D detection and segmentation but also
low-level objectives like 3D reconstruction and image synthesis, spanning both
indoor and outdoor scenarios. Besides, we also illustrate the capability of
pre-training a 2D backbone using the proposed universal methodology, surpassing
conventional pre-training methods by a large margin. For the first time,
PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor
benchmarks. The consistent improvements in various settings imply the
effectiveness of the proposed method. Code and models will be made available at
https://github.com/OpenGVLab/PonderV2.
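The pre-training signal can be illustrated with a toy pipeline: a point encoder scatters features into a voxel grid, a small volumetric renderer alpha-composites along rays, and the rendered pixels are compared against captured pixels. Everything below (grid resolution, ray sampling, module names) is a simplified assumption, not the PonderV2 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointToVolume(nn.Module):
    """Illustrative point encoder: average per-point features into a coarse voxel grid."""
    def __init__(self, feat_dim=16, res=32):
        super().__init__()
        self.res = res
        self.point_mlp = nn.Linear(3, feat_dim)

    def forward(self, points):
        feats = self.point_mlp(points)                                    # (N, C)
        idx = ((points + 1) / 2 * (self.res - 1)).long().clamp(0, self.res - 1)
        flat = idx[:, 2] * self.res * self.res + idx[:, 1] * self.res + idx[:, 0]
        vol = torch.zeros(self.res ** 3, feats.shape[1])
        cnt = torch.zeros(self.res ** 3, 1)
        vol.index_add_(0, flat, feats)                                    # scatter features
        cnt.index_add_(0, flat, torch.ones(points.shape[0], 1))
        vol = vol / cnt.clamp(min=1)
        return vol.t().reshape(1, -1, self.res, self.res, self.res)      # (1, C, D, H, W)

class TinyVolumeRenderer(nn.Module):
    """Illustrative renderer: sample the volume along rays and alpha-composite colours."""
    def __init__(self, feat_dim=16, steps=32):
        super().__init__()
        self.decode = nn.Linear(feat_dim, 4)
        self.steps = steps

    def forward(self, volume, origins, dirs):
        t = torch.linspace(0.1, 2.0, self.steps)
        pts = origins[:, None] + dirs[:, None] * t[None, :, None]        # (R, S, 3)
        g = pts.reshape(1, -1, 1, 1, 3).clamp(-1, 1)
        f = F.grid_sample(volume, g, align_corners=True)                  # (1, C, R*S, 1, 1)
        f = f.reshape(volume.shape[1], -1).t().reshape(pts.shape[0], self.steps, -1)
        out = self.decode(f)
        sigma, rgb = F.softplus(out[..., :1]), torch.sigmoid(out[..., 1:])
        alpha = 1 - torch.exp(-sigma * (2.0 / self.steps))
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[:, :1]), 1 - alpha + 1e-6], dim=1), dim=1)[:, :-1]
        return (trans * alpha * rgb).sum(dim=1)                           # (R, 3)

# Pre-training signal: rendered pixels should match pixels of the real captured image
encoder, renderer = PointToVolume(), TinyVolumeRenderer()
points = torch.rand(2048, 3) * 2 - 1
origins = torch.tensor([[0.0, 0.0, -1.5]]).repeat(64, 1)
dirs = F.normalize(torch.randn(64, 3) + torch.tensor([0.0, 0.0, 2.0]), dim=-1)
target_pixels = torch.rand(64, 3)                      # stand-in for real image pixels
loss = F.l1_loss(renderer(encoder(points), origins, dirs), target_pixels)
loss.backward()                                        # gradients reach the point encoder
```

After pre-training with this kind of loss, the point encoder is kept and fine-tuned on downstream 3D detection, segmentation, or reconstruction tasks, while the rendering head is discarded.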
Probabilistic Point Cloud Modeling via Self-Organizing Gaussian Mixture Models
This letter presents a continuous probabilistic modeling methodology for
spatial point cloud data using finite Gaussian Mixture Models (GMMs) where the
number of components is adapted based on the scene complexity. Few
hierarchical and adaptive methods have been proposed to address the challenge
of balancing model fidelity with size. Instead, state-of-the-art mapping
approaches require tuning parameters for specific use cases, but do not
generalize across diverse environments. To address this gap, we utilize a
self-organizing principle from information-theoretic learning to automatically
adapt the complexity of the GMM based on the relevant information in the
sensor data. The approach is evaluated against existing point cloud modeling
techniques on real-world data with varying degrees of scene complexity.
Comment: 8 pages, 6 figures; to appear in IEEE Robotics and Automation Letters.
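As a rough illustration of adapting model size to scene complexity, the sketch below sweeps candidate component counts and keeps the model with the lowest BIC. BIC is only a familiar stand-in for the paper's information-theoretic self-organizing criterion, and the candidate set is an arbitrary assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_adaptive_gmm(points, candidate_components=(2, 4, 8, 16, 32, 64)):
    """Fit GMMs of increasing size and keep the one with the lowest BIC.

    BIC stands in here for the self-organizing criterion; it trades model
    fidelity against model size without per-environment parameter tuning.
    """
    best_model, best_bic = None, np.inf
    for k in candidate_components:
        gmm = GaussianMixture(n_components=k, covariance_type="full", random_state=0)
        gmm.fit(points)
        bic = gmm.bic(points)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model

# Example: a synthetic scan with two clusters should prefer a small model
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.05, size=(500, 3)),
                     rng.normal(1.0, 0.05, size=(500, 3))])
model = fit_adaptive_gmm(points)
print(model.n_components, "components selected")
```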