
    Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation

    We present a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis. While NeRF-based approaches are effective for novel view synthesis, such models memorize the radiance for every point in a scene within a neural network. Since these models are scene-specific and lack a 3D scene representation, classical editing, such as shape manipulation or combining scenes, is not possible. Hence, editing and combining NeRF-based scenes has not been demonstrated. With the aim of obtaining interpretable and controllable scene representations, our model couples learnt scene-specific feature volumes with a scene-agnostic neural rendering network. With this hybrid representation, we decouple neural rendering from scene-specific geometry and appearance. We can generalize to novel scenes by optimizing only the scene-specific 3D feature representation, while keeping the parameters of the rendering network fixed. The rendering function learnt during the initial training stage can thus be easily applied to new scenes, making our approach more flexible. More importantly, since the feature volumes are independent of the rendering model, we can manipulate and combine scenes by editing their corresponding feature volumes. The edited volume can then be plugged into the rendering model to synthesize high-quality novel views. We demonstrate various scene manipulations, including mixing scenes, deforming objects and inserting objects into scenes, while still producing photo-realistic results.
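    The core design is a per-scene learnable feature volume paired with a shared, scene-agnostic rendering network. The sketch below illustrates that split in a minimal form; the class names, grid size, and decoder layout are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a per-scene learnable feature volume
# queried by trilinear interpolation, decoded by a scene-agnostic MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureVolume(nn.Module):
    """Scene-specific: a dense grid of latent features covering [-1, 1]^3."""
    def __init__(self, channels=16, resolution=64):
        super().__init__()
        self.grid = nn.Parameter(0.01 * torch.randn(1, channels, resolution, resolution, resolution))

    def forward(self, xyz):                       # xyz: (N, 3) in [-1, 1]
        coords = xyz.view(1, -1, 1, 1, 3)         # grid_sample expects (B, D, H, W, 3)
        feats = F.grid_sample(self.grid, coords, align_corners=True)
        return feats.view(self.grid.shape[1], -1).t()   # (N, channels)

class SharedRenderer(nn.Module):
    """Scene-agnostic: maps sampled features (+ view direction) to density and colour."""
    def __init__(self, channels=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels + 3, 64), nn.ReLU(),
                                 nn.Linear(64, 4))        # (sigma, r, g, b)

    def forward(self, feats, view_dirs):
        out = self.mlp(torch.cat([feats, view_dirs], dim=-1))
        return F.softplus(out[:, :1]), torch.sigmoid(out[:, 1:])

# To generalize to a new scene, only the FeatureVolume would be optimized while the
# SharedRenderer stays frozen; editing a scene amounts to manipulating `volume.grid`
# (e.g. copying a sub-block from another scene's volume).
volume, renderer = FeatureVolume(), SharedRenderer()
xyz = torch.rand(1024, 3) * 2 - 1
dirs = F.normalize(torch.randn(1024, 3), dim=-1)
sigma, rgb = renderer(volume(xyz), dirs)
```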

    Sphere-Guided Training of Neural Implicit Surfaces

    In recent years, surface modeling via neural implicit functions has become one of the main techniques for multi-view 3D reconstruction. However, the state-of-the-art methods rely on the implicit functions to model an entire volume of the scene, leading to reduced reconstruction fidelity in areas with thin objects or high-frequency details. To address that, we present a method for jointly training neural implicit surfaces alongside an auxiliary explicit shape representation, which acts as a surface guide. In our approach, this representation encapsulates the surface region of the scene and enables us to boost the efficiency of the implicit function training by only modeling the volume in that region. We propose using a set of learnable spherical primitives as the surface guidance, since they can be efficiently trained alongside the neural surface function using its gradients. Our training pipeline consists of iterative updates of the spheres' centers using the gradients of the implicit function, followed by fine-tuning the latter to the updated surface region of the scene. We show that this modification to the training procedure can be plugged into several popular implicit reconstruction methods, improving the quality of the results over multiple 3D reconstruction benchmarks.
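    The guidance mechanism boils down to moving sphere centres toward the current zero level set of the implicit function and restricting sampling to the spheres' interiors. The sketch below is a toy version of those two steps, assuming a signed-distance-style implicit function; the function names, step size, and radius are assumptions, not the paper's implementation.

```python
# Illustrative sketch: update guidance-sphere centres with the gradient of a signed
# distance function `sdf`, then keep only ray samples that fall inside a sphere.
import torch

def update_sphere_centres(sdf, centres, step=0.5):
    """One guidance update: push each centre along -grad(sdf) toward the surface."""
    centres = centres.clone().requires_grad_(True)
    values = sdf(centres)                                  # (S, 1) signed distances
    grads, = torch.autograd.grad(values.sum(), centres)    # (S, 3)
    direction = grads / (grads.norm(dim=-1, keepdim=True) + 1e-8)
    with torch.no_grad():
        return centres - step * values * direction         # approximate projection

def keep_samples_in_spheres(points, centres, radius=0.1):
    """Mask ray samples lying inside at least one guidance sphere."""
    d = torch.cdist(points, centres)                       # (N, S) pairwise distances
    return d.min(dim=-1).values < radius                   # (N,) boolean mask

# Toy SDF of a unit sphere; in practice this would be the neural implicit surface.
sdf = lambda p: p.norm(dim=-1, keepdim=True) - 1.0
centres = torch.randn(32, 3) * 2.0
for _ in range(10):
    centres = update_sphere_centres(sdf, centres)
samples = torch.rand(4096, 3) * 2 - 1
mask = keep_samples_in_spheres(samples, centres)           # samples worth modeling
```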

    PAPR: Proximity Attention Point Rendering

    Learning accurate and parsimonious point cloud representations of scene surfaces from scratch remains a challenge in 3D representation learning. Existing point-based methods often suffer from the vanishing gradient problem or require a large number of points to accurately model scene geometry and texture. To address these limitations, we propose Proximity Attention Point Rendering (PAPR), a novel method that consists of a point-based scene representation and a differentiable renderer. Our scene representation uses a point cloud where each point is characterized by its spatial position, foreground score, and view-independent feature vector. The renderer selects the relevant points for each ray and produces accurate colours using their associated features. PAPR effectively learns point cloud positions to represent the correct scene geometry, even when the initialization drastically differs from the target geometry. Notably, our method captures fine texture details while using only a parsimonious set of points. We also demonstrate four practical applications of our method: geometry editing, object manipulation, texture transfer, and exposure control. More results and code are available on our project website at https://zvict.github.io/papr/
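    The central idea is that each ray attends to nearby points and blends their view-independent features into a per-ray feature. The following sketch shows one plausible form of such proximity-based weighting; the distance-based scoring and the temperature are assumptions, not PAPR's exact attention mechanism.

```python
# Hedged sketch of proximity attention: score points by distance to the ray,
# softmax the scores, and blend the per-point features.
import torch

def point_to_ray_distance(points, origin, direction):
    """Perpendicular distance from each point to a ray (direction is unit length)."""
    rel = points - origin                                     # (N, 3)
    t = (rel * direction).sum(dim=-1, keepdim=True).clamp(min=0.0)
    closest = origin + t * direction
    return (points - closest).norm(dim=-1)                    # (N,)

def proximity_attention(points, features, origin, direction, temperature=0.05):
    """Blend point features with weights that fall off with distance to the ray."""
    dist = point_to_ray_distance(points, origin, direction)
    weights = torch.softmax(-dist / temperature, dim=0)       # (N,)
    return (weights.unsqueeze(-1) * features).sum(dim=0)      # aggregated ray feature

points = torch.rand(512, 3)
features = torch.randn(512, 32)
origin = torch.zeros(3)
direction = torch.nn.functional.normalize(torch.tensor([0.3, 0.4, 0.8]), dim=0)
ray_feature = proximity_attention(points, features, origin, direction)
```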

    LiveNVS: Neural View Synthesis on Live RGB-D Streams

    Existing real-time RGB-D reconstruction approaches, like KinectFusion, lack real-time photo-realistic visualization. This is due to noisy, oversmoothed or incomplete geometry and blurry textures, which are fused from imperfect depth maps and camera poses. Recent neural rendering methods can overcome many such artifacts but are mostly optimized for offline usage, hindering their integration into a live reconstruction pipeline. In this paper, we present LiveNVS, a system that allows for neural novel view synthesis on a live RGB-D input stream with very low latency and real-time rendering. Based on the RGB-D input stream, novel views are rendered by projecting neural features into the target view via a densely fused depth map and aggregating the features in image space into a target feature map. A generalizable neural network then translates the target feature map into a high-quality RGB image. LiveNVS achieves state-of-the-art neural rendering quality of unknown scenes during capturing, allowing users to virtually explore the scene and assess reconstruction quality in real time. Comment: 8 pages (main paper), 15 pages total, 13 figures; to be published in SIGGRAPH Asia 2023 Conference Papers.
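    The key rendering step is warping per-view neural features into the target view using the fused depth map, before image-space aggregation and decoding. Below is a hedged sketch of one such depth-based feature warp; the function name, camera conventions, and toy inputs are assumptions rather than the LiveNVS code.

```python
# Illustrative sketch: back-project target pixels with a fused depth map and sample
# a source-view feature map at the reprojected locations.
import torch
import torch.nn.functional as F

def warp_features(src_feats, target_depth, K, T_target_to_src):
    """src_feats: (1, C, H, W); target_depth: (H, W); K: (3, 3) intrinsics;
    T_target_to_src: (4, 4) relative camera pose."""
    _, _, H, W = src_feats.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).float()        # (H, W, 3)
    cam = (torch.linalg.inv(K) @ pix.reshape(-1, 3).t()).t() * target_depth.reshape(-1, 1)
    cam_h = torch.cat([cam, torch.ones(cam.shape[0], 1)], dim=-1)           # homogeneous
    src = (T_target_to_src @ cam_h.t()).t()[:, :3]                          # target -> source
    proj = (K @ src.t()).t()
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)                         # source pixel coords
    # Normalize to [-1, 1] for grid_sample and resample the source features.
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1, uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
    return F.grid_sample(src_feats, grid.view(1, H, W, 2), align_corners=True)

# Toy usage; warped maps from several source views would then be aggregated per
# pixel (e.g. a weighted mean) and decoded to RGB by a generalizable network.
feats = torch.randn(1, 8, 64, 64)
depth = torch.full((64, 64), 2.0)
K = torch.tensor([[50.0, 0.0, 32.0], [0.0, 50.0, 32.0], [0.0, 0.0, 1.0]])
warped = warp_features(feats, depth, K, torch.eye(4))
```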

    Neural Rays for Occlusion-aware Image-based Rendering

    We present a new neural representation, called Neural Ray (NeuRay), for the novel view synthesis task. Recent works construct radiance fields from image features of input views to render novel view images, which enables generalization to new scenes. However, due to occlusions, a 3D point may be invisible to some input views. At such a 3D point, these generalization methods will include inconsistent image features from invisible views, which interfere with the radiance field construction. To solve this problem, we predict the visibility of 3D points to input views within our NeuRay representation. This visibility enables the radiance field construction to focus on visible image features, which significantly improves its rendering quality. Meanwhile, a novel consistency loss is proposed to refine the visibility in NeuRay when fine-tuning on a specific scene. Experiments demonstrate that our approach achieves state-of-the-art performance on the novel view synthesis task when generalizing to unseen scenes and outperforms per-scene optimization methods after fine-tuning. Comment: CVPR 2022. Project page: https://liuyuan-pal.github.io/NeuRay/; code available at https://github.com/liuyuan-pal/NeuRay
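    The essential mechanism is weighting each input view's image features by a predicted visibility before fusing them at a 3D point. The sketch below shows that aggregation pattern in a minimal form; the placeholder visibility predictor and tensor shapes are assumptions, not the NeuRay architecture.

```python
# Minimal sketch of visibility-weighted feature aggregation across input views.
import torch
import torch.nn as nn

class VisibilityWeightedAggregation(nn.Module):
    """Blend per-view image features at a 3D point, down-weighting occluded views."""
    def __init__(self, feat_dim=32):
        super().__init__()
        # Placeholder: predicts a visibility logit for each (point, view) feature.
        self.visibility = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(),
                                        nn.Linear(32, 1))

    def forward(self, view_feats):                    # (num_points, num_views, feat_dim)
        vis = torch.sigmoid(self.visibility(view_feats))        # (P, V, 1) in [0, 1]
        weights = vis / (vis.sum(dim=1, keepdim=True) + 1e-8)   # normalize over views
        return (weights * view_feats).sum(dim=1)                # (P, feat_dim)

agg = VisibilityWeightedAggregation()
fused = agg(torch.randn(2048, 8, 32))                 # 2048 points observed by 8 views
```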

    PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

    In contrast to numerous NLP and 2D computer vision foundational models, learning a robust and highly generalized 3D foundational model poses considerably greater challenges, primarily due to the inherent variability of 3D data and the diversity of downstream tasks. In this paper, we introduce a comprehensive 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations, thereby establishing a pathway to 3D foundational models. Motivated by the fact that informative 3D features should encode rich geometry and appearance cues that can be used to render realistic images, we propose a novel universal paradigm that learns point cloud representations through differentiable neural rendering, serving as a bridge between the 3D and 2D worlds. We train a point cloud encoder within a devised volumetric neural renderer by comparing the rendered images with the real images. Notably, our approach demonstrates the seamless integration of the learned 3D encoder into diverse downstream tasks, encompassing not only high-level challenges such as 3D detection and segmentation but also low-level objectives like 3D reconstruction and image synthesis, spanning both indoor and outdoor scenarios. We also illustrate the capability of pre-training a 2D backbone with the proposed universal methodology, surpassing conventional pre-training methods by a large margin. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks. The consistent improvements across various settings demonstrate the effectiveness of the proposed method. Code and models will be made available at https://github.com/OpenGVLab/PonderV2.
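    The pre-training signal is render-and-compare: point features produced by the encoder must suffice to render images that match the captured ones. The sketch below is a deliberately tiny stand-in for that loop; the placeholder encoder, toy renderer, and hyperparameters are assumptions and not the actual PonderV2 modules.

```python
# Hedged sketch of render-and-compare pre-training: encode a point cloud, render it
# differentiably with a stand-in renderer, and supervise with a photometric loss.
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Placeholder per-point encoder (a real one would be a sparse 3D backbone)."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))

    def forward(self, points):                        # (N, 3) -> (N, feat_dim)
        return self.mlp(points)

class ToyRenderer(nn.Module):
    """Stand-in differentiable renderer: pools point features and decodes an image."""
    def __init__(self, feat_dim=32, image_size=32):
        super().__init__()
        self.decode = nn.Linear(feat_dim, 3 * image_size * image_size)
        self.image_size = image_size

    def forward(self, point_feats):
        pooled = point_feats.mean(dim=0)              # global pooling keeps it differentiable
        img = torch.sigmoid(self.decode(pooled))
        return img.view(3, self.image_size, self.image_size)

encoder, renderer = PointEncoder(), ToyRenderer()
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(renderer.parameters()), lr=1e-3)
points, target_image = torch.rand(4096, 3), torch.rand(3, 32, 32)
for _ in range(5):                                    # pre-training step: render, then compare
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(renderer(encoder(points)), target_image)
    loss.backward()
    optimizer.step()
# After pre-training, the encoder would be kept and fine-tuned on downstream 3D tasks.
```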

    Probabilistic Point Cloud Modeling via Self-Organizing Gaussian Mixture Models

    This letter presents a continuous probabilistic modeling methodology for spatial point cloud data using finite Gaussian Mixture Models (GMMs), where the number of components is adapted based on the scene complexity. Few hierarchical or adaptive methods have been proposed to address the challenge of balancing model fidelity with model size; instead, state-of-the-art mapping approaches require parameters to be tuned for specific use cases and do not generalize across diverse environments. To address this gap, we utilize a self-organizing principle from information-theoretic learning to automatically adapt the complexity of the GMM based on the relevant information in the sensor data. The approach is evaluated against existing point cloud modeling techniques on real-world data with varying degrees of scene complexity. Comment: 8 pages, 6 figures; to appear in IEEE Robotics and Automation Letters.
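    The point of the method is to let the data decide how many Gaussian components a point cloud needs. The sketch below conveys that idea with a much simpler stand-in, selecting the component count by a BIC sweep with scikit-learn; the letter's actual self-organizing, information-theoretic criterion is different, and all names here are illustrative.

```python
# Illustrative stand-in (not the letter's criterion): pick the GMM size for a point
# cloud by fitting several candidate sizes and keeping the lowest-BIC model.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_adaptive_gmm(points, candidate_sizes=(4, 8, 16, 32, 64)):
    """points: (N, 3) array; returns the fitted GMM with the lowest BIC."""
    best_model, best_bic = None, np.inf
    for k in candidate_sizes:
        model = GaussianMixture(n_components=k, covariance_type="full", random_state=0)
        model.fit(points)
        bic = model.bic(points)
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model

# Toy point cloud: two clusters of different spread, standing in for a scene whose
# complexity the chosen model size should reflect.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.05, (500, 3)), rng.normal(1.0, 0.3, (500, 3))])
gmm = fit_adaptive_gmm(points)
print(gmm.n_components, gmm.weights_.round(2))
```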