SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates
Neural radiance fields (NeRFs) have enabled high fidelity 3D reconstruction
from multiple 2D input views. However, a well-known drawback of NeRFs is their
less-than-ideal performance under a small number of views, due to insufficient
constraints enforced by volumetric rendering. To address this issue, we
introduce SCADE, a novel technique that improves NeRF reconstruction quality on
sparse, unconstrained input views for in-the-wild indoor scenes. To constrain
NeRF reconstruction, we leverage geometric priors in the form of per-view depth
estimates produced with state-of-the-art monocular depth estimation models,
which can generalize across scenes. A key challenge is that monocular depth
estimation is an ill-posed problem, with inherent ambiguities. To handle this
issue, we propose a new method that learns to predict, for each view, a
continuous, multimodal distribution of depth estimates using conditional
Implicit Maximum Likelihood Estimation (cIMLE). To disambiguate by exploiting
multiple views, we introduce a novel space carving loss that
guides the NeRF representation to fuse multiple hypothesized depth maps from
each view and distill from them a common geometry that is consistent with all
views. Experiments show that our approach enables higher fidelity novel view
synthesis from sparse views. Our project page can be found at
https://scade-spacecarving-nerfs.github.io.
Comment: CVPR 2023
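The loss described above admits a compact illustration. Below is a minimal
PyTorch sketch of the hypothesis-selection idea: supervising each ray only
against its closest sampled depth hypothesis, so the NeRF can settle on
whichever depth mode is consistent across views. This is our reading of the
abstract, not the paper's exact space carving formulation; all names and
shapes are illustrative.

```python
import torch

def hypothesis_depth_loss(nerf_depth, depth_hypotheses):
    """Toy ambiguity-aware depth loss (illustrative, not SCADE's exact loss).

    nerf_depth:       (R,) expected ray termination depth rendered by the NeRF.
    depth_hypotheses: (R, K) K monocular depth samples per ray, e.g. drawn
                      from a cIMLE-conditioned depth estimator.
    Each ray is penalized only against its closest hypothesis, leaving the
    multi-view training signal to pick the mode consistent with all views.
    """
    err = (depth_hypotheses - nerf_depth.unsqueeze(-1)) ** 2  # (R, K)
    return err.min(dim=-1).values.mean()

# toy usage: 1024 rays from one view, 20 depth hypotheses each
loss = hypothesis_depth_loss(torch.rand(1024), torch.rand(1024, 20))
```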
ProvNeRF: Modeling per Point Provenance in NeRFs as a Stochastic Process
Neural radiance fields (NeRFs) have gained popularity across various
applications. However, they face challenges in the sparse view setting, lacking
sufficient constraints from volume rendering. Reconstructing and understanding
a 3D scene from sparse and unconstrained cameras is a long-standing problem in
classical computer vision with diverse applications. While recent works have
explored NeRFs in sparse, unconstrained view scenarios, their focus has been
primarily on enhancing reconstruction and novel view synthesis. Our approach
takes a broader perspective by posing the question: "from where has each point
been seen?" -- which gates how well we can understand and reconstruct it. In
other words, we aim to determine the origin or provenance of each 3D point and
its associated information under sparse, unconstrained views. We introduce
ProvNeRF, a model that enriches a traditional NeRF representation by
incorporating per-point provenance, modeling likely source locations for each
point. We achieve this by extending implicit maximum likelihood estimation
(IMLE) for stochastic processes. Notably, our method is compatible with any
pre-trained NeRF model and the associated training camera poses. We demonstrate
that modeling per-point provenance offers several advantages, including
uncertainty estimation, criteria-based view selection, and improved novel view
synthesis, compared to state-of-the-art methods. Please visit our project page
at https://provnerf.github.io.
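Since ProvNeRF builds on IMLE, a sketch of the basic (unconditional) IMLE
objective may help situate the method: for each data point, several latents
are sampled and only the nearest generated sample is pulled toward the data,
so every mode of the data distribution gets covered. This is background on
plain IMLE, not the paper's stochastic-process extension; the toy generator
is hypothetical.

```python
import torch
import torch.nn as nn

def imle_step(generator, latent_dim, data, n_samples=10):
    """One step of the basic (unconditional) IMLE objective, for background.
    For every data point, draw several latents, keep the generated sample
    closest to that point, and pull only that sample toward the data -- so
    every data mode ends up with a nearby generated sample."""
    B, D = data.shape
    z = torch.randn(B * n_samples, latent_dim)
    samples = generator(z).view(B, n_samples, D)
    dists = ((samples - data.unsqueeze(1)) ** 2).sum(-1)  # (B, n_samples)
    nearest = dists.argmin(dim=1)                         # best sample per point
    chosen = samples[torch.arange(B), nearest]            # (B, D)
    return ((chosen - data) ** 2).sum(-1).mean()

# toy usage: a linear generator mapping 8-D latents to 3-D points
gen = nn.Linear(8, 3)
loss = imle_step(gen, latent_dim=8, data=torch.randn(32, 3))
loss.backward()
```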
OptCtrlPoints: Finding the Optimal Control Points for Biharmonic 3D Shape Deformation
We propose OptCtrlPoints, a data-driven framework designed to identify the
optimal sparse set of control points for reproducing target shapes using
biharmonic 3D shape deformation. Control-point-based 3D deformation methods are
widely utilized for interactive shape editing, and their usability is enhanced
when the control points are sparse yet strategically distributed across the
shape. With this objective in mind, we introduce a data-driven approach that
can determine the most suitable set of control points, assuming that we have a
given set of possible shape variations. The challenges associated with this
task primarily stem from the computationally demanding nature of the problem.
Two main factors contribute to this complexity: solving a large linear system
for the biharmonic weight computation and addressing the combinatorial problem
of finding the optimal subset of mesh vertices. To overcome these challenges,
we propose a reformulation of the biharmonic computation that reduces the
matrix size, making it dependent on the number of control points rather than
the number of vertices. Additionally, we present an efficient search algorithm
that significantly reduces the time complexity while still delivering a nearly
optimal solution. Experiments on SMPL, SMAL, and DeformingThings4D datasets
demonstrate the efficacy of our method. Our control points achieve better
template-to-target fit than FPS, random search, and neural-network-based
prediction. We also highlight the significant reduction in computation time
from days to approximately 3 minutes.
Comment: Pacific Graphics 2023 (Full Paper)
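To make the search objective concrete, here is an illustrative NumPy sketch
of the quantity such a control-point search would evaluate per candidate
subset: the least-squares residual between a deformed template and a target,
assuming a precomputed weight matrix W (e.g. biharmonic weights). The paper's
actual contributions, the reduced-size reformulation of the biharmonic solve
and the efficient subset search, are not reproduced here.

```python
import numpy as np

def fit_residual(W, V_src, V_tgt):
    """Fitting error of one candidate control-point set (illustrative).

    W:     (n, k) deformation weights (e.g. biharmonic) for k control points.
    V_src: (n, 3) template vertices.   V_tgt: (n, 3) target vertices.
    Solves least squares for the control-point displacements that best map
    the template onto the target, then returns the residual -- the quantity
    a control-point search would try to minimize over subsets.
    """
    D = np.linalg.lstsq(W, V_tgt - V_src, rcond=None)[0]  # (k, 3) displacements
    return float(np.linalg.norm(W @ D - (V_tgt - V_src)) ** 2)

# toy usage: 500 vertices, 8 candidate control points
rng = np.random.default_rng(0)
W = rng.random((500, 8))
V_src, V_tgt = rng.random((500, 3)), rng.random((500, 3))
print(fit_residual(W, V_src, V_tgt))
```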
NeRF Revisited: Fixing Quadrature Instability in Volume Rendering
Neural radiance fields (NeRF) rely on volume rendering to synthesize novel
views. Volume rendering requires evaluating an integral along each ray, which
is numerically approximated with a finite sum that corresponds to the exact
integral along the ray under piecewise constant volume density. As a
consequence, the rendered result is unstable w.r.t. the choice of samples along
the ray, a phenomenon that we dub quadrature instability. We propose a
mathematically principled solution by reformulating the sample-based rendering
equation so that it corresponds to the exact integral under piecewise linear
volume density. This simultaneously resolves multiple issues: conflicts between
samples along different rays, imprecise hierarchical sampling, and
non-differentiability of quantiles of ray termination distances w.r.t. model
parameters. We demonstrate several benefits over the classical sample-based
rendering equation, such as sharper textures, better geometric reconstruction,
and stronger depth supervision. Our proposed formulation can also be used as a
drop-in replacement for the volume rendering equation of existing NeRF-based
methods. Our project page can be found at pl-nerf.github.io.
Comment: NeurIPS 2023
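The contrast between the two quadrature rules can be sketched in a few lines
of PyTorch. The first function is the classic piecewise-constant rule; the
second replaces each segment's optical depth with the trapezoidal integral,
which is exact when the density varies linearly between samples. This is our
reading of the abstract's core idea, not the paper's full formulation.

```python
import torch

def render_weights_constant(sigma, delta):
    """Classic NeRF quadrature: density held constant over each segment,
    so per-segment opacity is 1 - exp(-sigma_i * delta_i)."""
    alpha = 1.0 - torch.exp(-sigma * delta)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:1]),
                                     1.0 - alpha[:-1]]), dim=0)
    return trans * alpha  # weights used to composite colors/depths

def render_weights_linear(sigma, delta):
    """Piecewise-linear variant sketched from the abstract: the segment's
    optical depth is the trapezoidal integral of a density that varies
    linearly between consecutive samples (exact for piecewise linear sigma).
    sigma holds N values at the sample points; delta the N-1 gaps."""
    tau = 0.5 * (sigma[:-1] + sigma[1:]) * delta  # exact segment integrals
    alpha = 1.0 - torch.exp(-tau)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:1]),
                                     1.0 - alpha[:-1]]), dim=0)
    return trans * alpha

# toy usage: 64 samples along one ray
sigma = torch.rand(64)
w_const = render_weights_constant(sigma, torch.full((64,), 0.1))
w_lin = render_weights_linear(sigma, torch.full((63,), 0.1))
```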
DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion
While the field of 3D point cloud generation has grown rapidly in recent
years, it still lacks an effective way to enable intuitive user control in
the generation process, which limits the general utility of such
methods. Since an intuitive way of decomposing a shape is through its parts, we
propose to tackle the task of controllable part-based point cloud generation.
We introduce DiffFacto, a novel probabilistic generative model that learns the
distribution of shapes with part-level control. We propose a factorization that
models independent part style and part configuration distributions, and we
present a novel cross-diffusion network that enables us to generate coherent and
plausible shapes under our proposed factorization. Experiments show that our
method is able to generate novel shapes with multiple axes of control. It
achieves state-of-the-art part-level generation quality and generates plausible
and coherent shapes while enabling various downstream editing applications such
as shape interpolation, mixing, and transformation editing. Project website:
https://difffacto.github.io
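A schematic sketch of the factorization, as we read the abstract: part styles
are sampled independently, each part then receives a configuration (placement
and scale here) and is decoded and posed. The real model implements each
factor with diffusion networks coupled by cross diffusion; the callables
below are hypothetical stand-ins.

```python
import torch

def sample_shape(sample_style, sample_config, decode_part, n_parts=4):
    """Factorized part-based sampling (schematic, not DiffFacto's pipeline).
    Styles are independent per part; configurations pose each decoded part."""
    parts = []
    for _ in range(n_parts):
        style = sample_style()               # independent per-part style code
        scale, shift = sample_config(style)  # part configuration
        pts = decode_part(style)             # (m, 3) part in canonical frame
        parts.append(pts * scale + shift)    # pose part into the global frame
    return torch.cat(parts, dim=0)           # assembled point cloud

# toy stand-ins for the three learned components
shape = sample_shape(
    sample_style=lambda: torch.randn(16),
    sample_config=lambda s: (1.0 + 0.1 * s[0].abs(), torch.randn(3)),
    decode_part=lambda s: torch.randn(128, 3),
)
```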
Point2Cyl: Reverse Engineering 3D Objects from Point Clouds to Extrusion Cylinders
We propose Point2Cyl, a supervised network transforming a raw 3D point cloud
to a set of extrusion cylinders. Reverse engineering a raw geometry into a
CAD model is an essential task for enabling manipulation of the 3D data in
shape editing software, and thus for expanding its use in many downstream
applications. In particular, CAD models formed as a sequence of extrusion
cylinders -- a 2D sketch plus an extrusion axis and range -- and their Boolean
combinations are not only widely used in the CAD community and software but
are also highly expressive, compared to representations limited to a few
primitive types (e.g., planes, spheres, and cylinders). In this work, we
introduce a neural
network that solves the extrusion cylinder decomposition problem in a
geometry-grounded way by first learning underlying geometric proxies.
Specifically, our approach first predicts per-point segmentation, base/barrel
labels, and normals, then estimates the underlying extrusion parameters in
differentiable, closed-form formulations. Our experiments show that our
approach achieves the best performance on two recent CAD datasets, Fusion
Gallery and DeepCAD, and we further showcase our approach on reverse
engineering and editing.
Comment: CVPR 2022
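One of the closed-form estimators hinted at above can be illustrated
geometrically: barrel-surface normals of an extrusion are perpendicular to
its axis, so the axis can be recovered as the direction minimizing the
squared dot products with the predicted normals. A NumPy sketch of that
geometry (illustrative; the paper derives its estimators to be differentiable
for end-to-end training):

```python
import numpy as np

def extrusion_axis_from_barrel_normals(normals):
    """Barrel normals of an extrusion are perpendicular to its axis, so the
    axis is the unit direction a minimizing sum_i (n_i . a)^2, i.e. the right
    singular vector of the normal matrix with the smallest singular value."""
    _, _, vt = np.linalg.svd(normals)  # normals: (m, 3), rows unit-length
    return vt[-1]                      # (3,) unit axis direction

def extrusion_range(points, axis):
    """Extent of the extrusion along the estimated axis."""
    t = points @ axis
    return t.min(), t.max()

# toy usage: normals of a cylinder around the z-axis lie in the xy-plane
theta = np.linspace(0, 2 * np.pi, 100)
normals = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
axis = extrusion_axis_from_barrel_normals(normals)  # ~ +/- [0, 0, 1]
```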
LCD: learned cross-domain descriptors for 2D-3D matching
In this work, we present a novel method to learn a local cross-domain descriptor for 2D image and 3D point cloud matching. Our proposed method is a dual auto-encoder neural network that maps 2D and 3D input into a shared latent space representation. We show that such local cross-domain descriptors in the shared embedding are more discriminative than those obtained from individual training in the 2D and 3D domains. To facilitate the training process, we built a new dataset by collecting ≈1.4 million 2D-3D correspondences with various lighting conditions and settings from publicly available RGB-D scenes. Our descriptor is evaluated in three main experiments: 2D-3D matching, cross-domain retrieval, and sparse-to-dense depth estimation. Experimental results confirm the robustness of our approach as well as its competitive performance, not only in solving cross-domain tasks but also in generalizing to purely 2D and 3D tasks. Our dataset and code are released publicly at https://hkust-vgd.github.io/lcd.
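The dual auto-encoder idea can be sketched compactly: two branches, one per
domain, each reconstructing its own input while sharing one latent space, so
cross-domain matching reduces to comparing embeddings. The MLP layers and
sizes below are toy stand-ins, not LCD's actual convolutional/point-based
architecture, and the cross-domain alignment loss is only indicated in a
comment.

```python
import torch
import torch.nn as nn

class DualAutoEncoder(nn.Module):
    """Minimal dual auto-encoder sketch: a 2D branch and a 3D branch each
    reconstruct their own input while mapping into one shared latent space."""

    def __init__(self, dim_2d=3 * 64 * 64, dim_3d=1024 * 3, latent=256):
        super().__init__()
        self.enc2d = nn.Sequential(nn.Linear(dim_2d, 512), nn.ReLU(),
                                   nn.Linear(512, latent))
        self.dec2d = nn.Sequential(nn.Linear(latent, 512), nn.ReLU(),
                                   nn.Linear(512, dim_2d))
        self.enc3d = nn.Sequential(nn.Linear(dim_3d, 512), nn.ReLU(),
                                   nn.Linear(512, latent))
        self.dec3d = nn.Sequential(nn.Linear(latent, 512), nn.ReLU(),
                                   nn.Linear(512, dim_3d))

    def forward(self, img, pts):
        z2, z3 = self.enc2d(img), self.enc3d(pts)
        # reconstruction trains each branch on its own domain; an additional
        # cross-domain loss (e.g. triplet) would align z2 and z3 for matching
        return self.dec2d(z2), self.dec3d(z3), z2, z3

# toy usage with flattened 64x64 RGB patches and 1024-point local patches
model = DualAutoEncoder()
rec2d, rec3d, z2, z3 = model(torch.rand(4, 3 * 64 * 64),
                             torch.rand(4, 1024 * 3))
```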
PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision
Impressive progress in generative models and implicit representations has given rise to methods that can generate 3D shapes of high quality. However, the ability to locally control and edit shapes is another essential property that can unlock several content creation applications. Local control can be achieved with part-aware models, but existing methods require 3D supervision and cannot produce textures. In this work, we devise PartNeRF, a novel part-aware generative model for editable 3D shape synthesis that does not require any explicit 3D supervision. Our model generates objects as a set of locally defined NeRFs, augmented with an affine transformation. This enables several editing operations, such as applying transformations on parts, mixing parts from different objects, etc. To ensure distinct, manipulable parts, we enforce a hard assignment of rays to parts that makes sure the color of each ray is determined by only a single NeRF. As a result, altering one part does not affect the appearance of the others. Evaluations on various ShapeNet categories demonstrate the ability of our model to generate editable 3D objects of improved fidelity, compared to previous part-based generative approaches that require 3D supervision or models relying on NeRFs.
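The hard ray-to-part assignment described above can be sketched as follows,
assuming some per-ray, per-part evidence (e.g. how strongly each local NeRF
occludes the ray -- a hypothetical stand-in for the paper's actual
association): each ray takes its color from exactly one part's NeRF, so edits
to one part cannot bleed into the appearance of the others.

```python
import torch

def hard_ray_assignment(part_evidence):
    """Hard assignment of rays to parts (sketched from the abstract).

    part_evidence: (R, P) per-ray, per-part evidence.
    returns:       (R,) index of the single part that renders each ray.
    """
    return part_evidence.argmax(dim=-1)

# toy usage: pick each ray's color from its single assigned part
R, P = 4096, 8
evidence = torch.rand(R, P)              # per-part evidence per ray
colors = torch.rand(R, P, 3)             # color each part's NeRF would render
idx = hard_ray_assignment(evidence)
final = colors[torch.arange(R), idx]     # (R, 3), exactly one NeRF per ray
```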