Generalizable Neural Voxels for Fast Human Radiance Fields
Rendering moving human bodies from free viewpoints using only a monocular video
is a challenging problem: the information is too sparse to model complex human
body structures and motions across both view and pose dimensions. Neural
radiance fields (NeRF) have shown great power in novel view synthesis and have
been applied to human body rendering. However, most current NeRF-based methods
incur high training and rendering costs, which impedes their wide application
in real-life scenarios. In this paper, we propose
a rendering framework that can learn moving human body structures extremely
quickly from a monocular video. The framework is built by integrating both
neural fields and neural voxels. In particular, a set of generalizable neural
voxels is constructed. Pretrained on various human bodies, these general
voxels represent a basic skeleton and provide strong geometric priors. During
fine-tuning, individual voxels are constructed to learn differential textures,
complementary to the general voxels. Learning a novel body is thus further
accelerated, taking only a few minutes. Our method shows
significantly higher training efficiency compared with previous methods, while
maintaining similar rendering quality. The project page is at
https://taoranyi.com/gneuvox.
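The split between shared general voxels and per-subject individual voxels lends itself to a short illustration. Below is a minimal PyTorch sketch of that idea; the class name, grid resolution, and MLP head are hypothetical stand-ins, not the authors' implementation. Frozen pretrained voxels supply geometric priors, while a second trainable grid learns the residual per-subject detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualVoxelField(nn.Module):
    """Hypothetical sketch: pretrained general voxels (shared, frozen)
    plus individual voxels (trainable per subject) feed a small MLP."""

    def __init__(self, res=64, feat_dim=8):
        super().__init__()
        # Placeholder for pretrained weights; frozen during fine-tuning.
        self.general_voxels = nn.Parameter(torch.randn(1, feat_dim, res, res, res))
        self.general_voxels.requires_grad_(False)
        # Learned per subject; captures differential texture/geometry.
        self.individual_voxels = nn.Parameter(torch.zeros(1, feat_dim, res, res, res))
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 4),  # RGB + density
        )

    def forward(self, xyz):
        # xyz: (N, 3) canonical-space points, normalized to [-1, 1].
        grid = xyz.view(1, -1, 1, 1, 3)
        g = F.grid_sample(self.general_voxels, grid, align_corners=True)
        i = F.grid_sample(self.individual_voxels, grid, align_corners=True)
        feat = torch.cat([g, i], dim=1).view(2 * g.shape[1], -1).t()
        return self.mlp(feat)  # (N, 4)
```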
Fast High Dynamic Range Radiance Fields for Dynamic Scenes
Neural Radiance Fields (NeRF) and their extensions have shown great success
in representing 3D scenes and synthesizing novel-view images. However, most
NeRF methods take low-dynamic-range (LDR) images as input, which may lose
details, especially under nonuniform illumination. Some previous NeRF methods attempt to
introduce high-dynamic-range (HDR) techniques but mainly target static scenes.
To extend HDR NeRF methods to wider applications, we propose a dynamic HDR NeRF
framework, named HDR-HexPlane, which can learn 3D scenes from dynamic 2D images
captured with various exposures. A learnable exposure mapping function is
constructed to obtain adaptive exposure values for each image. A camera
response function, constrained by a monotonically increasing prior, is
designed for stable learning. With the proposed model, high-quality novel-view images at any
time point can be rendered with any desired exposure. We further construct a
dataset containing multiple dynamic scenes captured with diverse exposures for
evaluation. All the datasets and code are available at
https://guanjunwu.github.io/HDR-HexPlane/. (3DV 2024)
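To illustrate the two components the abstract names, a learnable per-image exposure and a monotonically increasing camera response function, here is a hedged PyTorch sketch. All names and the cumulative-softplus parameterization are assumptions, chosen only because a cumulative sum of positive steps is monotonic by construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExposureAndCRF(nn.Module):
    """Hypothetical sketch: per-image learnable exposures plus a
    tone curve that is monotonically increasing by construction."""

    def __init__(self, num_images, num_bins=256):
        super().__init__()
        # One learnable log-exposure per training image.
        self.log_exposure = nn.Parameter(torch.zeros(num_images))
        # Unconstrained increments; softplus keeps every step positive.
        self.increments = nn.Parameter(torch.zeros(num_bins))

    def forward(self, hdr_radiance, image_idx):
        # Scale scene radiance by this image's learned exposure.
        scaled = hdr_radiance * torch.exp(self.log_exposure[image_idx]).view(-1, 1)
        # Positive steps -> cumulative sum is a monotonic response curve.
        crf = torch.cumsum(F.softplus(self.increments), dim=0)
        crf = crf / crf[-1]  # normalize the curve to end at 1
        # Differentiable linear interpolation into the curve.
        x = scaled.clamp(0, 1) * (self.increments.numel() - 1)
        lo = x.floor().long().clamp(max=self.increments.numel() - 2)
        w = x - lo.float()
        return crf[lo] * (1.0 - w) + crf[lo + 1] * w  # predicted LDR values
```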
GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
Recently, impressive results have been achieved in 3D scene editing with text
instructions based on a 2D diffusion model. However, current diffusion models
primarily generate images by predicting noise in the latent space, and the
editing is usually applied to the whole image, making it challenging to
perform delicate, especially localized, edits on 3D scenes. Inspired by
recent 3D Gaussian splatting, we propose a systematic framework, named
GaussianEditor, to edit 3D scenes delicately via 3D Gaussians with text
instructions. Benefiting from the explicit nature of 3D Gaussians, we design
a series of techniques to achieve delicate editing. Specifically, we first
extract the region of interest (RoI) corresponding to the text instruction,
aligning it to 3D Gaussians. The Gaussian RoI is further used to control the
editing process. Our framework can achieve more delicate and precise editing of
3D scenes than previous methods while training much faster: within 20 minutes
on a single V100 GPU, more than twice as fast as Instruct-NeRF2NeRF
(45 minutes to 2 hours). Project page: https://GaussianEditor.github.io
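As a rough illustration of how an explicit Gaussian RoI can gate localized edits, the sketch below masks Gaussians by an axis-aligned bounding box and updates only those. The box-shaped RoI, function names, and plain gradient step are hypothetical simplifications, not the paper's actual pipeline.

```python
import torch

def roi_mask_gaussians(means3d, roi_min, roi_max):
    """Hypothetical sketch: mark Gaussians whose centers fall inside an
    axis-aligned RoI box lifted from the text-grounded 2D region."""
    inside = ((means3d >= roi_min) & (means3d <= roi_max)).all(dim=-1)
    return inside  # (N,) boolean mask

def editing_step(means3d, colors, grad_colors, roi_min, roi_max, lr=1e-2):
    # Only Gaussians inside the RoI receive editing updates; the rest of
    # the scene is left untouched, which is what makes the edit local.
    mask = roi_mask_gaussians(means3d, roi_min, roi_max)
    colors = colors.clone()
    colors[mask] -= lr * grad_colors[mask]
    return colors
```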
Segment Anything in 3D with NeRFs
The Segment Anything Model (SAM) has demonstrated its effectiveness in
segmenting any object/part in various 2D images, yet its ability for 3D has not
been fully explored. The real world is composed of numerous 3D scenes and
objects. Due to the scarcity of accessible 3D data and the high cost of its
acquisition and annotation, lifting SAM to 3D is a challenging but valuable
research avenue. With this in mind, we propose a novel framework to Segment
Anything in 3D, named SA3D. Given a neural radiance field (NeRF) model, SA3D
allows users to obtain the 3D segmentation result of any target object via only
one-shot manual prompting in a single rendered view. With the input prompts,
SAM cuts out the target object from the corresponding view. The obtained 2D
segmentation mask is projected onto 3D mask grids via density-guided inverse
rendering. 2D masks in other views are then rendered; although mostly
incomplete, they serve as cross-view self-prompts that are fed into SAM again.
The completed masks are obtained and projected onto the mask grids. This
procedure is executed iteratively until accurate 3D masks are learned.
SA3D can adapt to various radiance fields effectively without any
additional redesigning. The entire segmentation process can be completed in
approximately two minutes without any engineering optimization. Our experiments
demonstrate the effectiveness of SA3D in different scenes, highlighting the
potential of SAM in 3D scene perception. The project page is at
https://jumpat.github.io/SA3D/.
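The iterative cross-view self-prompting loop described above can be summarized in pseudocode-style Python. Here `nerf.render`, `nerf.project_mask`, `nerf.render_mask`, and `sam_predict` are hypothetical stand-ins for the rendering, density-guided projection, and SAM components, not the authors' API.

```python
import numpy as np

def sa3d_loop(nerf, sam_predict, views, init_prompt, n_rounds=2):
    """Hypothetical sketch of SA3D-style iterative mask lifting."""
    mask_grid = np.zeros(nerf.grid_shape, dtype=np.float32)

    # One-shot manual prompt in a single rendered view seeds the process.
    first = views[0]
    mask2d = sam_predict(nerf.render(first), prompt=init_prompt)
    # Density-guided inverse rendering lifts the 2D mask onto the grid.
    mask_grid += nerf.project_mask(mask2d, first)

    for _ in range(n_rounds):
        for view in views[1:]:
            # Render the (incomplete) 3D mask into this view and use it
            # as a cross-view self-prompt for SAM.
            self_prompt = nerf.render_mask(mask_grid, view)
            mask2d = sam_predict(nerf.render(view), prompt=self_prompt)
            mask_grid = np.maximum(mask_grid, nerf.project_mask(mask2d, view))
    return mask_grid
```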
GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting
Reconstructing and rendering 3D objects from highly sparse views is of
critical importance for promoting applications of 3D vision techniques and
improving user experience. However, images from sparse views only contain very
limited 3D information, leading to two significant challenges: 1) Difficulty in
building multi-view consistency as images for matching are too few; 2)
Partially omitted or highly compressed object information as view coverage is
insufficient. To tackle these challenges, we propose GaussianObject, a
framework that represents and renders 3D objects with Gaussian splatting and
achieves high rendering quality with only four input images. We first introduce
visual hull and floater elimination techniques, which explicitly inject
structure priors into the initial optimization process to help build
multi-view consistency, yielding a coarse 3D Gaussian representation. Then we
construct a Gaussian repair model based on diffusion models to supplement the
omitted object information, where Gaussians are further refined. We design a
self-generating strategy to obtain image pairs for training the repair model.
Our GaussianObject is evaluated on several challenging datasets, including
MipNeRF360, OmniObject3D, and OpenIllumination, achieving strong reconstruction
results from only 4 views and significantly outperforming previous
state-of-the-art methods. Project page: https://gaussianobject.github.io
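Since the visual hull initialization is the most self-contained step, here is a hedged NumPy sketch of carving hull points from a few object masks to seed the coarse Gaussians. The `cam.project` interface, grid resolution, and bounds are assumptions for illustration only.

```python
import numpy as np

def visual_hull_points(masks, cameras, res=64, bounds=1.0):
    """Hypothetical sketch: keep only grid points that project inside
    every input mask. `cameras` is assumed to map world points to pixel
    coordinates plus an in-frustum validity flag."""
    axis = np.linspace(-bounds, bounds, res)
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    pts = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)

    keep = np.ones(len(pts), dtype=bool)
    for mask, cam in zip(masks, cameras):
        uv, valid = cam.project(pts)  # (N, 2) pixels, (N,) in-frustum flags
        inside = np.zeros(len(pts), dtype=bool)
        u = uv[valid, 0].astype(int).clip(0, mask.shape[1] - 1)
        v = uv[valid, 1].astype(int).clip(0, mask.shape[0] - 1)
        inside[valid] = mask[v, u]
        keep &= inside  # a hull point must fall inside every mask
    return pts[keep]  # candidate centers for the coarse 3D Gaussians
```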