NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation
The emergence of Neural Radiance Fields (NeRF) has greatly impacted 3D scene
modeling and novel-view synthesis. As a form of visual media for 3D scene
representation, compression with high rate-distortion performance is a
long-standing goal. Motivated by advances in neural compression and neural field
representation, we propose NeRFCodec, an end-to-end NeRF compression framework
that integrates non-linear transform, quantization, and entropy coding for
memory-efficient scene representation. Since training a non-linear transform
directly on large-scale NeRF feature planes is impractical, we find that a
pre-trained neural 2D image codec can be utilized to compress the features
once content-specific parameters are added. Specifically, we reuse the neural
2D image codec but modify its encoder and decoder heads, while keeping the
other parts of the pre-trained decoder frozen. This allows us to train the full
pipeline via supervision of rendering loss and entropy loss, yielding the
rate-distortion balance by updating the content-specific parameters. At test
time, the bitstream containing the latent code, the feature decoder head, and other
side information is transmitted. Experimental results
demonstrate our method outperforms existing NeRF compression methods, enabling
high-quality novel view synthesis with a memory budget of 0.5 MB. Comment: Accepted at CVPR 2024; the source code will be released.
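The rate-distortion training described in this abstract can be illustrated with a minimal sketch. The snippet below is a hypothetical simplification, not the paper's implementation: all module and variable names are assumptions, a small convolutional stub stands in for the pre-trained 2D codec, and an MSE term on the decoded feature plane stands in for the rendering loss. It shows the structure of the idea: frozen codec body, trainable content-specific heads and latents, straight-through quantization, and a combined distortion-plus-rate objective.

```python
# Hypothetical NeRFCodec-style rate-distortion training sketch (names are assumptions).
# A pre-trained 2D codec compresses NeRF feature planes; only the encoder/decoder
# heads and the per-scene feature plane are updated, driven by a distortion + rate loss.
import torch
import torch.nn as nn

class CodecWithNewHeads(nn.Module):
    def __init__(self, channels=16, latent=32):
        super().__init__()
        self.enc_head = nn.Conv2d(channels, latent, 3, padding=1)  # trainable, content-specific
        self.enc_body = nn.Conv2d(latent, latent, 3, padding=1)    # frozen, stands in for pre-trained codec
        self.dec_body = nn.Conv2d(latent, latent, 3, padding=1)    # frozen, stands in for pre-trained codec
        self.dec_head = nn.Conv2d(latent, channels, 3, padding=1)  # trainable, content-specific
        for p in list(self.enc_body.parameters()) + list(self.dec_body.parameters()):
            p.requires_grad = False

    def forward(self, feature_plane):
        y = self.enc_body(self.enc_head(feature_plane))
        y_hat = y + (torch.round(y) - y).detach()   # straight-through quantization
        bits_proxy = y_hat.abs().mean()             # crude stand-in for a learned entropy model
        return self.dec_head(self.dec_body(y_hat)), bits_proxy

codec = CodecWithNewHeads()
feature_plane = nn.Parameter(torch.randn(1, 16, 64, 64))  # per-scene NeRF feature plane
opt = torch.optim.Adam(
    [p for p in codec.parameters() if p.requires_grad] + [feature_plane], lr=1e-3
)

target = torch.rand(1, 16, 64, 64)  # placeholder for rendering supervision
for step in range(100):
    recon, rate = codec(feature_plane)
    loss = nn.functional.mse_loss(recon, target) + 1e-2 * rate  # distortion + rate penalty
    opt.zero_grad(); loss.backward(); opt.step()
```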
VeRi3D: Generative Vertex-based Radiance Fields for 3D Controllable Human Image Synthesis
Unsupervised learning of 3D-aware generative adversarial networks has lately
made much progress. Some recent work demonstrates promising results in learning
human generative models using neural articulated radiance fields, yet their
generalization ability and controllability lag behind parametric human models,
i.e., they do not generalize well to novel poses/shapes and are not
part-controllable. To solve these problems, we propose VeRi3D, a generative
human vertex-based radiance field parameterized by vertices of the parametric
human template, SMPL. We map each 3D point to the local coordinate system
defined by its neighboring vertices, and use the corresponding vertex feature
and local coordinates to map it to color and density values. We
demonstrate that our simple approach allows for generating photorealistic human
images with free control over camera pose, human pose, and shape, as well as
part-level editing.
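A vertex-based radiance-field query of the kind described here can be sketched in a few lines. The code below is an illustrative assumption, not VeRi3D itself: it attaches a learnable feature to each SMPL vertex, assigns each query point to its single nearest posed vertex for brevity, and decodes the vertex feature plus the local offset into color and density with a small MLP.

```python
# Hypothetical sketch of a vertex-based radiance field query (all names are assumptions).
import torch
import torch.nn as nn

class VertexRadianceField(nn.Module):
    def __init__(self, n_verts=6890, feat_dim=32):  # 6890 = SMPL vertex count
        super().__init__()
        self.vertex_feats = nn.Parameter(torch.randn(n_verts, feat_dim))  # one feature per vertex
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(),
            nn.Linear(64, 4),  # RGB + density
        )

    def forward(self, points, verts):
        # points: (N, 3) query points; verts: (V, 3) posed template vertices
        dists = torch.cdist(points, verts)       # (N, V) point-to-vertex distances
        idx = dists.argmin(dim=1)                # nearest vertex per point (single neighbor for brevity)
        local = points - verts[idx]              # offset expressed relative to that vertex
        out = self.mlp(torch.cat([self.vertex_feats[idx], local], dim=-1))
        rgb, sigma = torch.sigmoid(out[:, :3]), torch.relu(out[:, 3])
        return rgb, sigma

field = VertexRadianceField()
rgb, sigma = field(torch.randn(1024, 3), torch.randn(6890, 3))
```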
Deep Generative Models on 3D Representations: A Survey
Generative models, as an important family of statistical models, aim to learn
the observed data distribution so as to generate new instances. Along
with the rise of neural networks, deep generative models, such as variational
autoencoders (VAEs) and generative adversarial networks (GANs), have made
tremendous progress in 2D image synthesis. Recently, researchers have shifted
their attention from the 2D space to the 3D space, since 3D data better
aligns with our physical world and hence holds great practical potential.
However, unlike a 2D image, which has an efficient representation (i.e., the
pixel grid) by nature, representing 3D data poses far more challenges.
Concretely, an ideal 3D representation should be expressive enough to
model shapes and appearances in detail, and efficient enough to
handle high-resolution data with fast speed and low memory cost. However,
existing 3D representations, such as point clouds, meshes, and recent neural
fields, usually fail to meet the above requirements simultaneously. In this
survey, we make a thorough review of the development of 3D generation,
including 3D shape generation and 3D-aware image synthesis, from the
perspectives of both algorithms and more importantly representations. We hope
that our discussion could help the community track the evolution of this field
and further spark innovative ideas to advance this challenging task.
Learning Interpretable BEV Based VIO without Deep Neural Networks
Monocular visual-inertial odometry (VIO) is a critical problem in robotics
and autonomous driving. Traditional methods solve this problem based on
filtering or optimization. While fully interpretable, they rely on manual
intervention and empirical parameter tuning. On the other hand, learning-based
approaches allow for end-to-end training but require large amounts of training
data to learn millions of parameters. However, such non-interpretable and heavy
models hinder generalization. In this paper, we propose a fully
differentiable and interpretable bird's-eye-view (BEV) based VIO model for
robots with local planar motion that can be trained without deep neural
networks. Specifically, we first adopt an Unscented Kalman Filter as a
differentiable layer to predict the pitch and roll, where the covariance
matrices of noise are learned to filter out the noise of the IMU raw data.
Second, the refined pitch and roll are adopted to retrieve a gravity-aligned
BEV image of each frame using differentiable camera projection. Finally, a
differentiable pose estimator is utilized to estimate the remaining 3 DoF poses
between the BEV frames, leading to a 5-DoF pose estimate. Our method allows
for learning the covariance matrices end-to-end supervised by the pose
estimation loss, demonstrating superior performance to empirical baselines.
Experimental results on synthetic and real-world datasets demonstrate that our
simple approach is competitive with state-of-the-art methods and generalizes
well to unseen scenes.
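The key idea of learning noise covariances end-to-end through a differentiable filter can be illustrated with a toy sketch. The code below is an assumption-laden simplification: a plain per-axis linear Kalman update stands in for the paper's Unscented Kalman Filter, synthetic tilt measurements stand in for IMU-derived pitch/roll, and an MSE term stands in for the pose-estimation loss. It only demonstrates that the covariances can be exposed as trainable parameters and optimized by backpropagating through the filter.

```python
# Hypothetical learned-covariance filtering sketch (names and setup are assumptions;
# a simple linear Kalman update replaces the Unscented Kalman Filter for brevity).
import torch
import torch.nn as nn

class DifferentiableTiltFilter(nn.Module):
    def __init__(self):
        super().__init__()
        # log-parameterized so the learned covariances stay positive
        self.log_q = nn.Parameter(torch.zeros(2))  # process noise (pitch, roll)
        self.log_r = nn.Parameter(torch.zeros(2))  # measurement noise (pitch, roll)

    def forward(self, measurements):
        # measurements: (T, 2) noisy pitch/roll; independent scalar filter per angle
        q, r = self.log_q.exp(), self.log_r.exp()
        x = measurements[0]
        p = torch.ones(2)
        states = [x]
        for z in measurements[1:]:
            p = p + q              # predict
            k = p / (p + r)        # Kalman gain
            x = x + k * (z - x)    # update with new measurement
            p = (1 - k) * p
            states.append(x)
        return torch.stack(states)

filt = DifferentiableTiltFilter()
opt = torch.optim.Adam(filt.parameters(), lr=1e-2)
true_tilt = torch.zeros(50, 2)                 # toy ground truth: level motion
noisy = true_tilt + 0.1 * torch.randn(50, 2)   # simulated noisy tilt measurements
for step in range(200):
    loss = nn.functional.mse_loss(filt(noisy), true_tilt)  # stands in for the pose loss
    opt.zero_grad(); loss.backward(); opt.step()
```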
DORec: Decomposed Object Reconstruction Utilizing 2D Self-Supervised Features
Decomposing a target object from a complex background while reconstructing it
is challenging. Most approaches acquire awareness of object instances
through manual labels, but the annotation procedure is costly. The
recent advancements in 2D self-supervised learning have brought new prospects
to object-aware representation, yet it remains unclear how to leverage such
noisy 2D features for clean decomposition. In this paper, we propose a
Decomposed Object Reconstruction (DORec) network based on neural implicit
representations. Our key idea is to transfer 2D self-supervised features into
masks of two levels of granularity to supervise the decomposition, including a
binary mask to indicate the foreground regions and a K-cluster mask to indicate
the semantically similar regions. These two masks are complementary to each
other and lead to robust decomposition. Experimental results show the
superiority of DORec in segmenting and reconstructing the foreground object on
various datasets.
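The two-granularity supervision described in this abstract can be sketched from noisy per-pixel features. The snippet below is a hypothetical illustration, not DORec's pipeline: the foreground prototype, similarity threshold, and cluster count are all assumptions, and k-means over per-pixel features simply demonstrates how a binary foreground mask and a K-cluster mask could be derived from 2D self-supervised features.

```python
# Hypothetical sketch: derive a binary foreground mask and a K-cluster mask from
# per-pixel self-supervised features (prototype, threshold, and k are assumptions).
import numpy as np
from sklearn.cluster import KMeans

def masks_from_features(feats, fg_query, k=8, thresh=0.5):
    """feats: (H, W, C) per-pixel features; fg_query: (C,) foreground prototype feature."""
    h, w, c = feats.shape
    flat = feats.reshape(-1, c)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)   # cosine normalization
    q = fg_query / (np.linalg.norm(fg_query) + 1e-8)

    binary_mask = (flat @ q > thresh).reshape(h, w)                       # coarse foreground split
    cluster_mask = KMeans(n_clusters=k, n_init=4).fit_predict(flat).reshape(h, w)  # semantic groups
    return binary_mask, cluster_mask

feats = np.random.rand(32, 32, 64).astype(np.float32)   # stand-in for DINO-like features
binary_mask, cluster_mask = masks_from_features(feats, feats[16, 16])
```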