Reducing Shape-Radiance Ambiguity in Radiance Fields with a Closed-Form Color Estimation Method
The neural radiance field (NeRF) enables the synthesis of highly realistic
novel-view images of a 3D scene. It includes density and color fields to model
the shape and radiance of a scene, respectively. Supervised by a photometric
loss in an end-to-end manner, NeRF inherently suffers from the
shape-radiance ambiguity problem, i.e., it can perfectly fit training views but
does not guarantee decoupling the two fields correctly. To deal with this
issue, existing works have incorporated prior knowledge to provide an
independent supervision signal for the density field, including total variation
loss, sparsity loss, distortion loss, etc. These losses are based on general
assumptions about the density field, e.g., it should be smooth, sparse, or
compact, which are not adaptive to a specific scene. In this paper, we propose
a more adaptive method to reduce the shape-radiance ambiguity. The key is a
rendering method that is only based on the density field. Specifically, we
first estimate the color field based on the density field and posed images in a
closed form. Then NeRF's rendering process can proceed. We address the problems
in estimating the color field, including occlusion and non-uniformly
distributed views. Afterward, it is applied to regularize NeRF's density field.
As our regularization is guided by photometric loss, it is more adaptive
compared to existing ones. Experimental results show that our method improves
the density field of NeRF both qualitatively and quantitatively. Our code is
available at https://github.com/qihangGH/Closed-form-color-field.
Comment: This work has been published in NeurIPS 202
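A minimal sketch of the closed-form estimate described above, assuming pinhole cameras given as (K, R, t) with x_cam = R @ x_world + t, nearest-pixel color lookup, and a callable density_fn; these names are illustrative, and the paper's treatment of occlusion and non-uniformly distributed views is not reproduced. The color of a 3D point is taken as an average of the training-pixel colors that observe it, weighted by visibilities computed from the density field alone, so rendering with these colors yields a photometric signal that supervises only the density.

import torch

def transmittance(sigma, delta):
    # T = exp(-sum(sigma_i * delta_i)): probability that light reaches the point unoccluded.
    return torch.exp(-(sigma * delta).sum())

def estimate_point_color(point, cameras, images, density_fn, n_samples=32):
    """Visibility-weighted average of the observed colors of a 3D `point` (illustrative)."""
    weights, colors = [], []
    for (K, R, t), img in zip(cameras, images):
        cam_pt = R @ point + t                              # world -> camera coordinates
        if cam_pt[2] <= 0:                                  # behind the camera
            continue
        uv = (K @ (cam_pt / cam_pt[2]))[:2]                 # pinhole projection
        u, v = int(uv[0].round()), int(uv[1].round())
        H, W, _ = img.shape
        if not (0 <= u < W and 0 <= v < H):
            continue
        # March the density field from the camera center to the point to get its visibility.
        center = -R.T @ t
        ts = (torch.arange(n_samples, dtype=torch.float32) + 0.5) / n_samples
        sigma = density_fn(center + ts[:, None] * (point - center))   # (n_samples,)
        dist = torch.linalg.norm(point - center).item()
        weights.append(transmittance(sigma, torch.full_like(sigma, dist / n_samples)))
        colors.append(img[v, u])                            # observed RGB at the projection
    if not weights:
        return torch.zeros(3)
    w, c = torch.stack(weights), torch.stack(colors)
    return (w[:, None] * c).sum(0) / (w.sum() + 1e-8)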
A New Similarity Measure between Intuitionistic Fuzzy Sets and Its Application to Pattern Recognition
As a generalization of the ordinary fuzzy set, the concept of the intuitionistic fuzzy set (IFS), characterized by both a membership degree and a non-membership degree, is a more flexible way to cope with uncertainty. Similarity measures of intuitionistic fuzzy sets are used to indicate the degree of similarity between intuitionistic fuzzy sets. Although many similarity measures for intuitionistic fuzzy sets have been proposed in previous studies, some of them do not satisfy the axioms of similarity or produce counterintuitive results in certain cases. In this paper, a new similarity measure and a weighted similarity measure between IFSs are proposed. It is proved that the proposed similarity measures satisfy the properties of the axiomatic definition of similarity measures. A comparison between previous similarity measures and the proposed one indicates that the proposed similarity measure does not produce any counterintuitive cases. Moreover, it is demonstrated that the proposed similarity measure is capable of discriminating differences between patterns.
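The abstract does not reproduce the proposed measure, so the sketch below uses a classical normalized-Hamming-style similarity between IFSs and its weighted variant only to make the pattern-recognition use concrete; the function names and the classification helper are illustrative assumptions, not the paper's formulas.

import numpy as np

def ifs_similarity(mu_a, nu_a, mu_b, nu_b):
    """Similarity of two IFSs on the same universe, each given by membership (mu) and
    non-membership (nu) arrays; returns a value in [0, 1], equal to 1 iff the sets coincide."""
    d = np.abs(mu_a - mu_b) + np.abs(nu_a - nu_b)
    return 1.0 - d.mean() / 2.0

def weighted_ifs_similarity(mu_a, nu_a, mu_b, nu_b, w):
    """Weighted variant: w is a non-negative weight vector over the universe summing to 1."""
    d = np.abs(mu_a - mu_b) + np.abs(nu_a - nu_b)
    return 1.0 - float(np.dot(w, d)) / 2.0

def classify(sample, prototypes):
    """Pattern recognition by similarity: assign `sample` to the most similar prototype."""
    scores = {name: ifs_similarity(*sample, *proto) for name, proto in prototypes.items()}
    return max(scores, key=scores.get), scores

# Usage: a three-element universe, two known patterns, one unknown sample.
P = {"P1": (np.array([0.3, 0.5, 0.7]), np.array([0.6, 0.4, 0.1])),
     "P2": (np.array([0.6, 0.2, 0.9]), np.array([0.3, 0.7, 0.0]))}
label, scores = classify((np.array([0.4, 0.5, 0.8]), np.array([0.5, 0.4, 0.1])), P)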
Learning Local Feature Descriptor with Motion Attribute for Vision-based Localization
In recent years, camera-based localization has been widely used for robotic
applications, and most proposed algorithms rely on local features extracted
from recorded images. For better performance, the features used for open-loop
localization are required to be short-term globally static, and the ones used
for re-localization or loop closure detection need to be long-term static.
Therefore, the motion attribute of a local feature point could be exploited to
improve localization performance, e.g., the feature points extracted from
moving persons or vehicles can be excluded from these systems due to their
unsteadiness. In this paper, we design a fully convolutional network (FCN),
named MD-Net, to perform motion attribute estimation and feature description
simultaneously. MD-Net has a shared backbone network to extract features from
the input image and two network branches to complete each sub-task. With
MD-Net, we can obtain the motion attribute with little additional computation.
Experimental results demonstrate that the proposed method can learn distinctive
local feature descriptors along with motion attributes using only an FCN,
outperforming competing methods by a wide margin. We also show that
the proposed algorithm can be integrated into a vision-based localization
algorithm to significantly improve estimation accuracy.
Comment: This paper will be presented at IROS1
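A hedged PyTorch sketch of the two-branch layout described above: a shared convolutional backbone, a descriptor head producing dense L2-normalized local-feature descriptors, and a motion head predicting a per-pixel probability of belonging to a moving object. Layer sizes and names are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class MDNetSketch(nn.Module):
    def __init__(self, desc_dim=128):
        super().__init__()
        # Shared backbone: features are computed once and reused by both heads.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.desc_head = nn.Conv2d(128, desc_dim, 1)     # dense descriptors
        self.motion_head = nn.Conv2d(128, 1, 1)          # motion-attribute logits

    def forward(self, image):
        feat = self.backbone(image)
        desc = nn.functional.normalize(self.desc_head(feat), dim=1)
        motion = torch.sigmoid(self.motion_head(feat))   # P(pixel lies on a moving object)
        return desc, motion

# Usage: drop keypoints that fall on likely-moving regions before localization.
model = MDNetSketch()
desc_map, motion_map = model(torch.rand(1, 3, 240, 320))
static_mask = motion_map < 0.5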
Masked Space-Time Hash Encoding for Efficient Dynamic Scene Reconstruction
In this paper, we propose the Masked Space-Time Hash encoding (MSTH), a novel
method for efficiently reconstructing dynamic 3D scenes from multi-view or
monocular videos. Based on the observation that dynamic scenes often contain
substantial static areas that result in redundancy in storage and computations,
MSTH represents a dynamic scene as a weighted combination of a 3D hash encoding
and a 4D hash encoding. The weights for the two components are represented by a
learnable mask which is guided by an uncertainty-based objective to reflect the
spatial and temporal importance of each 3D position. With this design, our
method can reduce the hash collision rate by avoiding redundant queries and
modifications on static areas, making it feasible to represent a large number
of space-time voxels with compact hash tables. Besides, without the need to fit
a large number of temporally redundant features independently, our method is
easier to optimize and converges rapidly, requiring only twenty minutes of
training for a 300-frame dynamic scene. As a result, MSTH
obtains consistently better results than previous methods with only 20 minutes
of training time and 130 MB of memory storage. Code is available at
https://github.com/masked-spacetime-hashing/msth
Comment: NeurIPS 2023 (Spotlight)
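A toy sketch of the weighted combination described above, using single-level hash grids rather than the multiresolution tables used in practice; coordinates are assumed normalized to [0, 1], the uncertainty-based objective is omitted, and all names are assumptions. A learnable mask over 3D positions blends a static 3D encoding with a dynamic 4D (space-time) encoding, so static regions rely on the 3D table alone.

import torch
import torch.nn as nn

class HashEncoding(nn.Module):
    """Single-level hash grid: quantized coordinates are hashed into a learnable table."""
    def __init__(self, dims, table_size=2**16, feat_dim=8, resolution=128):
        super().__init__()
        self.table = nn.Parameter(torch.randn(table_size, feat_dim) * 1e-2)
        primes = [1, 2654435761, 805459861, 3674653429][:dims]
        self.register_buffer("primes", torch.tensor(primes, dtype=torch.long))
        self.table_size, self.resolution = table_size, resolution

    def forward(self, x):                                # x in [0, 1], shape (N, dims)
        idx = (x * self.resolution).long()
        h = (idx * self.primes).sum(-1) % self.table_size   # spatial hash
        return self.table[h]                             # (N, feat_dim)

class MaskedSpaceTimeHash(nn.Module):
    """Blend of a 3D (static) and a 4D (space-time) encoding with a learnable mask."""
    def __init__(self):
        super().__init__()
        self.static_enc = HashEncoding(dims=3)
        self.dynamic_enc = HashEncoding(dims=4)
        self.mask_enc = HashEncoding(dims=3, feat_dim=1)  # mask depends on 3D position only

    def forward(self, xyz, t):                            # xyz: (N, 3), t: (N,)
        m = torch.sigmoid(self.mask_enc(xyz))             # ~1 where the scene is dynamic
        xyzt = torch.cat([xyz, t[:, None]], dim=-1)
        return (1 - m) * self.static_enc(xyz) + m * self.dynamic_enc(xyzt)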
Mixed Neural Voxels for Fast Multi-view Video Synthesis
Synthesizing high-fidelity videos from real-world multi-view input is
challenging because of the complexities of real-world environments and highly
dynamic motions. Previous works based on neural radiance fields have
demonstrated high-quality reconstructions of dynamic scenes. However, training
such models on real-world scenes is time-consuming, usually taking days or
weeks. In this paper, we present a novel method named MixVoxels to better
represent the dynamic scenes with fast training speed and competitive rendering
qualities. The proposed MixVoxels represents the 4D dynamic scenes as a mixture
of static and dynamic voxels and processes them with different networks. In
this way, the required modalities for static voxels can be computed by a
lightweight model, which substantially reduces the amount of
computation, especially for many daily dynamic scenes dominated by the static
background. To separate the two kinds of voxels, we propose a novel variation
field to estimate the temporal variance of each voxel. For the dynamic voxels,
we design an inner-product time query method to efficiently query multiple time
steps, which is essential to recover the high-dynamic motions. As a result,
with 15 minutes of training for dynamic scenes with inputs of 300-frame videos,
MixVoxels achieves better PSNR than previous methods. Codes and trained models
are available at https://github.com/fengres/mixvoxels
Comment: ICCV 2023 (Oral)
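A hedged sketch of the two mechanisms named above, under assumed names and shapes: a variation field that flags dynamic voxels by their temporal standard deviation (computed here from a dense space-time grid of observed values, which simplifies the paper's pixel-based estimate), and an inner-product time query in which each dynamic voxel stores a feature matrix that is contracted against a shared time basis, so many time steps are answered with a single product.

import torch
import torch.nn as nn

def variation_field(space_time_grid, threshold=0.05):
    """space_time_grid: (T, X, Y, Z) observed values. Returns a boolean dynamic-voxel mask."""
    variation = space_time_grid.std(dim=0)               # temporal variation per voxel
    return variation > threshold                         # True where the voxel is dynamic

class InnerProductTimeQuery(nn.Module):
    """Each dynamic voxel keeps a (feat_dim, time_dim) matrix; querying a set of frames is a
    single contraction with a learnable time basis shared by all voxels."""
    def __init__(self, n_voxels, feat_dim=16, time_dim=32, n_frames=300):
        super().__init__()
        self.voxel_feat = nn.Parameter(torch.randn(n_voxels, feat_dim, time_dim) * 1e-2)
        self.time_basis = nn.Parameter(torch.randn(n_frames, time_dim) * 1e-2)

    def forward(self, voxel_ids, frame_ids):
        feats = self.voxel_feat[voxel_ids]                # (N, feat_dim, time_dim)
        basis = self.time_basis[frame_ids]                # (M, time_dim)
        return torch.einsum("nft,mt->nmf", feats, basis)  # features for every (voxel, time) pair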