VMesh: Hybrid Volume-Mesh Representation for Efficient View Synthesis
With the emergence of neural radiance fields (NeRFs), view synthesis quality
has reached an unprecedented level. Compared to traditional mesh-based assets,
this volumetric representation is more powerful in expressing scene geometry
but inevitably suffers from high rendering costs and is difficult to use in
further processes such as editing, making it hard to combine
with the existing graphics pipeline. In this paper, we present a hybrid
volume-mesh representation, VMesh, which depicts an object with a textured mesh
along with an auxiliary sparse volume. VMesh retains the advantages of
mesh-based assets, such as efficient rendering, compact storage, and easy
editing, while also incorporating the ability to represent subtle geometric
structures provided by the volumetric counterpart. VMesh can be obtained from
multi-view images of an object and renders at 2K 60FPS on common consumer
devices with high fidelity, unleashing new opportunities for real-time
immersive applications.
Comment: Project page: https://bennyguo.github.io/vmesh
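As a rough illustration of how a textured mesh and an auxiliary sparse volume could be combined when rendering one pixel, the following Python sketch alpha-composites volume samples lying in front of the mesh surface and then adds the mesh color as an opaque background. The abstract does not describe VMesh's renderer; the function name `composite_ray`, the (alpha, color) sample format, and the opaque-mesh assumption are all hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def composite_ray(volume_samples: List[Tuple[float, RGB]], mesh_color: RGB) -> RGB:
    """Hypothetical hybrid compositing for a single camera ray.

    volume_samples: (alpha, color) pairs ordered from the camera outward,
    already restricted to samples closer than the mesh intersection.
    mesh_color: texture color at the ray's mesh hit point (assumed opaque).
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    out = [0.0, 0.0, 0.0]
    # Front-to-back alpha compositing of the sparse-volume samples.
    for alpha, color in volume_samples:
        weight = transmittance * alpha
        for k in range(3):
            out[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    # The mesh is treated as fully opaque, so it receives all remaining light.
    for k in range(3):
        out[k] += transmittance * mesh_color[k]
    return (out[0], out[1], out[2])
```

With one half-transparent red sample in front of a blue mesh, the pixel becomes an even mix, (0.5, 0.0, 0.5); with no volume samples, the mesh color passes through unchanged.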
WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields
Neural Radiance Field (NeRF) has shown impressive performance in novel view
synthesis via implicit scene representation. However, it usually suffers from
poor scalability, as it requires densely sampled images for each new scene.
Several studies have attempted to mitigate this problem by integrating the
Multi-View Stereo (MVS) technique into NeRF, but they still entail a
cumbersome fine-tuning process for new scenes. Notably, the rendering quality
drops severely without this fine-tuning process, and the errors mainly
appear around high-frequency features. In light of this observation, we
design WaveNeRF, which integrates wavelet frequency decomposition into MVS and
NeRF to achieve generalizable yet high-quality synthesis without any per-scene
optimization. To preserve high-frequency information when generating 3D feature
volumes, WaveNeRF builds Multi-View Stereo in the Wavelet domain by integrating
the discrete wavelet transform into the classical cascade MVS, which
disentangles high-frequency information explicitly. With that, disentangled
frequency features can be injected into classic NeRF via a novel hybrid neural
renderer to yield faithful high-frequency details, and an intuitive
frequency-guided sampling strategy can be designed to suppress artifacts around
high-frequency regions. Extensive experiments over three widely studied
benchmarks show that WaveNeRF achieves superior generalizable radiance field
modeling when only given three images as input.
Comment: Accepted to ICCV 2023. Project website: https://mxuai.github.io/WaveNeRF
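To make the idea of disentangling high-frequency information with a discrete wavelet transform concrete, here is a minimal single-level 2D Haar transform in Python that splits an image (or a single feature channel) into one low-frequency band and three high-frequency detail bands. The abstract does not say which wavelet WaveNeRF uses or where in the cascade MVS it is applied; the Haar choice and the function name `haar_dwt2` are assumptions for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2D Haar transform (illustrative only).

    Returns (LL, LH, HL, HH), each half the input size in both dimensions.
    Height and width of `img` must be even.
    """
    h, w = len(img), len(img[0])
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            # 2x2 block: a b / c d
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r, s = i // 2, j // 2
            ll[r][s] = (a + b + c + d) / 2  # low-pass in both directions
            lh[r][s] = (a + b - c - d) / 2  # top row minus bottom row (horizontal edges)
            hl[r][s] = (a - b + c - d) / 2  # left column minus right column (vertical edges)
            hh[r][s] = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh
```

A constant 4x4 input puts all of its energy into the low-frequency band (every LL entry is 2.0) and leaves the detail bands at zero, whereas a horizontal edge such as [[1, 1], [0, 0]] shows up only in the top-minus-bottom detail band. Because this Haar transform is orthonormal, the total squared magnitude is preserved across the four bands.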
Indoor Scene Reconstruction with Fine-Grained Details Using Hybrid Representation and Normal Prior Enhancement
The reconstruction of indoor scenes from multi-view RGB images is challenging
due to the coexistence of flat and texture-less regions alongside delicate and
fine-grained regions. Recent methods leverage neural radiance fields aided by
predicted surface normal priors to recover the scene geometry. These methods
excel in producing complete and smooth results for floor and wall areas.
However, they struggle to capture complex surfaces with high-frequency
structures due to the inadequate neural representation and the inaccurately
predicted normal priors. To improve the capacity of the implicit
representation, we propose a hybrid architecture to represent low-frequency and
high-frequency regions separately. To enhance the normal priors, we introduce a
simple yet effective image sharpening and denoising technique, coupled with a
network that estimates the pixel-wise uncertainty of the predicted surface
normal vectors. Identifying such uncertainty can prevent our model from being
misled by unreliable surface normal supervisions that hinder the accurate
reconstruction of intricate geometries. Experiments on the benchmark datasets
show that our method significantly outperforms existing methods in terms of
reconstruction quality.
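The abstract states that a network predicts pixel-wise uncertainty for the surface normal priors, but not how that uncertainty enters the training loss. One common way to use such an estimate, assumed here rather than taken from the paper, is to divide each pixel's normal error by the predicted uncertainty and add a log-uncertainty penalty. The Python sketch below uses the angular error (1 - cos) between rendered and prior normals; the function names and the exact loss form are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(x * x for x in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def uncertainty_weighted_normal_loss(
    rendered: List[Vec3], prior: List[Vec3], log_sigma: List[float]
) -> float:
    """Hypothetical per-pixel loss: (1 - cos angle) * exp(-log_sigma) + log_sigma.

    A large predicted sigma down-weights an unreliable normal prior, while the
    +log_sigma term stops the network from marking every pixel as uncertain.
    """
    total = 0.0
    for n_r, n_p, s in zip(rendered, prior, log_sigma):
        a, b = _normalize(n_r), _normalize(n_p)
        cos = sum(x * y for x, y in zip(a, b))
        total += (1.0 - cos) * math.exp(-s) + s
    return total / len(rendered)
```

For a prior that points the opposite way from the rendered normal, the loss is 2.0 at log_sigma = 0 but drops to about 1.69 when the network predicts log_sigma = ln 2, so raising the uncertainty is the cheaper option exactly where the prior is wrong.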
Long-term experiments with an adaptive spherical view representation for navigation in changing environments
Real-world environments such as houses and offices change over time, meaning that a mobile robot’s map will become out of date. In this work, we introduce a method to update the reference views in a hybrid metric-topological map so that a mobile robot can continue to localize itself in a changing environment. The updating mechanism, based on the multi-store model of human memory, incorporates a spherical metric representation of the observed visual features for each node in the map, which enables the robot to estimate its heading and navigate using multi-view geometry, as well as to represent the local 3D geometry of the environment. A series of experiments demonstrates the persistent performance of the proposed system in real changing environments, including an analysis of long-term stability.
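The multi-store model distinguishes a short-term store, where new information must be rehearsed, from a long-term store that retains it. The Python sketch below applies that idea to the features of a single map node: features seen repeatedly in the short-term store are promoted to the long-term store, and long-term features that go unobserved for several updates are forgotten. The abstract does not give the actual update rules; the class name `NodeMemory`, both thresholds, and the decay rule for unrehearsed short-term features are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class NodeMemory:
    """Hypothetical short-term/long-term feature store for one map node."""
    promote_after: int = 3  # hypothetical: consecutive sightings needed for promotion
    forget_after: int = 3   # hypothetical: consecutive misses before LTM removal
    stm: Dict[str, int] = field(default_factory=dict)  # feature id -> times rehearsed
    ltm: Dict[str, int] = field(default_factory=dict)  # feature id -> consecutive misses

    def update(self, observed: Set[str]) -> None:
        # Long-term store: recalled features reset their miss count; others age.
        for fid in list(self.ltm):
            if fid in observed:
                self.ltm[fid] = 0
            else:
                self.ltm[fid] += 1
                if self.ltm[fid] >= self.forget_after:
                    del self.ltm[fid]
        # Short-term store: features that were not rehearsed decay immediately.
        for fid in list(self.stm):
            if fid not in observed:
                del self.stm[fid]
        # Rehearse new features and promote those seen often enough.
        for fid in observed - set(self.ltm):
            self.stm[fid] = self.stm.get(fid, 0) + 1
            if self.stm[fid] >= self.promote_after:
                del self.stm[fid]
                self.ltm[fid] = 0
```

With both thresholds set to 2, a feature observed in two consecutive updates moves into long-term memory, and it is removed again after two consecutive updates in which it is missing. Requiring repeated sightings keeps transient objects, such as a person walking past, out of the long-term map, while the forgetting rule lets the node drop features of objects that have been moved away.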
An adaptive spherical view representation for navigation in changing environments
Real-world environments such as houses and offices change over time, meaning that a mobile robot’s map will become out of date. In previous work, we introduced a method to update the reference views in a topological map so that a mobile robot could continue to localize itself in a changing environment using omni-directional vision. In this work, we extend this long-term updating mechanism to incorporate a spherical metric representation of the observed visual features for each node in the topological map. Using multi-view geometry, we are then able to estimate the heading of the robot, in order to enable navigation between the nodes of the map, and to simultaneously adapt the spherical view representation in response to environmental changes. The results demonstrate the persistent performance of the proposed system in a long-term experiment.
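Estimating the heading from spherical views can be illustrated in a simplified setting: if the robot is back at a node's position and the matched features are far enough away that translation can be ignored, the heading change reduces to the rotation about the vertical axis that best aligns the current bearing vectors with the reference ones. The Python sketch below computes that least-squares yaw in closed form. The rotation-only model, the z-up convention, and the function name `estimate_heading` are assumptions; the paper's multi-view geometry may be more general.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def estimate_heading(reference: List[Vec3], current: List[Vec3]) -> float:
    """Least-squares yaw (radians) rotating current bearings onto reference ones.

    Assumes pure rotation about the z (up) axis and that reference[i] and
    current[i] are bearing vectors of the same matched feature.
    """
    num = 0.0  # sum of cross terms: sin(yaw) coefficient
    den = 0.0  # sum of dot terms:   cos(yaw) coefficient
    for (xr, yr, _), (xc, yc, _) in zip(reference, current):
        num += xc * yr - yc * xr
        den += xc * xr + yc * yr
    return math.atan2(num, den)
```

Rotating two current bearings by 30 degrees to build the reference set and passing both lists back recovers a yaw of 30 degrees (in radians). The closed form comes from maximizing the sum of dot products between the reference bearings and the rotated current bearings, which depends only on their horizontal components.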
Learning multi-view neighborhood preserving projections
We address the problem of metric learning for multi-view data, namely the construction of embedding projections from data in different representations into a shared feature space, such that the Euclidean distance in this space provides a meaningful within-view as well as between-view similarity. Our motivation stems from cross-media retrieval tasks, where the availability of a joint Euclidean distance function is a prerequisite for fast, in particular hashing-based, nearest-neighbor queries. We formulate an objective function that expresses the intuitive concept that matching samples are mapped closely together in the output space, whereas non-matching samples are pushed apart, no matter in which view they are available. The resulting optimization problem is not convex, but it can be decomposed explicitly into a convex and a concave part, thereby allowing efficient optimization using the convex-concave procedure. Experiments on an image retrieval task show that nearest-neighbor-based cross-view retrieval is indeed possible, and the proposed technique improves the retrieval accuracy over baseline techniques.
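The convex-concave procedure (CCCP) minimizes a function written as a convex part plus a concave part by repeatedly replacing the concave part with its tangent line and minimizing the resulting convex surrogate. The abstract does not give the paper's objective, so the Python sketch below applies CCCP to a toy scalar function, f(x) = x^4 - 2x^2, chosen only because its surrogate has a closed-form minimizer; the function name `cccp_toy` is hypothetical.

```python
import math


def cccp_toy(x0: float, iters: int = 50) -> float:
    """CCCP on the toy function f(x) = x**4 - 2*x**2.

    Split f = u - v with u(x) = x**4 and v(x) = 2*x**2, both convex.
    Each step minimizes the convex surrogate u(x) - v'(x_k) * x
    = x**4 - 4*x_k*x; setting its derivative 4*x**3 - 4*x_k to zero
    gives the update x_{k+1} = cbrt(x_k).
    """
    x = x0
    for _ in range(iters):
        x = math.copysign(abs(x) ** (1.0 / 3.0), x)  # real cube root, sign-safe
    return x
```

Starting from 0.5 or -2.0, the iterates converge to the minimizers at 1 and -1, and no step increases f; this monotone-descent property is what makes CCCP attractive for non-convex objectives. Like other local methods, it only guarantees a stationary point: starting exactly at 0, the local maximum, the iterate never moves.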