5,933 research outputs found
Accurate Light Field Depth Estimation with Superpixel Regularization over Partially Occluded Regions
Depth estimation is a fundamental problem for light field photography
applications. Numerous methods have been proposed in recent years, which either
focus on crafting cost terms for more robust matching, or on analyzing the
geometry of scene structures embedded in the epipolar-plane images. Significant
improvements have been made in terms of overall depth estimation error;
however, current state-of-the-art methods still show limitations in handling
intricate occluding structures and complex scenes with multiple occlusions. To
address these challenging issues, we propose an effective depth estimation
framework which focuses on regularizing the initial label confidence map and
edge strength weights. Specifically, we first detect partially occluded
boundary regions (POBR) via superpixel-based regularization. A series of
shrinkage/reinforcement operations is then applied to the label confidence map
and edge strength weights over the POBR. We show that after weight
manipulations, even a low-complexity weighted least squares model can produce
much better depth estimation than state-of-the-art methods in terms of average
disparity error rate, occlusion boundary precision-recall rate, and the
preservation of intricate visual features.
OccCasNet: Occlusion-aware Cascade Cost Volume for Light Field Depth Estimation
Light field (LF) depth estimation is a crucial task with numerous practical
applications. However, mainstream methods based on multi-view stereo (MVS)
are resource-intensive and time-consuming as they need to construct a finer
cost volume. To address this issue and achieve a better trade-off between
accuracy and efficiency, we propose an occlusion-aware cascade cost volume for
LF depth (disparity) estimation. Our cascaded strategy reduces the sampling
number while keeping the sampling interval constant during the construction of
a finer cost volume. We also introduce occlusion maps to enhance accuracy in
constructing the occlusion-aware cost volume. Specifically, we first obtain the
coarse disparity map through the coarse disparity estimation network. Then, the
sub-aperture images (SAIs) of side views are warped to the center view based on
the coarse disparity map. Next, we apply photo-consistency constraints
between the warped SAIs and the center SAI to generate occlusion maps for each
SAI. Finally, we introduce the coarse disparity map and occlusion maps to
construct an occlusion-aware refined cost volume, enabling the refined
disparity estimation network to yield a more precise disparity map. Extensive
experiments demonstrate the effectiveness of our method. Compared with
state-of-the-art methods, our method achieves a superior balance between
accuracy and efficiency and ranks first in terms of MSE and Q25 metrics among
published methods on the HCI 4D benchmark. The code and model of the proposed
method are available at https://github.com/chaowentao/OccCasNet.
Real-time High-resolution View Synthesis of Complex Scenes with Explicit 3D Visibility Reasoning
Rendering photo-realistic novel-view images of complex scenes has been a
long-standing challenge in computer graphics. In recent years, great research
progress has been made on enhancing rendering quality and accelerating
rendering speed in the realm of view synthesis. However, when rendering complex
dynamic scenes with sparse views, the rendering quality remains limited due to
occlusion problems. Moreover, when rendering high-resolution images of dynamic
scenes, the rendering speed is still far from real time. In this work, we
propose a generalizable view synthesis method that can render high-resolution
novel-view images of complex static and dynamic scenes in real time from sparse
views. To address the occlusion problems arising from the sparsity of input
views and the complexity of captured scenes, we introduce an explicit 3D
visibility reasoning approach that can efficiently estimate the visibility of
sampled 3D points to the input views. The proposed visibility reasoning
approach is fully differentiable and can gracefully fit inside the volume
rendering pipeline, allowing us to train our networks with only multi-view
images as supervision while refining geometry and texture simultaneously.
In addition, each module in our pipeline is carefully designed to bypass the
time-consuming MLP querying process and enhance the rendering quality of
high-resolution images, enabling us to render high-resolution novel-view images
in real time. Experimental results show that our method outperforms previous
view synthesis methods in both rendering quality and speed, particularly when
dealing with complex dynamic scenes with sparse views.
Light field reconstruction from multi-view images
Kang Han studied recovering the 3D world from multi-view images. He proposed several algorithms to handle occlusions in depth estimation and effective representations for view rendering. The proposed algorithms can be used in many innovative applications based on machine intelligence, such as autonomous driving and the Metaverse.
Reducing Shape-Radiance Ambiguity in Radiance Fields with a Closed-Form Color Estimation Method
Neural radiance field (NeRF) enables the synthesis of highly realistic
novel-view images of a 3D scene. It includes density and color fields to model
the shape and radiance of a scene, respectively. Trained end-to-end under a
photometric loss, NeRF inherently suffers from the
shape-radiance ambiguity problem, i.e., it can perfectly fit training views but
does not guarantee decoupling the two fields correctly. To deal with this
issue, existing works have incorporated prior knowledge to provide an
independent supervision signal for the density field, including total variation
loss, sparsity loss, distortion loss, etc. These losses are based on general
assumptions about the density field, e.g., it should be smooth, sparse, or
compact, which are not adaptive to a specific scene. In this paper, we propose
a more adaptive method to reduce the shape-radiance ambiguity. The key is a
rendering method that is only based on the density field. Specifically, we
first estimate the color field based on the density field and posed images in a
closed form. Then NeRF's rendering process can proceed. We address the problems
in estimating the color field, including occlusion and non-uniformly
distributed views. Afterward, this density-only rendering is used to regularize NeRF's density field.
As our regularization is guided by photometric loss, it is more adaptive
compared to existing ones. Experimental results show that our method improves
the density field of NeRF both qualitatively and quantitatively. Our code is
available at https://github.com/qihangGH/Closed-form-color-field.
Comment: This work has been published in NeurIPS 2023.
DINER: Depth-aware Image-based NEural Radiance fields
We present Depth-aware Image-based NEural Radiance fields (DINER). Given a
sparse set of RGB input views, we predict depth and feature maps to guide the
reconstruction of a volumetric scene representation that allows us to render 3D
objects under novel views. Specifically, we propose novel techniques to
incorporate depth information into feature fusion and efficient scene sampling.
In comparison to the previous state of the art, DINER achieves higher synthesis
quality and can process input views with greater disparity. This allows us to
capture scenes more completely without changing capture hardware requirements
and ultimately enables larger viewpoint changes during novel view synthesis. We
evaluate our method by synthesizing novel views, both for human heads and for
general objects, and observe significantly improved qualitative results and
perceptual metrics compared to the previous state of the art. The
code is publicly available for research purposes.
Comment: Website: https://malteprinzler.github.io/projects/diner/diner.html; Video: https://www.youtube.com/watch?v=iI_fpjY5k8Y&t=1