Indoor Depth Completion with Boundary Consistency and Self-Attention
Depth estimation features are helpful for 3D recognition. Commodity-grade
depth cameras can capture depth and color images in real time; however,
glossy, transparent, or distant surfaces cannot be scanned properly by the
sensor. As a result, enhancing and restoring the sensed depth is an
important task. Depth completion aims to fill the holes that sensors fail to
detect, which is still a difficult task for machines to learn. Traditional
hand-tuned methods have reached their limits, while neural-network-based
methods tend to copy and interpolate the output from surrounding depth values,
which blurs boundaries and loses the structure of the depth map.
Consequently, our main contribution is an end-to-end network that improves
completed depth maps while maintaining edge clarity. We utilize a
self-attention mechanism, previously used in image inpainting, to extract
more useful information in each convolutional layer so that the completed
depth map is enhanced. In addition, we propose a boundary consistency concept
to enhance the quality and structure of the depth map. Experimental results
validate the effectiveness of our self-attention and boundary consistency
scheme, which outperforms previous state-of-the-art depth completion work on
the Matterport3D dataset. Our code is publicly available at
https://github.com/patrickwu2/Depth-Completion
Comment: Accepted by ICCVW (RLQ) 2019
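The abstract does not spell out how the self-attention layer or the boundary
consistency term is wired into the network, so the following PyTorch sketch is
only illustrative: a SAGAN-style self-attention block of the kind commonly
used in image inpainting, plus a Sobel-edge loss as one plausible reading of
"boundary consistency" (the paper's actual scheme may differ, e.g. by using an
auxiliary boundary-prediction network). All names here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over the spatial positions of a feature map."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight, starts at 0

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c//r)
        k = self.key(x).flatten(2)                     # (b, c//r, hw)
        attn = torch.softmax(q @ k, dim=-1)            # (b, hw, hw) pairwise affinities
        v = self.value(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                    # residual connection

def boundary_consistency_loss(pred_depth, gt_depth, eps=1e-6):
    """One plausible boundary term (an assumption, not the paper's exact loss):
    match Sobel edge maps of predicted and ground-truth depth so that
    completed regions keep sharp structure."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=pred_depth.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    def edges(d):
        gx = F.conv2d(d, kx, padding=1)
        gy = F.conv2d(d, ky, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + eps)
    return F.l1_loss(edges(pred_depth), edges(gt_depth))
```

The zero-initialized gamma lets training start from the plain convolutional
behavior and gradually mix in non-local attention, a common stabilization trick.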
RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion
The raw depth image captured by indoor depth sensors usually has an extensive
range of missing depth values, due to inherent limitations such as the
inability to perceive transparent objects and the limited distance range. The
incomplete depth map with missing values burdens many downstream vision tasks,
and a rising number of depth completion methods have been proposed to
alleviate this issue. While most existing methods can generate accurate dense
depth maps from sparse, uniformly sampled depth maps, they are not suited to
filling in large contiguous regions of missing depth values, which are common
and critical in images captured indoors. To overcome these challenges, we
design a novel two-branch end-to-end fusion network named RDFC-GAN, which
takes a pair of RGB and incomplete depth images as input and predicts a dense,
completed depth map. The first branch, adhering to the Manhattan world
assumption and using normal maps derived from the RGB-D input as guidance,
employs an encoder-decoder structure to regress local dense depth values from
the raw depth map. In the other branch, we propose an RGB-depth fusion
CycleGAN that translates the RGB image into a fine-grained textured depth map.
We adopt adaptive fusion modules named W-AdaIN to propagate features across
the two branches, and we append a confidence fusion head that fuses the two
branch outputs into the final depth map. Extensive experiments on NYU-Depth V2
and SUN RGB-D demonstrate that our proposed method clearly improves depth
completion performance, especially in the more realistic setting of indoor
environments, with the help of our proposed pseudo depth maps during training.
Comment: Haowen Wang and Zhengping Che contributed equally. Under review. An
earlier version was accepted by CVPR 2022 (arXiv:2203.10856).
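The abstract names two fusion mechanisms without giving their internals, so
the PyTorch sketch below shows the generic building blocks they plausibly rest
on: vanilla adaptive instance normalization (W-AdaIN is described as a
weighted variant of AdaIN) and a per-pixel confidence blend of the two branch
outputs. Module names, arguments, and layer choices are assumptions for
illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

def adain(x, y, eps=1e-5):
    """Vanilla adaptive instance normalization: re-normalize x with y's
    per-channel statistics. The paper's W-AdaIN is a weighted variant;
    this is only the generic form it builds on."""
    mu_x = x.mean(dim=(2, 3), keepdim=True)
    sd_x = x.std(dim=(2, 3), keepdim=True) + eps
    mu_y = y.mean(dim=(2, 3), keepdim=True)
    sd_y = y.std(dim=(2, 3), keepdim=True) + eps
    return sd_y * (x - mu_x) / sd_x + mu_y

class ConfidenceFusionHead(nn.Module):
    """Blend the two branch predictions with a learned per-pixel confidence map
    (a hypothetical minimal version of the paper's confidence fusion head)."""
    def __init__(self, feat_channels: int):
        super().__init__()
        self.conf = nn.Sequential(
            nn.Conv2d(feat_channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # per-pixel weight in [0, 1]
        )

    def forward(self, depth_a, depth_b, feats):
        # depth_a: encoder-decoder branch; depth_b: CycleGAN branch;
        # feats: features from which the confidence is predicted.
        w = self.conf(feats)                  # (b, 1, h, w)
        return w * depth_a + (1.0 - w) * depth_b
```

A convex combination like this lets the network fall back on the regression
branch where the GAN branch hallucinates, and vice versa, without hard switching.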
The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation
Denoising diffusion probabilistic models have transformed image generation
with their impressive fidelity and diversity. We show that they also excel in
estimating optical flow and monocular depth, surprisingly without the
task-specific architectures and loss functions that are predominant for these
tasks. Compared to the point estimates of conventional regression-based
methods, diffusion models also enable Monte Carlo inference, e.g., capturing
uncertainty and ambiguity in flow and depth. With self-supervised
pre-training, the combined use of synthetic and real data for supervised
training, technical innovations (infilling and step-unrolled denoising
diffusion training) to handle noisy, incomplete training data, and a simple
form of coarse-to-fine refinement, one can train state-of-the-art diffusion
models for depth and optical flow estimation. Extensive experiments focus on
quantitative performance against benchmarks, ablations, and the model's
ability to capture uncertainty and multimodality and to impute missing values.
Our model, DDVM (Denoising Diffusion Vision Model), obtains a state-of-the-art
relative depth error of 0.074 on the indoor NYU benchmark and an Fl-all
outlier rate of 3.26% on the KITTI optical flow benchmark, about 25% better
than the best published method. For an overview see
https://diffusion-vision.github.io
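To make the Monte Carlo inference claim concrete, here is a minimal PyTorch
sketch of drawing several conditional samples from a trained denoising model
and reducing them to a mean depth map plus a pixelwise uncertainty map. The
`eps_model(x_t, t, rgb)` signature, the noise schedule, and all names are
assumptions for illustration; DDVM's actual sampler (with infilling and
coarse-to-fine refinement) is more involved.

```python
import torch

@torch.no_grad()
def sample_depth(eps_model, rgb, steps, betas):
    """One DDPM reverse pass: start from Gaussian noise and iteratively
    denoise to a depth map conditioned on the RGB image."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    b = rgb.shape[0]
    x = torch.randn(b, 1, rgb.shape[2], rgb.shape[3], device=rgb.device)
    for t in reversed(range(steps)):
        t_batch = torch.full((b,), t, device=rgb.device)
        eps = eps_model(x, t_batch, rgb)  # hypothetical conditional noise predictor
        # Standard DDPM posterior mean.
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x

@torch.no_grad()
def depth_with_uncertainty(eps_model, rgb, n_samples=8, steps=64):
    """Monte Carlo inference: multiple samples give a point estimate (mean)
    and a per-pixel uncertainty estimate (variance)."""
    betas = torch.linspace(1e-4, 0.02, steps, device=rgb.device)
    samples = torch.stack([sample_depth(eps_model, rgb, steps, betas)
                           for _ in range(n_samples)])
    return samples.mean(dim=0), samples.var(dim=0)
```

High-variance pixels typically coincide with genuinely ambiguous regions
(reflective or transparent surfaces for depth, occlusions for flow), which is
what regression point estimates cannot express.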
AGG-Net: Attention Guided Gated-convolutional Network for Depth Image Completion
Recently, stereo vision based on lightweight RGB-D cameras has been widely
used in various fields. However, limited by their imaging principles, the
commonly used RGB-D cameras based on time-of-flight, structured light, or
binocular vision inevitably acquire some invalid data, such as weak
reflections, boundary shadows, and artifacts, which may adversely affect
follow-up work. In this paper, we propose a new model for depth image
completion based on an Attention Guided Gated-convolutional Network (AGG-Net),
through which more accurate and reliable depth images can be obtained from raw
depth maps and the corresponding RGB images. Our model employs a UNet-like
architecture consisting of two parallel branches of depth and color features.
In the encoding stage, an Attention Guided Gated-Convolution (AG-GConv) module
is proposed to fuse depth and color features at different scales, effectively
reducing the negative impact of invalid depth data on the reconstruction. In
the decoding stage, an Attention Guided Skip Connection (AG-SC) module is
presented to avoid introducing too many depth-irrelevant features into the
reconstruction. Experimental results demonstrate that our method outperforms
state-of-the-art methods on the popular benchmarks
NYU-Depth V2, DIML, and SUN RGB-D.
Comment: 9 pages, 7 figures, ICCV 2023
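The core idea of AG-GConv, as described, is a gated convolution whose soft
gate is informed by both depth and color features, so positions with invalid
depth can be down-weighted. The PyTorch sketch below shows that pattern in its
simplest form (gated convolution in the style of free-form inpainting); the
layer names, the guidance wiring, and the exact gate design are assumptions,
not the paper's implementation.

```python
import torch
import torch.nn as nn

class AttentionGuidedGatedConv(nn.Module):
    """Gated convolution whose per-pixel, per-channel soft gate is computed
    from the concatenation of depth and color features (illustrative only)."""
    def __init__(self, in_ch: int, out_ch: int, guide_ch: int):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gate = nn.Conv2d(in_ch + guide_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, depth_feat, color_feat):
        f = torch.relu(self.feature(depth_feat))
        # Gate conditioned on both modalities: low values suppress features
        # computed from unreliable (e.g. missing or artifact-laden) depth.
        g = torch.sigmoid(self.gate(torch.cat([depth_feat, color_feat], dim=1)))
        return f * g
```

Compared with a plain convolution, the multiplicative gate gives the network
an explicit mechanism to ignore invalid depth regions instead of averaging
them into the reconstruction.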