Parallel and Distributed Performance of a Depth Estimation Algorithm
Expansion of dataset sizes and increasing complexity of processing algorithms have led to consideration of parallel and distributed implementations. The rationale for distributing the computational load may be to thin-provision computational resources, to accelerate data processing rate, or to efficiently reuse already available but otherwise idle computational resources. Whatever the rationale, an efficient solution of this type brings with it questions of data distribution, job partitioning, reliability, and robustness. This paper addresses the first two of these questions in the context of a local cluster-computing environment. Using the CHRT depth estimator, it considers active and passive data distribution and their effect on data throughput, focusing mainly on the compromises required to maintain minimal communications requirements between nodes. As a metric, the paper considers the overall computation time for a given dataset (i.e., the time lag that a user would experience), and shows that although there are significant speedups to be had by relatively simple modifications to the algorithm, there are limitations to the parallelism that can be achieved efficiently, and that a balance between inter-node parallelism (i.e., multiple nodes running in parallel) and intra-node parallelism (i.e., multiple threads within one node) is required for the most efficient utilization of available resources.
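As a rough illustration of the metric discussed in this abstract, the sketch below computes the standard speedup and parallel efficiency from overall computation times. The function names and the timings in the usage note are hypothetical and are not taken from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of single-worker wall-clock time to parallel wall-clock time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Speedup divided by the number of workers (nodes x threads per node)."""
    return speedup(t_serial, t_parallel) / workers
```

For example, with made-up timings of 120 s on one worker and 30 s on 8 workers (say, 2 nodes with 4 threads each), `speedup(120.0, 30.0)` is 4.0 and `efficiency(120.0, 30.0, 8)` is 0.5, meaning half of the added capacity is lost to overheads such as inter-node communication.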
OccCasNet: Occlusion-aware Cascade Cost Volume for Light Field Depth Estimation
Light field (LF) depth estimation is a crucial task with numerous practical
applications. However, mainstream methods based on the multi-view stereo (MVS)
are resource-intensive and time-consuming as they need to construct a finer
cost volume. To address this issue and achieve a better trade-off between
accuracy and efficiency, we propose an occlusion-aware cascade cost volume for
LF depth (disparity) estimation. Our cascaded strategy reduces the sampling
number while keeping the sampling interval constant during the construction of
a finer cost volume. We also introduce occlusion maps to enhance accuracy in
constructing the occlusion-aware cost volume. Specifically, we first obtain the
coarse disparity map through the coarse disparity estimation network. Then, the
sub-aperture images (SAIs) of side views are warped to the center view based on
the initial disparity map. Next, we propose photo-consistency constraints
between the warped SAIs and the center SAI to generate occlusion maps for each
SAI. Finally, we introduce the coarse disparity map and occlusion maps to
construct an occlusion-aware refined cost volume, enabling the refined
disparity estimation network to yield a more precise disparity map. Extensive
experiments demonstrate the effectiveness of our method. Compared with
state-of-the-art methods, our method achieves a superior balance between
accuracy and efficiency and ranks first in terms of MSE and Q25 metrics among
published methods on the HCI 4D benchmark. The code and model of the proposed
method are available at https://github.com/chaowentao/OccCasNet
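The warping and photo-consistency steps described in this abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the nearest-neighbour sampling, the sign convention for the view offset `(du, dv)`, and the threshold `tau` are all hypothetical.

```python
import numpy as np


def warp_to_center(side, disparity, du, dv):
    """Warp a side-view SAI to the center view using a center-view disparity map.

    Assumed convention: center pixel (y, x) is seen in the side view at
    (y + dv * d, x + du * d). Nearest-neighbour sampling keeps the sketch short.
    """
    h, w = side.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + du * disparity).astype(int)
    src_y = np.rint(ys + dv * disparity).astype(int)
    # Pixels whose source location falls outside the side view cannot be warped.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(side, dtype=float)
    warped[valid] = side[src_y[valid], src_x[valid]]
    return warped, valid


def occlusion_map(center, side, disparity, du, dv, tau=0.1):
    """Flag pixels where the warped side view disagrees with the center view."""
    warped, valid = warp_to_center(side, disparity, du, dv)
    photo_error = np.abs(warped - center)
    return (photo_error > tau) & valid
```

A pixel flagged `True` violates photo-consistency under the coarse disparity, which is one simple proxy for occlusion in that side view; a real system would likely use sub-pixel interpolation and a learned or adaptive threshold.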
An Explicit Method for Fast Monocular Depth Recovery in Corridor Environments
Monocular cameras are extensively employed in indoor robotics, but their
performance is limited in visual odometry, depth estimation, and related
applications due to the absence of scale information. Depth estimation refers to
the process of estimating a dense depth map from the corresponding input image.
Existing research mostly addresses this problem through deep learning-based
approaches, yet their inference speed is slow, leading to poor real-time
capabilities. To tackle this challenge, we propose an explicit method for rapid
monocular depth recovery specifically designed for corridor environments,
leveraging the principles of nonlinear optimization. We adopt the virtual
camera assumption to make full use of the prior geometric features of the
scene. The depth estimation problem is transformed into an optimization problem
by minimizing the geometric residual. Furthermore, a novel depth plane
construction technique is introduced to categorize spatial points based on
their possible depths, facilitating swift depth estimation in enclosed
structural scenarios, such as corridors. We also propose a new corridor
dataset, named Corr_EH_z, which contains images as captured by the UGV camera
of a variety of corridors. An exhaustive set of experiments in different
corridors reveals the efficacy of the proposed algorithm.
Comment: 10 pages, 8 figures. arXiv admin note: text overlap with
arXiv:2111.08600 by other author
SAAM: Stealthy Adversarial Attack on Monocular Depth Estimation
In this paper, we investigate the vulnerability of monocular depth estimation
(MDE) to adversarial patches. We propose a novel Stealthy Adversarial Attack on
MDE (SAAM) that compromises MDE by either
corrupting the estimated distance or causing an object to seamlessly blend into
its surroundings. Our experiments demonstrate that the designed stealthy patch
successfully causes a DNN-based MDE to misestimate the depth of objects. In
fact, our proposed adversarial patch achieves a significant 60% depth error
with a 99% ratio of the affected region. Importantly, despite its adversarial
nature, the patch maintains a naturalistic appearance, making it inconspicuous
to human observers. We believe that this work sheds light on the threat of
adversarial attacks in the context of MDE on edge devices. We hope it raises
awareness within the community about the potential real-life harm of such
attacks and encourages further research into developing more robust and
adaptive defense mechanisms
Lightweight Monocular Depth Estimation via Token-Sharing Transformer
Depth estimation is an important task in various robotics systems and
applications. In mobile robotics systems, monocular depth estimation is
desirable since a single RGB camera can be deployed at a low cost and compact
size. Due to its significant and growing needs, many lightweight monocular
depth estimation networks have been proposed for mobile robotics systems. While
most lightweight monocular depth estimation methods have been developed using
convolutional neural networks, the Transformer has been gradually utilized in
monocular depth estimation recently. However, massive parameters and large
computational costs in the Transformer hinder deployment on embedded
devices. In this paper, we present a Token-Sharing Transformer (TST), an
architecture using the Transformer for monocular depth estimation, optimized
especially for embedded devices. The proposed TST utilizes global token sharing,
which enables the model to obtain an accurate depth prediction with high
throughput in embedded devices. Experimental results show that TST outperforms
the existing lightweight monocular depth estimation methods. On the NYU Depth
v2 dataset, TST can deliver depth maps at up to 63.4 FPS on NVIDIA Jetson Nano and
142.6 FPS on NVIDIA Jetson TX2, with lower errors than the existing methods.
Furthermore, TST achieves real-time depth estimation of high-resolution images
on Jetson TX2 with competitive results.
Comment: ICRA 202
Light Field Depth Estimation Based on Stitched-EPI
Depth estimation is one of the most essential problems for light field
applications. In EPI-based methods, the slope computation usually suffers from low
accuracy due to the discretization error and low angular resolution. In
addition, recent methods work well in most regions but often struggle with
blurry edges over occluded regions and ambiguity over texture-less regions. To
address these challenging issues, we first propose the stitched-EPI and
half-stitched-EPI algorithms for non-occluded and occluded regions,
respectively. The algorithms improve slope computation by shifting and
concatenating lines in different EPIs but related to the same point in the 3D
scene, while the half-stitched-EPI only uses the non-occluded part of lines.
Combined with our proposed joint photo-consistency cost, a more
accurate and robust depth map can be obtained in both occluded and non-occluded
regions. Furthermore, to improve the depth estimation in texture-less regions,
we propose a depth propagation strategy that determines their depth from the
edge to interior, from accurate regions to coarse regions. Experimental and
ablation results demonstrate that the proposed method achieves accurate and
robust depth maps in all regions effectively.
Comment: 15 pages
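To make the slope computation mentioned in this abstract concrete: in an epipolar-plane image (EPI), a scene point traces a line across the angular views, and the slope of that line gives its disparity. The sketch below fits that line by ordinary least squares with `numpy.polyfit`. It is a generic illustration, not the stitched-EPI algorithm; the function name and the linear model `x(u) = x0 + d * u` are assumptions made for the example.

```python
import numpy as np


def epi_slope_disparity(view_idx, x_pos):
    """Estimate disparity as the slope of a point's track across views.

    view_idx: angular coordinates u of the views (e.g. -4..4).
    x_pos:    spatial coordinate x of the same scene point in each view.
    Returns (disparity, x0) from the least-squares fit x = x0 + disparity * u.
    """
    slope, intercept = np.polyfit(view_idx, x_pos, deg=1)
    return slope, intercept


# Hypothetical track: 9 views, true disparity 1.5 px per view, x0 = 10 px.
u = np.arange(-4, 5)
x_exact = 10.0 + 1.5 * u
d_exact, x0_exact = epi_slope_disparity(u, x_exact)

# Rounding positions to whole pixels mimics the discretization error that
# degrades the slope estimate when only a few angular samples are available.
d_rounded, _ = epi_slope_disparity(u, np.rint(x_exact))
```

With few views, each rounded sample carries more weight in the fit, which is one way to see why low angular resolution and discretization limit slope accuracy.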
Neural Contourlet Network for Monocular 360 Depth Estimation
For a monocular 360 image, depth estimation is challenging because the
distortion increases along the latitude. To perceive the distortion, existing
methods focus on designing a deep and complex network architecture. In this
paper, we provide a new perspective that constructs an interpretable and sparse
representation for a 360 image. Considering the importance of the geometric
structure in depth estimation, we utilize the contourlet transform to capture
an explicit geometric cue in the spectral domain and integrate it with an
implicit cue in the spatial domain. Specifically, we propose a neural
contourlet network consisting of a convolutional neural network and a
contourlet transform branch. In the encoder stage, we design a spatial-spectral
fusion module to effectively fuse two types of cues. Contrary to the encoder,
we employ the inverse contourlet transform with learned low-pass subbands and
band-pass directional subbands to compose the depth in the decoder. Experiments
on the three popular panoramic image datasets demonstrate that the proposed
approach outperforms the state-of-the-art schemes with faster convergence. Code
is available at
https://github.com/zhijieshen-bjtu/Neural-Contourlet-Network-for-MODE.
Comment: IEEE Transactions on Circuits and Systems for Video Technology
Adversarial Attacks on Monocular Pose Estimation
Advances in deep learning have resulted in steady progress in computer vision
with improved accuracy on tasks such as object detection and semantic
segmentation. Nevertheless, deep neural networks are vulnerable to adversarial
attacks, thus presenting a challenge in reliable deployment. Two of the
prominent tasks in 3D scene-understanding for robotics and advanced driver
assistance systems are monocular depth and pose estimation, often learned
together in an unsupervised manner. While studies evaluating the impact of
adversarial attacks on monocular depth estimation exist, a systematic
demonstration and analysis of adversarial perturbations against pose estimation
are lacking. We show how additive imperceptible perturbations can not only
change predictions to increase the trajectory drift but also catastrophically
alter its geometry. We also study the relation between adversarial
perturbations targeting monocular depth and pose estimation networks, as well
as the transferability of perturbations to other networks with different
architectures and losses. Our experiments show how the generated perturbations
lead to notable errors in relative rotation and translation predictions and
elucidate vulnerabilities of the networks.
Comment: Accepted at the 2022 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2022)
ADU-Depth: Attention-based Distillation with Uncertainty Modeling for Depth Estimation
Monocular depth estimation is challenging due to its inherent ambiguity and
ill-posed nature, yet it is quite important to many applications. While recent
works achieve limited accuracy by designing increasingly complicated networks
to extract features with limited spatial geometric cues from a single RGB
image, we intend to introduce spatial cues by training a teacher network that
leverages left-right image pairs as inputs and transferring the learned 3D
geometry-aware knowledge to the monocular student network. Specifically, we
present a novel knowledge distillation framework, named ADU-Depth, with the
goal of leveraging the well-trained teacher network to guide the learning of
the student network, thus improving the precision of depth estimation with the help
of extra spatial scene information. To enable domain adaptation and ensure
effective and smooth knowledge transfer from teacher to student, we apply both
attention-adapted feature distillation and focal-depth-adapted response
distillation in the training stage. In addition, we explicitly model the
uncertainty of depth estimation to guide distillation in both feature space and
result space to better produce 3D-aware knowledge from monocular observations
and thus enhance the learning for hard-to-predict image regions. Our extensive
experiments on the real depth estimation datasets KITTI and DrivingStereo
demonstrate the effectiveness of the proposed method, which ranked 1st on the
challenging KITTI online benchmark.
Comment: accepted by CoRL 202
Learning based Deep Disentangling Light Field Reconstruction and Disparity Estimation Application
Light field cameras have a wide range of uses due to their ability to
simultaneously record light intensity and direction. The angular resolution of
light fields is important for downstream tasks such as depth estimation, yet is
often difficult to improve due to hardware limitations. Conventional methods
tend to perform poorly against the challenge of large disparity in sparse light
fields, while general CNNs have difficulty extracting spatial and angular
features coupled together in 4D light fields. The light field disentangling
mechanism transforms the 4D light field into 2D image format, which is more
favorable for CNN for feature extraction. In this paper, we propose a Deep
Disentangling Mechanism, which inherits the principle of the light field
disentangling mechanism and further develops the design of the feature
extractor and adds advanced network structure. We design a light-field
reconstruction network (i.e., DDASR) on the basis of the Deep Disentangling
Mechanism, and achieve SOTA performance in the experiments. In addition, we
design a Block Traversal Angular Super-Resolution Strategy for the practical
application of depth estimation enhancement, where the number of input views is
often higher than the 2x2 used in the experiments, resulting in high memory
usage. The strategy reduces memory usage while achieving better reconstruction
performance