Depth from Monocular Images using a Semi-Parallel Deep Neural Network (SPDNN) Hybrid Architecture
Deep neural networks have been applied to a wide range of problems in recent years.
In this work, a Convolutional Neural Network (CNN) is applied to the problem of
determining the depth from a single camera image (monocular depth). Eight
different networks are designed to perform depth estimation, each of them
suitable for a feature level. Networks with different pooling sizes determine
different feature levels. After designing a set of networks, these models may
be combined into a single network topology using graph optimization techniques.
This "Semi Parallel Deep Neural Network (SPDNN)" eliminates duplicated common
network layers, and can be further optimized by retraining to achieve an
improved model compared to the individual topologies. In this study, four SPDNN
models are trained and evaluated in two stages on the KITTI dataset.
The ground truth images in the first part of the experiment are provided by the
benchmark, and for the second part, the ground truth images are the depth map
results from applying a state-of-the-art stereo matching method. The results of
this evaluation demonstrate that using post-processing techniques to refine the
target of the network increases the accuracy of depth estimation on individual
mono images. The second evaluation shows that using segmentation data alongside
the original data as the input can improve the depth estimation results to a
point where performance is comparable with stereo depth estimation. The
computational time is also discussed in this study.
Comment: 44 pages, 25 figure
RSGM: Real-time Raster-Respecting Semi-Global Matching for Power-Constrained Systems
Stereo depth estimation is used for many computer vision applications. Though
many popular methods strive solely for depth quality, for real-time mobile
applications (e.g. prosthetic glasses or micro-UAVs), speed and power
efficiency are equally, if not more, important. Many real-world systems rely on
Semi-Global Matching (SGM) to achieve a good accuracy vs. speed balance, but
power efficiency is hard to achieve with conventional hardware, making the use
of embedded devices such as FPGAs attractive for low-power applications.
However, the full SGM algorithm is ill-suited to deployment on FPGAs, and so
most FPGA variants of it are partial, at the expense of accuracy. In a non-FPGA
context, the accuracy of SGM has been improved by More Global Matching (MGM),
which also helps tackle the streaking artifacts that afflict SGM. In this
paper, we propose a novel, resource-efficient method that is inspired by MGM's
techniques for improving depth quality, but which can be implemented to run in
real time on a low-power FPGA. Through evaluation on multiple datasets (KITTI
and Middlebury), we show that in comparison to other real-time capable stereo
approaches, we can achieve a state-of-the-art balance between accuracy, power
efficiency and speed, making our approach highly desirable for use in real-time
systems with limited power.
Comment: Accepted in FPT 2018 as Oral presentation, 8 pages, 6 figures, 4
table
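For readers unfamiliar with SGM, the path-wise cost aggregation it relies on can be sketched for a single left-to-right scanline as below. This is the textbook SGM recurrence, not the RSGM variant proposed in the paper; the function name `aggregate_scanline` and the penalty values `p1` and `p2` are illustrative assumptions.

```python
def aggregate_scanline(cost, p1=1, p2=4):
    """Aggregate matching costs along one left-to-right path.

    cost[x][d] is the matching cost of pixel x at disparity d.
    p1 penalises a disparity change of 1, p2 any larger change
    (both values are assumptions chosen for illustration).
    """
    n_disp = len(cost[0])
    agg = [list(cost[0])]  # the first pixel has no predecessor on the path
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            best = prev[d]                          # same disparity, no penalty
            if d > 0:
                best = min(best, prev[d - 1] + p1)  # small change
            if d + 1 < n_disp:
                best = min(best, prev[d + 1] + p1)  # small change
            best = min(best, prev_min + p2)         # large change
            # Subtracting prev_min keeps the values bounded along the path.
            row.append(cost[x][d] + best - prev_min)
        agg.append(row)
    return agg
```

A full SGM implementation would sum such aggregated costs over several path directions and pick the disparity with the lowest total per pixel.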
Real-time on-board obstacle avoidance for UAVs based on embedded stereo vision
In order to improve usability and safety, modern unmanned aerial vehicles
(UAVs) are equipped with sensors to monitor the environment, such as
laser-scanners and cameras. One important aspect in this monitoring process is
to detect obstacles in the flight path in order to avoid collisions. Since a
large number of consumer UAVs suffer from tight weight and power constraints,
our work focuses on obstacle avoidance based on a lightweight stereo camera
setup. We use disparity maps, which are computed from the camera images, to
locate obstacles and to automatically steer the UAV around them. For disparity
map computation, we optimize the well-known semi-global matching (SGM) approach
for deployment on an embedded FPGA. The disparity maps are then converted
into simpler representations, the so-called U-/V-Maps, which are used for
obstacle detection. Obstacle avoidance is based on a reactive approach which
finds the shortest path around the obstacles as soon as they come within a critical
distance of the UAV. One of the fundamental goals of our work was the reduction
of development costs by closing the gap between application development and
hardware optimization. Hence, we aimed at using high-level synthesis (HLS) for
porting our algorithms, which are written in C/C++, to the embedded FPGA. We
evaluated our implementation of the disparity estimation on the KITTI Stereo
2015 benchmark. The integrity of the overall real-time reactive obstacle
avoidance algorithm has been evaluated by using Hardware-in-the-Loop testing in
conjunction with two flight simulators.
Comment: Accepted in the International Archives of the Photogrammetry, Remote
Sensing and Spatial Information Scienc
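The U-/V-Maps mentioned in this abstract are commonly built as per-column and per-row histograms of disparity values. The sketch below follows that common construction and is not the authors' HLS implementation; the function name `uv_maps`, the use of integer disparities, and the convention that negative values mark invalid pixels are all assumptions.

```python
def uv_maps(disparity, max_disp):
    """Build U- and V-disparity histograms from an integer disparity map.

    disparity[y][x] holds an integer disparity; values outside
    0..max_disp (e.g. negative, assumed to mean invalid) are skipped.
    Returns (u_map, v_map):
      u_map[d][x] counts pixels in column x with disparity d,
      v_map[y][d] counts pixels in row y with disparity d.
    """
    rows, cols = len(disparity), len(disparity[0])
    u_map = [[0] * cols for _ in range(max_disp + 1)]
    v_map = [[0] * (max_disp + 1) for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            d = disparity[y][x]
            if 0 <= d <= max_disp:
                u_map[d][x] += 1
                v_map[y][d] += 1
    return u_map, v_map
```

In such maps an upright obstacle at roughly constant distance tends to show up as a horizontal run of high counts in the U-Map and a vertical one in the V-Map, which is what makes them convenient for obstacle detection.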
TIDE: Temporally Incremental Disparity Estimation via Pattern Flow in Structured Light System
We introduce the Temporally Incremental Disparity Estimation Network (TIDE-Net),
a learning-based technique for disparity computation in mono-camera structured
light systems. In our hardware setting, a static pattern is projected onto a
dynamic scene and captured by a monocular camera. Unlike most previous
disparity estimation methods, which operate in a frame-wise manner, our network
acquires disparity maps in a temporally incremental way. Specifically, we
exploit the deformation of projected patterns (named pattern flow) on captured
image sequences to model the temporal information. Notably, the newly
proposed pattern flow formulation, a special form of optical flow, reflects
the disparity changes along the epipolar line. Tailored for pattern
flow, the TIDE-Net, a recurrent architecture, is proposed and implemented. For
each incoming frame, our model fuses correlation volumes (from the current frame)
and disparity (from the previous frame) warped by pattern flow. From the fused features,
the final stage of TIDE-Net estimates the residual disparity rather than the
full disparity estimated by many previous methods. Interestingly, this
design brings clear empirical advantages in terms of efficiency and
generalization ability. Using only synthetic data for training, our extensive
evaluation results (w.r.t. both accuracy and efficiency metrics) show
performance superior to that of several SOTA models on unseen real data. The code is available
at https://github.com/CodePointer/TIDENet
Guided Stereo Matching
Stereo is a prominent technique to infer dense depth maps from images, and
deep learning has further pushed forward the state of the art, making end-to-end
architectures unrivaled when enough data is available for training. However,
deep networks suffer from significant drops in accuracy when dealing with new
environments. Therefore, in this paper, we introduce Guided Stereo Matching, a
novel paradigm that leverages a small amount of sparse, yet reliable depth
measurements retrieved from an external source to ameliorate this
weakness. The additional sparse cues required by our method can be obtained
with any strategy (e.g., a LiDAR) and used to enhance features linked to
corresponding disparity hypotheses. Our formulation is general and fully
differentiable, so the additional sparse inputs can be exploited both in
pre-trained deep stereo networks and when training a new instance from
scratch. Extensive experiments on three standard datasets and two
state-of-the-art deep architectures show that even with a small set of sparse
input cues, i) the proposed paradigm enables significant improvements to
pre-trained networks. Moreover, ii) training from scratch notably increases
accuracy and robustness to domain shifts. Finally, iii) it is suitable and
effective even with traditional stereo algorithms such as SGM.
Comment: CVPR 201
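One plausible way to "enhance features linked to corresponding disparity hypotheses" is to weight the per-disparity matching scores of a pixel with a Gaussian centred on its sparse depth hint. The sketch below illustrates that idea only: the Gaussian form, the constants `k` and `c`, and the function name `modulate_scores` are assumptions, not details taken from the abstract.

```python
import math


def modulate_scores(scores, hint, k=10.0, c=1.0):
    """Boost matching scores near a hinted disparity for one pixel.

    scores[d] is a matching score at disparity d (higher = better).
    hint is the externally measured disparity, or None if this pixel
    has no sparse measurement, in which case scores pass through.
    k (peak gain) and c (Gaussian width) are illustrative assumptions.
    """
    if hint is None:
        return list(scores)
    return [
        s * k * math.exp(-((d - hint) ** 2) / (2.0 * c * c))
        for d, s in enumerate(scores)
    ]
```

Because the weighting is a smooth function of the scores, an operation of this kind stays differentiable, which is consistent with the abstract's claim that the hints can be used both with pre-trained networks and during training.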