Real-Time Dense Stereo Matching With ELAS on FPGA Accelerated Embedded Devices
For many applications in low-power real-time robotics, stereo cameras are the
sensors of choice for depth perception as they are typically cheaper and more
versatile than their active counterparts. Their biggest drawback, however, is
that they do not directly sense depth maps; instead, these must be estimated
through data-intensive processes. Therefore, appropriate algorithm selection
plays an important role in achieving the desired performance characteristics.
Motivated by applications in space and mobile robotics, we implement and
evaluate an FPGA-accelerated adaptation of the ELAS algorithm. Despite offering
one of the best trade-offs between efficiency and accuracy, ELAS has only been
shown to run at 1.5-3 fps on a high-end CPU. Our system preserves all
intriguing properties of the original algorithm, such as the slanted plane
priors, but can achieve a frame rate of 47 fps whilst consuming under 4 W of
power. Unlike previous FPGA-based designs, we take advantage of both components
of the CPU/FPGA System-on-Chip to showcase the strategy necessary to accelerate
more complex and computationally diverse algorithms for such low-power,
real-time systems. Comment: 8 pages, 7 figures, 2 tables
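To make the "slanted plane priors" concrete, the sketch below fits a plane d = a·u + b·v + c through three (u, v, d) support points (one triangle of the support-point mesh) and builds a prior over disparities at a pixel inside it. It is a simplified, hypothetical illustration loosely following the original ELAS formulation (a Gaussian around the plane plus a small constant, truncated beyond 3σ); the function names and the values of `sigma` and `gamma` are assumptions, not the FPGA design described above.

```python
import numpy as np

def plane_from_support_points(pts):
    """Fit d = a*u + b*v + c exactly through three (u, v, d) support points."""
    pts = np.asarray(pts, dtype=float)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(3)])
    a, b, c = np.linalg.solve(A, pts[:, 2])
    return a, b, c

def slanted_plane_prior(u, v, plane, d_max, sigma=1.0, gamma=0.05):
    """Normalized prior over disparities 0..d_max at pixel (u, v).

    Simplified: Gaussian centred on the plane's disparity plus a constant
    gamma, set to zero where |d - mu| >= 3*sigma. sigma/gamma are illustrative.
    """
    a, b, c = plane
    mu = float(a * u + b * v + c)          # disparity predicted by the plane
    d = np.arange(d_max + 1, dtype=float)  # candidate disparities
    prior = gamma + np.exp(-((d - mu) ** 2) / (2 * sigma ** 2))
    prior[np.abs(d - mu) >= 3 * sigma] = 0.0
    total = prior.sum()
    return mu, (prior / total if total > 0 else prior)

plane = plane_from_support_points([(0, 0, 10), (10, 0, 12), (0, 10, 14)])
mu, prior = slanted_plane_prior(5, 5, plane, d_max=63)
print(round(mu, 3), int(np.argmax(prior)))  # 13.0 13
```

The prior only narrows the disparity search; the final disparity would still be chosen by combining it with a matching-cost likelihood.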
RSGM: Real-time Raster-Respecting Semi-Global Matching for Power-Constrained Systems
Stereo depth estimation is used for many computer vision applications. Though
many popular methods strive solely for depth quality, for real-time mobile
applications (e.g. prosthetic glasses or micro-UAVs), speed and power
efficiency are equally, if not more, important. Many real-world systems rely on
Semi-Global Matching (SGM) to achieve a good accuracy vs. speed balance, but
power efficiency is hard to achieve with conventional hardware, making the use
of embedded devices such as FPGAs attractive for low-power applications.
However, the full SGM algorithm is ill-suited to deployment on FPGAs, and so
most FPGA variants of it are partial, at the expense of accuracy. In a non-FPGA
context, the accuracy of SGM has been improved by More Global Matching (MGM),
which also helps tackle the streaking artifacts that afflict SGM. In this
paper, we propose a novel, resource-efficient method that is inspired by MGM's
techniques for improving depth quality, but which can be implemented to run in
real time on a low-power FPGA. Through evaluation on multiple datasets (KITTI
and Middlebury), we show that in comparison to other real-time capable stereo
approaches, we can achieve a state-of-the-art balance between accuracy, power
efficiency and speed, making our approach highly desirable for use in real-time
systems with limited power. Comment: Accepted at FPT 2018 as an oral presentation, 8 pages, 6 figures, 4 tables
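For reference, the standard SGM cost aggregation that RSGM and MGM build on can be sketched for a single path direction. The code below implements Hirschmüller's recurrence L_r(p, d) = C(p, d) + min(L_r(p−r, d), L_r(p−r, d±1) + P1, min_k L_r(p−r, k) + P2) − min_k L_r(p−r, k) along one row, left to right. It is plain SGM, not the RSGM or MGM update; the function name and the P1/P2 defaults are illustrative assumptions.

```python
import numpy as np

def aggregate_left_to_right(cost, p1=10.0, p2=120.0):
    """SGM aggregation along one path (left to right) for a single image row.

    cost: array of shape (width, num_disparities) holding C(p, d).
    Returns the path costs L_r(p, d). p1/p2 are illustrative penalties.
    """
    cost = np.asarray(cost, dtype=float)
    width, ndisp = cost.shape
    agg = np.empty_like(cost)
    agg[0] = cost[0]  # the path starts at the row border
    for x in range(1, width):
        prev = agg[x - 1]
        prev_min = prev.min()
        # Transitions from d-1 and d+1 cost P1 (inf where out of range).
        minus = np.concatenate(([np.inf], prev[:-1])) + p1
        plus = np.concatenate((prev[1:], [np.inf])) + p1
        # Any larger disparity jump costs P2.
        jump = np.full(ndisp, prev_min + p2)
        best = np.minimum.reduce([prev, minus, plus, jump])
        # Subtracting prev_min keeps values bounded without changing the argmin.
        agg[x] = cost[x] + best - prev_min
    return agg

agg = aggregate_left_to_right([[0, 5, 5], [5, 0, 5], [5, 5, 0]], p1=1.0, p2=10.0)
print(agg.argmin(axis=1))  # [0 1 2]
```

Full SGM sums such path costs over several directions; the difficulty on FPGAs noted above is that directions running against the raster order need the whole image buffered, which a single raster pass cannot provide.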
FPGA-based multi-view stereo system with flexible measurement setup
In recent years, stereoscopic image processing algorithms have gained importance for a variety of applications. To capture larger measurement volumes, multiple stereo systems are combined into a multi-view stereo (MVS) system. To reduce the amount of data and the data rate, calculation steps close to the sensors are offloaded to Field Programmable Gate Arrays (FPGAs) acting as upstream computing units. These steps include lens distortion correction, rectification and stereo matching. In this paper, an FPGA-based MVS system with a flexible camera arrangement and partly overlapping fields of view is presented. The system consists of four FPGA-based passive stereoscopic systems (Xilinx Zynq-7000 7020 SoC, EV76C570 CMOS sensor) and a downstream processing unit (Zynq UltraScale+ ZU9EG SoC). This unit synchronizes the near-sensor processing modules and receives the disparity maps, together with the corresponding left camera images, via HDMI. It then computes a coherent 3D point cloud. The developed FPGA-based 3D measurement system captures a large measurement volume at 24 fps by combining eight cameras in a multi-view arrangement (using Semi-Global Matching for an image size of 640 px × 460 px, a disparity range of up to 256 px, and costs aggregated over 4 directions). The capabilities and limitations of the system are demonstrated with an application example involving an optically non-cooperative surface.
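The abstract does not say how the downstream unit turns disparity maps into a point cloud. A common approach for rectified pinhole stereo is triangulation with Z = f·B/d, X = (u − c_x)·Z/f, Y = (v − c_y)·Z/f, followed by a rigid transform that brings each stereo pair's points into a shared frame. The sketch below shows that under those assumptions; the function names, the focal length `f_px`, the baseline, the principal point, and the extrinsics are all hypothetical.

```python
import numpy as np

def disparity_to_points(disparity, f_px, baseline_m, cx, cy, min_disp=0.5):
    """Back-project a rectified disparity map to an (N, 3) point cloud in metres.

    Assumes a pinhole model with focal length f_px (pixels) and principal
    point (cx, cy). Pixels with disparity below min_disp are dropped as invalid.
    """
    disparity = np.asarray(disparity, dtype=float)
    v, u = np.indices(disparity.shape)  # row (v) and column (u) coordinates
    valid = disparity >= min_disp
    d = disparity[valid]
    z = f_px * baseline_m / d           # depth from disparity
    x = (u[valid] - cx) * z / f_px
    y = (v[valid] - cy) * z / f_px
    return np.column_stack([x, y, z])

def to_common_frame(points, rotation, translation):
    """Apply hypothetical extrinsics (R, t) to map one stereo pair's points into a shared frame."""
    return points @ np.asarray(rotation, dtype=float).T + np.asarray(translation, dtype=float)

points = disparity_to_points([[10, 0], [20, 5]], f_px=500.0, baseline_m=0.1, cx=0.0, cy=0.0)
merged = to_common_frame(points, np.eye(3), [1.0, 0.0, 0.0])
```

With four stereo pairs, the clouds from each pair would be transformed this way and concatenated; larger disparity ranges (such as the 256 px above) extend the minimum measurable depth closer to the cameras.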
ReS²tAC: UAV-Borne Real-Time SGM Stereo Optimized for Embedded ARM and CUDA Devices
With the emergence of low-cost robotic systems, such as unmanned aerial
vehicles, the importance of embedded high-performance image processing has
increased. For a long time, FPGAs were the only processing hardware capable
of high-performance computing while also maintaining the low power
consumption essential for embedded systems. However, the increasing
availability of embedded GPU-based systems, such as the NVIDIA Jetson series,
which combine an ARM CPU with an NVIDIA Tegra GPU, allows for
massively parallel embedded computing on graphics hardware. With this in mind,
we propose an approach for real-time embedded stereo processing on ARM and
CUDA-enabled devices, based on the popular and widely used Semi-Global
Matching algorithm. Specifically, we optimize the algorithm for embedded CUDA
GPUs using massively parallel computing, and we use NEON intrinsics to
vectorize it for SIMD processing on embedded ARM CPUs. We have evaluated our
approach with different configurations on two public stereo benchmark
datasets, showing that it can reach an error rate as low as 3.3%. Furthermore,
our experiments show that the fastest configuration of our approach reaches up
to 46 FPS at VGA resolution.
Finally, in a use-case-specific qualitative evaluation, we evaluated the
power consumption of our approach and deployed it on the DJI Manifold 2-G
attached to a DJI Matrice 210 V2 RTK unmanned aerial vehicle (UAV),
demonstrating its suitability for real-time stereo processing onboard a UAV.
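The abstract reports an error rate of 3.3% without defining the metric. Stereo benchmarks usually report a bad-pixel (outlier) percentage; the sketch below assumes the KITTI 2015 convention, where an estimate counts as an outlier only if its error exceeds both 3 px and 5% of the true disparity. Whether ReS²tAC used this exact definition is an assumption, and the function name is hypothetical.

```python
import numpy as np

def bad_pixel_rate(d_est, d_gt, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of labelled pixels whose estimated disparity is an outlier.

    KITTI 2015-style rule (assumed): outlier if |error| > abs_thresh pixels
    AND |error| > rel_thresh * true disparity. Ground truth 0 means unlabelled.
    """
    d_est = np.asarray(d_est, dtype=float)
    d_gt = np.asarray(d_gt, dtype=float)
    valid = d_gt > 0
    if not valid.any():
        return 0.0
    err = np.abs(d_est[valid] - d_gt[valid])
    outliers = (err > abs_thresh) & (err > rel_thresh * d_gt[valid])
    return float(100.0 * outliers.mean())

rate = bad_pixel_rate([10, 25, 104, 7], [10, 20, 100, 0])
print(round(rate, 2))  # 33.33
```

In the example, the 4 px error at a true disparity of 100 is not an outlier because it stays under the 5% relative threshold, while the 5 px error at a true disparity of 20 exceeds both thresholds.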
Quality estimation and optimization of adaptive stereo matching algorithms for smart vehicles
Stereo matching is a promising approach for smart vehicles to determine the depth of nearby objects. Transforming a traditional stereo matching algorithm into an adaptive version can help achieve the maximum quality (depth accuracy) in a best-effort manner. However, supporting this adaptive feature is very challenging, since (1) the internal mechanism of adaptive stereo matching (ASM) has to be accurately modeled, and (2) scheduling ASM tasks on multiprocessors to maximize quality is difficult under the strict real-time constraints of smart vehicles. In this article, we propose a framework for constructing an ASM application and optimizing its output quality on smart vehicles. First, we empirically convert stereo matching into ASM by exploiting its inherent disparity–cycle correspondence and introduce an exponential quality model that accurately represents the quality–cycle relationship. Second, using this explicit quality model, we propose an efficient quadratic programming-based dynamic voltage/frequency scaling (DVFS) algorithm that determines the optimal operating strategy, maximizing output quality under timing, energy, and temperature constraints. Third, we propose two novel methods to efficiently estimate the parameters of the quality model: location similarity-based feature point thresholding and street scenario-confined CNN prediction. Results show that our DVFS algorithm achieves at least a 1.61× quality improvement over state-of-the-art techniques, and parameter estimation for the quality model achieves an average accuracy of 96.35% on straight roads.
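The interplay between a quality–cycle model and DVFS can be illustrated for a single frame on a single core. The sketch below assumes a hypothetical exponential model Q(c) = q_max·(1 − e^(−k·c)) (the abstract does not give its exact form) and a common approximation of dynamic power, P(f) = κ·f³. The task may stop early, so at frequency f it runs for t = min(deadline, E/P(f)) and executes f·t cycles. Instead of the paper's quadratic program, it does a brute-force search over discrete frequency levels and ignores temperature and multiprocessor scheduling; all names and constants are assumptions.

```python
import math

def quality(cycles, q_max=1.0, k=1e-7):
    """Hypothetical exponential quality-cycle model: Q(c) = q_max * (1 - exp(-k*c))."""
    return q_max * (1.0 - math.exp(-k * cycles))

def best_frequency(levels_hz, deadline_s, energy_j, kappa=1e-27):
    """Choose the frequency level that maximises modelled quality for one frame.

    Assumes P(f) = kappa * f**3 and that the task may stop early, so it runs
    for min(deadline, energy / power). Brute force, not the paper's QP.
    Frequency levels must be positive.
    """
    best = None
    for f in levels_hz:
        power = kappa * f ** 3
        run_time = min(deadline_s, energy_j / power)  # deadline- or energy-limited
        cycles = f * run_time
        q = quality(cycles)
        if best is None or q > best[1]:
            best = (f, q, cycles)
    return best

f, q, cycles = best_frequency([0.5e9, 1.0e9, 1.5e9, 2.0e9], deadline_s=0.033, energy_j=0.05)
print(f)  # 1000000000.0
```

Low frequencies are limited by the deadline (cycles = f·T grows with f), while high frequencies exhaust the energy budget early (cycles = E/(κ·f²) shrinks with f), so the best level lies in between. This is the kind of trade-off a DVFS optimizer has to resolve under timing and energy constraints.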