    Stereo Vision System Module for Low-Cost FPGAs for Autonomous Mobile Robots

    Stereo vision uses two adjacent cameras to create a 3D image of the world. A depth map can be created by comparing the offset (disparity) between corresponding pixels in the two camera images. For real-time stereo vision, however, the image data must be processed at a sufficiently high frame rate. Real-time stereo vision helps mobile robots navigate terrain and interact with objects by providing both the camera images and the depth of objects in the scene. Fortunately, the image processing can be parallelized to increase processing speed, and field-programmable gate arrays (FPGAs), being highly parallel, lend themselves well to this problem. This thesis presents a stereo vision module that uses the Sum of Absolute Differences (SAD) algorithm, which compares regions of pixels called windows to find matching pixel pairs for determining depth. Two implementations are presented that use the SAD algorithm differently. The first uses a 9x9 window and processes 4 pixels simultaneously. The second uses a 7x7 window and processes 2 pixels simultaneously, but parallelizes each SAD computation for faster processing. The 9x9 implementation produces a better depth image with less noise, while the 7x7 implementation achieves a higher frame rate. Simulation shows that the 9x9 and 7x7 implementations can process 640x480 images at 15.73 and 29.32 frames per second, respectively.
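    To make the window comparison concrete, here is a minimal software sketch of winner-take-all SAD block matching, assuming rectified grayscale images stored as lists of rows and a match in the right image shifted left by d pixels. The function name and the max_disp parameter are hypothetical; this is a plain reference loop, not the thesis's FPGA pipeline, which computes the same per-window costs in parallel.

```python
def sad_disparity(left, right, window=9, max_disp=16):
    """Winner-take-all SAD block matching (software reference, not the FPGA design).

    left, right: rectified grayscale images as lists of rows of ints, same size.
    window: odd window side length (e.g. 9 for 9x9, 7 for 7x7).
    max_disp: largest disparity searched (inclusive); a hypothetical parameter.
    Pixels whose window would leave the image keep disparity 0.
    """
    h, w = len(left), len(left[0])
    r = window // 2  # window radius
    disp = [[0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            best_d, best_cost = 0, None
            # A point at column x in the left image appears at x - d in the right image.
            for d in range(min(max_disp, x - r) + 1):
                cost = 0
                for dy in range(-r, r + 1):
                    for dx in range(-r, r + 1):
                        cost += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
                # Keep the disparity with the lowest SAD cost (first one wins ties).
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp
```

    Each candidate disparity costs window x window absolute differences (81 for 9x9, 49 for 7x7), which illustrates why the smaller window is cheaper per pixel while the larger one averages out more noise.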

    Real-time FPGA implementation of the Semi-Global Matching stereo vision algorithm for a 4K/UHD video stream

    In this paper, we propose a real-time FPGA implementation of the Semi-Global Matching (SGM) stereo vision algorithm. The designed module supports a 4K/Ultra HD (3840 x 2160 pixels @ 30 frames per second) video stream in a 4 pixels per clock (ppc) format and a 64-pixel disparity range. The baseline SGM implementation had to be modified to process pixels in the 4ppc format and to meet the timing constraints; nevertheless, our version provides results comparable to the original design. The solution has been positively evaluated on the Xilinx VC707 development board with a Virtex-7 FPGA device. Comment: Paper accepted for the DASIP 2023 workshop in conjunction with HiPEAC 202
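    The arithmetic behind a multi-pixel-per-clock format can be sketched as follows, counting only active pixels (blanking ignored, so a real video clock is higher). The helper names are hypothetical, and the assumption that every pixel is compared against every disparity candidate is ours, not a statement about the paper's datapath. For 3840 x 2160 at 30 frames per second, one pixel per clock would need at least about 248.8 MHz, while 4 ppc lowers this to about 62.2 MHz, at the price of producing 4 x 64 = 256 matching costs per clock.

```python
def min_clock_mhz(width, height, fps, pixels_per_clock):
    """Lower bound on the clock (MHz) needed to keep up with the active pixels
    of a video stream; real video timings add blanking, so the actual clock is higher."""
    return width * height * fps / pixels_per_clock / 1e6


def costs_per_clock(pixels_per_clock, disparity_range):
    """Matching costs produced per clock if every pixel is compared against
    every disparity candidate (an assumption about the datapath)."""
    return pixels_per_clock * disparity_range
```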

    ReS2tAC -- UAV-Borne Real-Time SGM Stereo Optimized for Embedded ARM and CUDA Devices

    With the emergence of low-cost robotic systems, such as unmanned aerial vehicles, the importance of embedded high-performance image processing has increased. For a long time, FPGAs were the only processing hardware capable of high-performance computing while preserving the low power consumption essential for embedded systems. However, the increasing availability of embedded GPU-based systems, such as the NVIDIA Jetson series, which combine an ARM CPU with an NVIDIA Tegra GPU, allows for massively parallel embedded computing on graphics hardware. With this in mind, we propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, based on the popular and widely used Semi-Global Matching algorithm. Specifically, we optimize the algorithm for embedded CUDA GPUs through massively parallel computing, and for embedded ARM CPUs through NEON intrinsics for vectorized SIMD processing. We evaluated our approach with different configurations on two public stereo benchmark datasets, demonstrating error rates as low as 3.3%. Furthermore, our experiments show that the fastest configuration of our approach reaches up to 46 FPS at VGA resolution. Finally, in a use-case-specific qualitative evaluation, we evaluated the power consumption of our approach and deployed it on a DJI Manifold 2-G attached to a DJI Matrice 210v2 RTK unmanned aerial vehicle (UAV), demonstrating its suitability for real-time stereo processing onboard a UAV.
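    The core of Semi-Global Matching is a cost-aggregation recurrence along image paths. The following is a minimal sketch of the standard textbook recurrence for a single left-to-right path over one row; the function name and the penalty values P1 and P2 are hypothetical, and a full SGM sums several such paths (for example 4 or 8 directions) before picking the lowest-cost disparity. It is not the ReS2tAC code. Note that, for a fixed pixel, every disparity d depends only on the previous pixel's values, so the inner loop over d is the kind of independent work that maps naturally onto SIMD lanes or GPU threads.

```python
def aggregate_left_to_right(cost_row, p1=10, p2=120):
    """One SGM path (left-to-right) over a single image row.

    cost_row[x][d] is the matching cost C(p, d) for pixel x and disparity d.
    Returns L(p, d) = C(p, d) + min(L(p-1, d), L(p-1, d+-1) + P1,
                                    min_k L(p-1, k) + P2) - min_k L(p-1, k).
    p1/p2 are hypothetical smoothness penalties.
    """
    n_disp = len(cost_row[0])
    agg = [list(cost_row[0])]  # first pixel on the path: L = C
    for x in range(1, len(cost_row)):
        prev = agg[-1]
        prev_min = min(prev)
        cur = []
        for d in range(n_disp):  # independent across d: SIMD/GPU friendly
            best = prev[d]
            if d > 0:
                best = min(best, prev[d - 1] + p1)  # small disparity change
            if d < n_disp - 1:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)  # large disparity jump
            # Subtracting prev_min keeps values bounded without changing the argmin.
            cur.append(cost_row[x][d] + best - prev_min)
        agg.append(cur)
    return agg
```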

    FPGA-based multi-view stereo system with flexible measurement setup

    In recent years, stereoscopic image processing algorithms have gained importance for a variety of applications. To capture larger measurement volumes, multiple stereo systems are combined into a multi-view stereo (MVS) system. To reduce the amount of data and the data rate, calculation steps close to the sensors are offloaded to Field Programmable Gate Arrays (FPGAs) acting as upstream computing units. These steps include lens distortion correction, rectification, and stereo matching. In this paper, an FPGA-based MVS system with a flexible camera arrangement and partly overlapping fields of view is presented. The system consists of four FPGA-based passive stereoscopic systems (Xilinx Zynq-7000 7020 SoC, EV76C570 CMOS sensor) and a downstream processing unit (Zynq UltraScale+ ZU9EG SoC). The downstream unit synchronizes the near-sensor processing modules and receives the disparity maps, together with the corresponding left camera images, via HDMI. The subsequent computing unit calculates a coherent 3D point cloud. Our FPGA-based 3D measurement system captures a large measurement volume at 24 fps by combining multiple views from eight cameras (using Semi-Global Matching with an image size of 640 px × 460 px, a disparity range of up to 256 px, and costs aggregated over 4 directions). The capabilities and limitations of the system are demonstrated with an application example involving an optically non-cooperative surface.
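    Turning a disparity map into a point cloud rests on standard triangulation for a rectified pinhole stereo pair: Z = f·B/d, X = (u − cx)·Z/f, Y = (v − cy)·Z/f, with focal length f in pixels, baseline B, and principal point (cx, cy). The sketch below assumes exactly that model and uses hypothetical function names. Merging the four stereo heads into one coherent cloud would additionally require each head's pose, shown here as an assumed rotation and translation into a common frame; the abstract does not describe how the system does this.

```python
def disparity_to_point(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Triangulate pixel (u, v) of a rectified stereo pair (pinhole model).

    Returns (X, Y, Z) in metres in the left-camera frame,
    or None for an invalid (non-positive) disparity.
    """
    if disparity <= 0:
        return None
    z = focal_px * baseline_m / disparity  # depth is inversely proportional to disparity
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)


def to_common_frame(point, rotation, translation):
    """Apply an assumed rigid transform R * p + t (R as 3x3 nested lists)
    to move a point from one stereo head's frame into a shared frame."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
```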

    An Architecture for High-throughput and Improved-quality Stereo Vision Processor

    This paper presents a VLSI architecture that achieves high-throughput, improved-quality stereo vision for real-world applications. The stereo vision processor generates gray-scale output images with depth information from input images taken by two CMOS image sensors (CIS). The depth estimator, which uses the sum of absolute differences (SAD) algorithm for stereo matching, is implemented in hardware by exploiting pipelining and parallelism. To produce improved-quality depth maps in real time, pre- and post-processing units are adopted, and to make the system more adaptable to real environments, special function registers (SFRs) are assigned to vision parameters. The design, in 0.18 µm standard CMOS technology, can operate at a 120 MHz clock, producing depth maps at over 140 frames/sec for a 320 by 240 image size with 64 disparity levels. Experimental results based on real-world images and the Middlebury data set will be presented, along with comparisons against existing hardware systems and the hardware specifications of the proposed processor.
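    A quick back-of-the-envelope check shows why pipelining and parallelism are needed to reach these figures. At 320 x 240 pixels and 140 frames/sec, there are about 11.16 clock cycles per pixel at 120 MHz, yet each pixel needs 64 disparity candidates, i.e. about 5.73 candidate evaluations per cycle. The helper names are hypothetical, and reading the result as "at least 6 SAD units, each evaluating one candidate per cycle" is an assumption; the abstract does not say how the parallelism is organized.

```python
import math


def cycles_per_pixel(width, height, fps, clock_hz):
    """Clock cycles available per output pixel at the target frame rate."""
    return clock_hz / (width * height * fps)


def required_parallelism(width, height, fps, disparities, clock_hz):
    """Disparity candidates that must be evaluated per clock cycle, assuming
    every pixel is compared against every disparity level."""
    return width * height * fps * disparities / clock_hz


# Figures quoted in the abstract: 320x240, 140 frames/sec, 64 levels, 120 MHz.
par = required_parallelism(320, 240, 140, 64, 120e6)
units = math.ceil(par)  # units needed if each evaluates one candidate per cycle (assumption)
```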

    MINIMIZATION OF RESOURCE UTILIZATION FOR A REAL-TIME DEPTH-MAP COMPUTATIONAL MODULE ON FPGA

    Depth-map algorithms allow camera systems to estimate depth in many applications. These algorithms are computationally intensive and are therefore better suited to hardware such as the Field Programmable Gate Array (FPGA). However, a recurring issue in FPGA implementations is limited resources. This issue is normally resolved by modifying the algorithm, but it can also be addressed through hardware architecture without modifying the depth-map algorithm. In this thesis, five different real-time depth-map processor architectures for the sum-of-absolute-differences (SAD) depth-map algorithm were designed and implemented on FPGA. Two resource minimization techniques were employed to address the resource limitations, and the resource usage and performance of these architectures were compared. Memory contention and bandwidth constraints were resolved using a self-initiative memory controller, FIFOs, and line buffers. Parallel processing was used to achieve high processing speed at a low clock frequency. Memory-based line buffers were used instead of register-based line buffers, saving 62.4% of the logic elements (LEs) at the cost of some additional dedicated memory bits. Using registers to replace repetitive subtractors saves 24.75% of LEs. The system achieves a SAD performance of 295 mega pixel disparities per second (MPDS) for the architecture with 640x480 pixel images, a 3x3 pixel window, a 32-pixel disparity range, and 30 frames per second, and 590 MPDS for the architecture with a 64-pixel disparity range. The disparity matching module runs at 10 MHz and produces one pixel of result every clock cycle. The results are dense disparity images suitable for high-speed, low-cost, low-power applications.
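    Two ideas from this abstract can be sketched in software. First, the quoted MPDS figures are consistent with width × height × frame rate × disparity range (640 × 480 × 30 × 32 ≈ 294.9 million, and about 589.8 million for a 64-pixel range). Second, line buffers let a k x k window be formed from a raster-order pixel stream by holding the previous k − 1 rows. The model below uses a single shift register (a `collections.deque`) of (k − 1) rows plus k pixels; the function names are hypothetical, and it illustrates the buffering idea only, not the thesis's memory-based line-buffer architecture.

```python
from collections import deque


def mpds(width, height, fps, disparity_range):
    """Mega pixel-disparities per second: each output pixel is matched
    against every disparity candidate once."""
    return width * height * fps * disparity_range / 1e6


def stream_windows(pixels, width, k=3):
    """Software model of line buffering for an odd window size k.

    Pixels arrive in raster order; a shift register holding (k - 1) full rows
    plus k pixels exposes each k x k window.
    Yields ((row, col) of the window centre, window as a list of rows).
    """
    depth = (k - 1) * width + k
    buf = deque(maxlen=depth)  # oldest pixel sits at buf[0]
    for i, p in enumerate(pixels):
        buf.append(p)
        y, x = divmod(i, width)
        # Wait until the buffer is full, and skip windows that would wrap rows.
        if len(buf) < depth or x < k - 1:
            continue
        snap = list(buf)  # deque does not support slicing
        window = [snap[j * width: j * width + k] for j in range(k)]
        yield (y - k // 2, x - k // 2), window
```

    At one pixel per clock, a 640x480 stream at 30 frames per second needs about 9.2 million pixels per second, which fits within the 10 MHz clock quoted for the disparity matching module.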