
    Efficient Real-Time Architectures and FPGA Implementations of Histogram-Based Median Filters for High Definition Videos

    Digital filtering plays an important role in many signal processing applications. Filtering is performed to recover the original signal from its corrupted version. The median filter is a non-linear digital filter that replaces a sample in a given window by the median value of the samples in the window. For images corrupted with impulse noise, the median filter provides filtered images of very high quality. Several modifications of the median filter have been proposed and implemented to achieve higher image quality than that provided by conventional median filters. When these filters are implemented on hardware platforms such as FPGAs, the performance parameters, namely the area, power, and operating frequency, should be taken into consideration in addition to the quality of the filtered image. Therefore, efficient implementation of median filters on FPGAs for image and video processing algorithms has been a topic of much interest. The existing hardware-based median filters for high-definition video formats do not always satisfy real-time throughput requirements or are inefficient with respect to hardware performance parameters such as the area and frequency. This is because most of the existing techniques use sorting-based median calculation, which results in low hardware performance. In this thesis, architectures that use histogram-based median computation, a non-sorting-based operation, are designed with a view to efficient hardware implementation. This is carried out in two parts: we design and implement efficient architectures that satisfy the real-time throughput requirements of full high definition (FHD) videos in the first part and those of ultra high definition (UHD) videos in the second. In the first part, an efficient real-time histogram-based median filter that uses the concepts of bit-plane slicing and the adaptive switching median filter (ASMF) is designed and implemented on FPGAs. We term this architecture the hybrid architecture for median filtering (HAMF). The proposed HAMF computes an approximate median, since it uses only the B most significant bits of the pixel values for median calculation. As a result, the algorithmic-level implementation of the proposed HAMF results in a slight degradation in the filtered image quality compared to that provided by ASMF. The proposed HAMF provides a significant improvement over ASMF in terms of the area and operating frequency when implemented on different generations of FPGAs. An analysis of the different parameters, such as the number of bit-planes used in the computation of the median and the number of pipelining stages, is carried out to study the trade-off between the quality of the filtered image and hardware performance. Although the FPGA implementation of the proposed HAMF provides a very high operating frequency, the quality of the images filtered by its algorithmic-level implementation decreases with increasing window size and noise density. This filter may be suitable for applications that require FHD filtering under cost constraints, but not for applications where the output image quality is as important as the hardware performance. Hence, in the second part, we design an efficient real-time architecture for the hierarchical histogram-based median filter (HHMF). The proposed architecture uses a fully synchronous pipeline and a synchronous accumulate-and-compare unit, and is scalable.
The FPGA implementation of the proposed HHMF architecture can perform real-time filtering of 4K and 8K UHD videos. The quality of the image filtered by HHMF is not compromised as it is in the case of HAMF, since HHMF uses all the bit-planes and computes the actual median. Although the FPGA implementation of HHMF results in higher area utilization, the proposed implementation is more economical than a GPU-based HHMF implementation and provides better throughput.
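    As a rough illustration of the non-sorting, histogram-based median computation described above, the following C++ sketch finds an approximate window median by histogramming only the B most significant bits of each 8-bit pixel and accumulating counts until half the window population is reached; the function name and the choice of B are illustrative, not taken from the thesis.

```cpp
#include <cstdint>
#include <vector>

// Approximate median of an 8-bit window using a histogram built from the
// B most significant bits of each pixel (bit-plane slicing).  Illustrative
// sketch only; names and interfaces are not from the original thesis.
uint8_t approx_median_msb(const std::vector<uint8_t>& window, int B) {
    const int bins = 1 << B;                 // e.g. B = 4 -> 16 bins
    std::vector<uint32_t> hist(bins, 0);
    for (uint8_t p : window)
        ++hist[p >> (8 - B)];                // keep only the top B bits

    // Accumulate-and-compare: the median bin is the first bin at which the
    // running count reaches half of the window population.
    const uint32_t half = (window.size() + 1) / 2;
    uint32_t acc = 0;
    for (int b = 0; b < bins; ++b) {
        acc += hist[b];
        if (acc >= half)
            return static_cast<uint8_t>(b << (8 - B));  // approximate median
    }
    return 0;  // unreachable for a non-empty window
}
```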

    Biblioteca para diseño basado en modelos de algoritmos de procesado de imágenes en FPGA (Library for Model-Based Design of Image Processing Algorithms on FPGAs)

    This paper describes a library (XSGImgLib) that includes parameterizable blocks to implement low-level image processing tasks on FPGAs. A model-based design technique provided by Xilinx System Generator (XSG) has been used to design the blocks, which implement a point operation (binarization) and neighborhood operations (linear and non-linear filtering) on grayscale images. The blocks are parameterizable for input/output data precision, window size, normalization strategy, and implementation options (area versus speed optimization). The paper includes the implementation results obtained for the different configuration options and exemplifies the combination of several blocks of the library to build a complete design for image segmentation purposes. Agencia Española de Cooperación Internacional para el Desarrollo PCID/024124/09, PCID/030769/1
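    As a loose software analogue of the two block families described above (a point operation and a window-based neighborhood operation), the following C++ sketch parameterizes the window size at compile time; the interfaces are hypothetical and do not reflect the actual XSGImgLib blocks, which are XSG/Simulink models.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Point operation: fixed-threshold binarization of a grayscale pixel.
inline uint8_t binarize(uint8_t pixel, uint8_t threshold) {
    return pixel >= threshold ? 255 : 0;
}

// Neighborhood operation: median of a W x W window centered at (x, y),
// with the window size as a compile-time parameter (as the library's blocks
// are parameterized for window size).  Border pixels are replicated.
template <int W>
uint8_t window_median(const std::vector<uint8_t>& img, int width, int height,
                      int x, int y) {
    static_assert(W % 2 == 1, "window size must be odd");
    uint8_t win[W * W];
    int n = 0;
    for (int dy = -W / 2; dy <= W / 2; ++dy)
        for (int dx = -W / 2; dx <= W / 2; ++dx) {
            int cx = std::clamp(x + dx, 0, width - 1);
            int cy = std::clamp(y + dy, 0, height - 1);
            win[n++] = img[cy * width + cx];
        }
    std::nth_element(win, win + n / 2, win + n);  // partial sort to the median
    return win[n / 2];
}
```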

    An area-efficient 2-D convolution implementation on FPGA for space applications

    2-D convolution is an algorithm widely used in image and video processing. Although its computation is simple, its implementation requires high computational power and intensive use of memory. Field-programmable gate array (FPGA) architectures have been proposed to accelerate 2-D convolution calculations, with buffers implemented on the FPGA used to avoid direct memory access. In this paper we present an implementation of the 2-D convolution algorithm on an FPGA architecture designed to support this operation in space applications. The proposed solution dramatically decreases the area needed while keeping good performance, making it appropriate for embedded systems in critical space applications.
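    The line-buffer idiom alluded to above (buffers kept on the FPGA so that each pixel is fetched from external memory only once) can be sketched in C++ as follows; this is a generic streaming 3x3 convolution, not the paper's specific architecture, and all names are illustrative.

```cpp
#include <cstdint>
#include <vector>

// Streaming 3x3 convolution using two line buffers so each input pixel is
// read exactly once.  Border outputs are left at zero for simplicity.
void conv3x3_stream(const std::vector<uint8_t>& in, std::vector<int32_t>& out,
                    int width, int height, const int16_t k[3][3]) {
    std::vector<uint8_t> lb0(width, 0), lb1(width, 0);  // two previous rows
    uint8_t win[3][3] = {};                              // 3x3 register window
    out.assign(in.size(), 0);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            uint8_t px = in[y * width + x];
            // Shift the window left and insert a new column taken from the
            // line buffers (two rows above) and the live stream (current row).
            for (int r = 0; r < 3; ++r) {
                win[r][0] = win[r][1];
                win[r][1] = win[r][2];
            }
            win[0][2] = lb0[x];
            win[1][2] = lb1[x];
            win[2][2] = px;
            // Update the line buffers for the next row.
            lb0[x] = lb1[x];
            lb1[x] = px;

            // The window is valid once two rows and two columns have streamed in.
            if (y >= 2 && x >= 2) {
                int32_t acc = 0;
                for (int r = 0; r < 3; ++r)
                    for (int c = 0; c < 3; ++c)
                        acc += k[r][c] * win[r][c];
                out[(y - 1) * width + (x - 1)] = acc;  // window center
            }
        }
    }
}
```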

    Real-time on-board obstacle avoidance for UAVs based on embedded stereo vision

    In order to improve usability and safety, modern unmanned aerial vehicles (UAVs) are equipped with sensors to monitor the environment, such as laser scanners and cameras. One important aspect of this monitoring process is to detect obstacles in the flight path in order to avoid collisions. Since a large number of consumer UAVs suffer from tight weight and power constraints, our work focuses on obstacle avoidance based on a lightweight stereo camera setup. We use disparity maps, which are computed from the camera images, to locate obstacles and to automatically steer the UAV around them. For disparity map computation we optimize the well-known semi-global matching (SGM) approach for deployment on an embedded FPGA. The disparity maps are then converted into simpler representations, the so-called U-/V-maps, which are used for obstacle detection. Obstacle avoidance is based on a reactive approach which finds the shortest path around the obstacles as soon as they come within a critical distance of the UAV. One of the fundamental goals of our work was the reduction of development costs by closing the gap between application development and hardware optimization. Hence, we aimed at using high-level synthesis (HLS) for porting our algorithms, which are written in C/C++, to the embedded FPGA. We evaluated our implementation of the disparity estimation on the KITTI Stereo 2015 benchmark. The integrity of the overall real-time reactive obstacle avoidance algorithm has been evaluated by using hardware-in-the-loop testing in conjunction with two flight simulators. Comment: Accepted in the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
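    As an illustration of the U-/V-map representation mentioned above, the following C++ sketch builds a V-disparity map, i.e. one disparity histogram per image row; parameter names are assumptions, not taken from the paper. In such a map the ground plane typically appears as a slanted line while obstacles show up as near-vertical segments, which is what makes the representation convenient for reactive obstacle detection.

```cpp
#include <cstdint>
#include <vector>

// V-disparity map from a dense disparity image: vmap[row][d] counts how many
// pixels in image row `row` have disparity value d.  Invalid (zero) disparities
// are skipped.  Illustrative sketch only.
std::vector<uint32_t> v_disparity(const std::vector<uint8_t>& disp,
                                  int width, int height, int max_disp) {
    std::vector<uint32_t> vmap(static_cast<size_t>(height) * max_disp, 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            uint8_t d = disp[y * width + x];
            if (d > 0 && d < max_disp)
                ++vmap[y * max_disp + d];   // one histogram per image row
        }
    return vmap;
}
```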

    Multi-Level Pre-Correlation RFI Flagging for Real-Time Implementation on UniBoard

    Because of the denser active use of the spectrum, and because of radio telescopes' higher sensitivity, radio frequency interference (RFI) mitigation has become a sensitive topic for current and future radio telescope designs. Although quite sophisticated approaches have been proposed in recent years, the majority of RFI mitigation operational procedures are based on post-correlation flagging of corrupted data. Moreover, given the huge amount of data delivered by current and next-generation radio telescopes, all these RFI detection procedures have to be at least automatic and, if possible, real-time. In this paper, the implementation of a real-time pre-correlation RFI detection and flagging procedure on generic high-performance computing platforms based on field-programmable gate arrays (FPGAs) is described, simulated and tested. One of these boards, UniBoard, developed under a Joint Research Activity in the RadioNet FP7 European programme, is based on eight FPGAs interconnected by a high-speed transceiver mesh. It provides up to ~4 TMACs with Altera Stratix IV FPGAs and a 160 Gbps data rate for the input data stream. Considering the high in-out data rate in the pre-correlation stages, only real-time, go-through detectors (i.e., no iterative processing) can be implemented. In this paper, a real-time and adaptive detection scheme is described. An ongoing case study has been set up with the Electronic Multi-Beam Radio Astronomy Concept (EMBRACE) radio telescope facility at Nançay Observatory. The objective is to evaluate the performance of this concept in terms of hardware complexity, detection efficiency and additional RFI metadata rate cost. The UniBoard implementation scheme is described. Comment: 16 pages, 13 figures
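    A minimal example of the kind of real-time, go-through (non-iterative) detection discussed above is a streaming power detector that flags samples exceeding the running noise estimate by a few standard deviations and adapts that estimate only on samples judged clean; the C++ sketch below is purely illustrative and much simpler than the multi-level UniBoard detector.

```cpp
#include <cmath>

// Streaming, adaptive power detector: flag a sample if its power exceeds the
// running mean by k sigmas; update the exponentially weighted mean/variance
// only with clean samples so the estimate does not absorb the RFI itself.
// Parameter values are illustrative assumptions.
struct StreamingRfiFlagger {
    double mean = 0.0, var = 1.0;   // running noise-power statistics
    double alpha = 0.01;            // smoothing factor of the estimates
    double k = 4.0;                 // detection threshold in sigmas

    bool flag(double power) {
        bool rfi = power > mean + k * std::sqrt(var);
        if (!rfi) {                 // adapt only on samples judged clean
            mean += alpha * (power - mean);
            var  += alpha * ((power - mean) * (power - mean) - var);
        }
        return rfi;
    }
};
```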

    Real-Time Dense Stereo Matching With ELAS on FPGA Accelerated Embedded Devices

    For many applications in low-power real-time robotics, stereo cameras are the sensors of choice for depth perception as they are typically cheaper and more versatile than their active counterparts. Their biggest drawback, however, is that they do not directly sense depth maps; instead, these must be estimated through data-intensive processes. Therefore, appropriate algorithm selection plays an important role in achieving the desired performance characteristics. Motivated by applications in space and mobile robotics, we implement and evaluate an FPGA-accelerated adaptation of the ELAS algorithm. Despite offering one of the best trade-offs between efficiency and accuracy, ELAS has only been shown to run at 1.5-3 fps on a high-end CPU. Our system preserves all the intriguing properties of the original algorithm, such as the slanted plane priors, but can achieve a frame rate of 47 fps whilst consuming under 4 W of power. Unlike previous FPGA-based designs, we take advantage of both components of the CPU/FPGA system-on-chip to showcase the strategy necessary to accelerate more complex and computationally diverse algorithms for such low-power, real-time systems. Comment: 8 pages, 7 figures, 2 tables

    Evaluation of Robust Deep Learning Pipelines Targeting Low SWaP Edge Deployment

    The deep learning technique of convolutional neural networks (CNNs) has greatly advanced the state of the art for computer vision tasks such as image classification and object detection. These solutions rely on large systems leveraging power-hungry GPUs to provide the computational power to achieve such performance. However, the size, weight and power (SWaP) requirements of these conventional GPU-based deep learning systems are not suitable when a solution requires deployment to so-called Edge environments such as autonomous vehicles, unmanned aerial vehicles (UAVs) and smart security cameras. The objective of this work is to benchmark FPGA-based alternatives to conventional GPU systems that have the potential to offer similar CNN inference performance while being delivered in a low-SWaP platform suitable for Edge deployment. In this thesis we create equivalent pipelines for both GPU and FPGA which implement deep learning models for both image classification and object detection tasks. Beyond baseline benchmarking, we additionally quantify the impact on inference performance of two common real-world image degradation scenarios (simulated contrast-reduced capture and salt-and-pepper sensor noise) and their associated correction methods (gamma correction and median kernel filtering), selected as illustrative examples. The baseline system analysis, coupled with these additional robustness evaluations, provides a statistically significant benchmark comparison targeting a breadth of interest for the computer vision community. We have conducted the following experiments to demonstrate the FPGA as an effective alternative to the GPU implementation when deployed to Edge environments: (1) we developed a hardware video processing architecture with an associated library of hardware processing functions to prototype a base FPGA ecosystem, (2) we established through benchmarking that two common CNN models (ResNet-50 and YOLO version 3) have a mere 1% drop in performance on FPGA versus GPU, (3) we showed a quantitative baseline analysis for the image degradation/correction on the associated testing datasets, and (4) we demonstrated that our FPGA-based computer vision system is an ideal platform for Edge deployment given its comparable robustness to input degradation when optimal correction is applied. The significance of these findings is the demonstration of our FPGA-based solution as the superior candidate for Edge-deployed vision systems, as evidenced by our experiments, which show inference performance competitive with the conventional GPU solution and equivalent robustness, provided by correction methods, to noise encountered during in-the-wild imaging, all delivered with far lower SWaP requirements.
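    For concreteness, the two degradation scenarios and one of the correction methods named above can be sketched in scalar/stream form as below (contrast reduction with gamma correction as its counterpart, plus salt-and-pepper noise injection; the median-kernel correction is a standard window median). Parameter names and values are illustrative, not taken from the thesis.

```cpp
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Degradation: pull a pixel towards mid-gray to simulate reduced contrast
// (0 < factor <= 1, smaller means less contrast).
uint8_t reduce_contrast(uint8_t p, double factor) {
    return static_cast<uint8_t>(128 + factor * (static_cast<int>(p) - 128));
}

// Correction: gamma correction; gamma > 1 brightens mid-tones.
uint8_t gamma_correct(uint8_t p, double gamma) {
    return static_cast<uint8_t>(
        std::lround(255.0 * std::pow(p / 255.0, 1.0 / gamma)));
}

// Degradation: inject salt-and-pepper noise at the given density, which a
// median kernel filter (a standard window median) is then used to correct.
void add_salt_and_pepper(std::vector<uint8_t>& img, double density,
                         unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (uint8_t& p : img) {
        double r = u(rng);
        if (r < density / 2)      p = 0;    // pepper
        else if (r < density)     p = 255;  // salt
    }
}
```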

    SELF-ADAPTING PARALLEL FRAMEWORK FOR LONG-TERM OBJECT TRACKING

    Object tracking is a crucial field in computer vision that has many uses in human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, etc. Many implementations have been introduced in practice, and recent methods emphasize tracking objects adaptively by learning the object's perspectives and rediscovering it when it becomes untraceable, so that the object's absence problem (in the case of occlusion, clutter or blur) is resolved. Most of these algorithms place a high computational burden on the processing units and need powerful CPUs to attain real-time tracking and high-bitrate video processing. Such processing units may handle no more than a single video source, making them unsuitable for large-scale implementations with multiple sources or higher-resolution videos. In this thesis, we choose one popular algorithm called TLD (Tracking-Learning-Detection), study the core components of the algorithm that impede its performance, and implement these components in a parallel computational environment such as multi-core CPUs, GPUs, etc., also known as heterogeneous computing. OpenCL is used as a development platform to produce parallel kernels for the algorithm. The goals are to create an acceptable heterogeneous computing environment through utilizing current computer technologies, to imbue real-time applications with an alternative implementation methodology, and to circumvent the upcoming limitations of hardware in terms of cost, power, and speedup. We are able to bring true parallel speedup to the existing implementations, which greatly improves the frame rate for long-term object tracking and, with some algorithm parameter modification, provides more accurate object tracking. According to the experiments, the developed kernels achieve a range of performance improvements. For reduction-based kernels, a maximum speedup of 78X is achieved. For window-based kernels, speedups ranging from a few hundred to 2000X are achieved. For the optical flow tracking kernel, a maximum speedup of 5.7X is recorded. Global speedup is highly dependent on the hardware specifications, especially for memory transfers. With the use of a medium-sized input, the self-adapting parallel framework has successfully obtained a fast learning curve and converged to an average of 1.6X speedup compared to the original implementation. Lastly, for future programming convenience, an OpenCL-based library is built to facilitate the use of OpenCL programming on parallel hardware devices, hide the complexity of building and compiling OpenCL kernels, and provide a C-based latency measurement tool that is compatible with several operating systems.
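    As a minimal illustration of the reduction-based kernels mentioned above, the following OpenCL C kernel (embedded as a C++ string constant) performs a standard tree reduction in local memory, producing one partial sum per work-group; the kernel name, buffer layout and the power-of-two work-group assumption are illustrative and are not the thesis's actual kernels.

```cpp
// Minimal OpenCL parallel-reduction kernel embedded as a C++ raw string.
// Assumes the work-group size is a power of two; a second pass (or the host)
// sums the per-group partial results.
static const char* kReduceSumKernel = R"CLC(
__kernel void reduce_sum(__global const float* in,
                         __global float* partial,
                         __local float* scratch,
                         const unsigned n) {
    unsigned gid = get_global_id(0);
    unsigned lid = get_local_id(0);

    // Each work-item loads one element (or 0 past the end of the buffer).
    scratch[lid] = (gid < n) ? in[gid] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);

    // Tree reduction in local memory: halve the active items each step.
    for (unsigned s = get_local_size(0) / 2; s > 0; s >>= 1) {
        if (lid < s)
            scratch[lid] += scratch[lid + s];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // One partial sum per work-group.
    if (lid == 0)
        partial[get_group_id(0)] = scratch[0];
}
)CLC";
```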