2,236 research outputs found

    Time-Shared Execution of Realtime Computer Vision Pipelines by Dynamic Partial Reconfiguration

    Full text link
    This paper presents an FPGA runtime framework that demonstrates the feasibility of using dynamic partial reconfiguration (DPR) for time-sharing an FPGA by multiple realtime computer vision pipelines. The presented time-sharing runtime framework manages an FPGA fabric that can be round-robin time-shared by different pipelines at the time scale of individual frames. In this new use-case, the challenge is to achieve useful performance despite high reconfiguration time. The paper describes the basic runtime support as well as four optimizations necessary to achieve realtime performance given the limitations of DPR on today's FPGAs. The paper provides a characterization of a working runtime framework prototype on a Xilinx ZC706 development board. The paper also reports the performance of realtime computer vision pipelines when time-shared

    Performance evaluation over HW/SW co-design SoC memory transfers for a CNN accelerator

    Get PDF
    Many FPGAs vendors have recently included embedded processors in their devices, like Xilinx with ARM-Cortex A cores, together with programmable logic cells. These devices are known as Programmable System on Chip (PSoC). Their ARM cores (embedded in the processing system or PS) communicates with the programmable logic cells (PL) using ARM-standard AXI buses. In this paper we analyses the performance of exhaustive data transfers between PS and PL for a Xilinx Zynq FPGA in a co-design real scenario for Convolutional Neural Networks (CNN) accelerator, which processes, in dedicated hardware, a stream of visual information from a neuromorphic visual sensor for classification. In the PS side, a Linux operating system is running, which recollects visual events from the neuromorphic sensor into a normalized frame, and then it transfers these frames to the accelerator of multi-layered CNNs, and read results, using an AXI-DMA bus in a per-layer way. As these kind of accelerators try to process information as quick as possible, data bandwidth becomes critical and maintaining a good balanced data throughput rate requires some considerations. We present and evaluate several data partitioning techniques to improve the balance between RX and TX transfer and two different ways of transfers management: through a polling routine at the userlevel of the OS, and through a dedicated interrupt-based kernellevel driver. We demonstrate that for longer enough packets, the kernel-level driver solution gets better timing in computing a CNN classification example. Main advantage of using kernel-level driver is to have safer solutions and to have tasks scheduling in the OS to manage other important processes for our application, like frames collection from sensors and their normalization.Ministerio de Economía y Competitividad TEC2016-77785-

    HW/SW Co-design and Prototyping Approach for Embedded Smart Camera: ADAS Case Study

    Get PDF
    In 1968, Volkswagen integrated an electronic circuit as a new control fuel injection system, called the “Little Black Box”, it is considered as the first embedded system in the automotive industry. Currently, automobile constructors integrate several embedded systems into any of their new model vehicles. Behind these automobile’s electronics systems, a sophisticated Hardware/Software (HW/SW) architecture, which is based on heterogeneous components, and multiple CPUs is built. At present, they are more oriented toward visionbased systems using tiny embedded smart camera. This visionbased system in real time aspects represents one of the most challenging issues, especially in the domain of automobile’s applications. On the design side, one of the optimal solutions adopted by embedded systems designer for system performance, is to associate CPUs and hardware accelerators in the same design, in order to reduce the computational burden on the CPU and to speed-up the data processing. In this paper, we present a hardware platform-based design approach for fast embedded smart Advanced Driver Assistant System (ADAS) design and prototyping, as an alternative for the pure time-consuming simulation technique. Based on a Multi-CPU/FPGA platform, we introduced a new methodology/flow to design the different HW and SW parts of the ADAS system. Then, we shared our experience in designing and prototyping a HW/SW vision based on smart embedded system as an ADAS that helps to increase the safety of car’s drivers. We presented a real HW/SW prototype of the vision ADAS based on a Zynq FPGA. The system detects the fatigue/drowsiness state of the driver by monitoring the eyes closure and generates a real time alert. A new HW Skin Segmentation step to locate the eyes/face is proposed. Our new approach migrates the skin segmentation step from processing system (SW) to programmable logic (HW) taking the advantage of High-Level Synthesis (HLS) tool flow to accelerate the implementation, and the prototyping of the Vision based ADAS on a hardware platform

    Enhancing Real-time Embedded Image Processing Robustness on Reconfigurable Devices for Critical Applications

    Get PDF
    Nowadays, image processing is increasingly used in several application fields, such as biomedical, aerospace, or automotive. Within these fields, image processing is used to serve both non-critical and critical tasks. As example, in automotive, cameras are becoming key sensors in increasing car safety, driving assistance and driving comfort. They have been employed for infotainment (non-critical), as well as for some driver assistance tasks (critical), such as Forward Collision Avoidance, Intelligent Speed Control, or Pedestrian Detection. The complexity of these algorithms brings a challenge in real-time image processing systems, requiring high computing capacity, usually not available in processors for embedded systems. Hardware acceleration is therefore crucial, and devices such as Field Programmable Gate Arrays (FPGAs) best fit the growing demand of computational capabilities. These devices can assist embedded processors by significantly speeding-up computationally intensive software algorithms. Moreover, critical applications introduce strict requirements not only from the real-time constraints, but also from the device reliability and algorithm robustness points of view. Technology scaling is highlighting reliability problems related to aging phenomena, and to the increasing sensitivity of digital devices to external radiation events that can cause transient or even permanent faults. These faults can lead to wrong information processed or, in the worst case, to a dangerous system failure. In this context, the reconfigurable nature of FPGA devices can be exploited to increase the system reliability and robustness by leveraging Dynamic Partial Reconfiguration features. The research work presented in this thesis focuses on the development of techniques for implementing efficient and robust real-time embedded image processing hardware accelerators and systems for mission-critical applications. Three main challenges have been faced and will be discussed, along with proposed solutions, throughout the thesis: (i) achieving real-time performances, (ii) enhancing algorithm robustness, and (iii) increasing overall system's dependability. In order to ensure real-time performances, efficient FPGA-based hardware accelerators implementing selected image processing algorithms have been developed. Functionalities offered by the target technology, and algorithm's characteristics have been constantly taken into account while designing such accelerators, in order to efficiently tailor algorithm's operations to available hardware resources. On the other hand, the key idea for increasing image processing algorithms' robustness is to introduce self-adaptivity features at algorithm level, in order to maintain constant, or improve, the quality of results for a wide range of input conditions, that are not always fully predictable at design-time (e.g., noise level variations). This has been accomplished by measuring at run-time some characteristics of the input images, and then tuning the algorithm parameters based on such estimations. Dynamic reconfiguration features of modern reconfigurable FPGA have been extensively exploited in order to integrate run-time adaptivity into the designed hardware accelerators. Tools and methodologies have been also developed in order to increase the overall system dependability during reconfiguration processes, thus providing safe run-time adaptation mechanisms. In addition, taking into account the target technology and the environments in which the developed hardware accelerators and systems may be employed, dependability issues have been analyzed, leading to the development of a platform for quickly assessing the reliability and characterizing the behavior of hardware accelerators implemented on reconfigurable FPGAs when they are affected by such faults

    Context-aware adaptation in DySCAS

    Get PDF
    DySCAS is a dynamically self-configuring middleware for automotive control systems. The addition of autonomic, context-aware dynamic configuration to automotive control systems brings a potential for a wide range of benefits in terms of robustness, flexibility, upgrading etc. However, the automotive systems represent a particularly challenging domain for the deployment of autonomics concepts, having a combination of real-time performance constraints, severe resource limitations, safety-critical aspects and cost pressures. For these reasons current systems are statically configured. This paper describes the dynamic run-time configuration aspects of DySCAS and focuses on the extent to which context-aware adaptation has been achieved in DySCAS, and the ways in which the various design and implementation challenges are met

    Energy challenges for ICT

    Get PDF
    The energy consumption from the expanding use of information and communications technology (ICT) is unsustainable with present drivers, and it will impact heavily on the future climate change. However, ICT devices have the potential to contribute signi - cantly to the reduction of CO2 emission and enhance resource e ciency in other sectors, e.g., transportation (through intelligent transportation and advanced driver assistance systems and self-driving vehicles), heating (through smart building control), and manu- facturing (through digital automation based on smart autonomous sensors). To address the energy sustainability of ICT and capture the full potential of ICT in resource e - ciency, a multidisciplinary ICT-energy community needs to be brought together cover- ing devices, microarchitectures, ultra large-scale integration (ULSI), high-performance computing (HPC), energy harvesting, energy storage, system design, embedded sys- tems, e cient electronics, static analysis, and computation. In this chapter, we introduce challenges and opportunities in this emerging eld and a common framework to strive towards energy-sustainable ICT

    Real-Time Vision System for License Plate Detection and Recognition on FPGA

    Get PDF
    Rapid development of the Field Programmable Gate Array (FPGA) offers an alternative way to provide acceleration for computationally intensive tasks such as digital signal and image processing. Its ability to perform parallel processing shows the potential in implementing a high speed vision system. Out of numerous applications of computer vision, this paper focuses on the hardware implementation of one that is commercially known as Automatic Number Plate Recognition (ANPR).Morphological operations and Optical Character Recognition (OCR) algorithms have been implemented on a Xilinx Zynq-7000 All-Programmable SoC to realize the functions of an ANPR system. Test results have shown that the designed and implemented processing pipeline that consumed 63 % of the logic resources is capable of delivering the results with relatively low error rate. Most importantly, the computation time satisfies the real-time requirement for many ANPR applications
    corecore