2,765 research outputs found

    Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey

    Get PDF
    In the modern-day era of technology, a paradigm shift has been witnessed in the areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications such as computer vision, image and video processing, robotics, etc. In the context of developed digital technologies and the availability of authentic data and data handling infrastructure, DNNs have been a credible choice for solving more complex real-life problems. The performance and accuracy of a DNN is a way better than human intelligence in certain situations. However, it is noteworthy that the DNN is computationally too cumbersome in terms of the resources and time to handle these computations. Furthermore, general-purpose architectures like CPUs have issues in handling such computationally intensive algorithms. Therefore, a lot of interest and efforts have been invested by the research fraternity in specialized hardware architectures such as Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) in the context of effective implementation of computationally intensive algorithms. This paper brings forward the various research works carried out on the development and deployment of DNNs using the aforementioned specialized hardware architectures and embedded AI accelerators. The review discusses the detailed description of the specialized hardware-based accelerators used in the training and/or inference of DNN. A comparative study based on factors like power, area, and throughput, is also made on the various accelerators discussed. Finally, future research and development directions are discussed, such as future trends in DNN implementation on specialized hardware accelerators. This review article is intended to serve as a guide for hardware architectures for accelerating and improving the effectiveness of deep learning research.publishedVersio

    A study of FPGA-based System-on-Chip designs for real-time industrial application

    Get PDF
    This paper shows the benefits of the Field Programming Gate Array (FPGAs) in industrial control applications. The author starts by addressing the benefits of FPGA and where it is useful. As well as, the author has done some FPGA’s evaluation researches on the FPGA performing explaining the performance of the FPGA and the design tools. To show the benefits of the FPGA, an industrial application example has been used. The application is a real-time face detection and tracking using FPGA. Face tracking will depend on calculating the centroid of each detected region. A DE2-SoC Altera board has been used to implement this application. The application based on few algorithms that filter the captured images to detect them. These algorithms have been translated to a Verilog code to run it on the DE2-SoC boar

    An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

    Full text link
    Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65nm technology, consumes less than 20mW on average at 0.8V achieving an efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to 25MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74pJ/op; and seizure detection with encrypted data collection from EEG within 12.7pJ/op.Comment: 15 pages, 12 figures, accepted for publication to the IEEE Transactions on Circuits and Systems - I: Regular Paper

    An Efficient Classification of Hyperspectral Remotely Sensed Data Using Support Vector Machine

    Get PDF
    This work present an efficient hardware architecture of Support Vector Machine (SVM) for the classification of Hyperspectral remotely sensed data using High Level Synthesis (HLS) method. The high classification time and power consumption in traditional classification of remotely sensed data is the main motivation for this work. Therefore presented work helps to classify the remotely sensed data in real-time and to take immediate action during the natural disaster. An embedded based SVM is designed and implemented on Zynq SoC for classification of hyperspectral images. The data set of remotely sensed data are tested on different platforms and the performance is compared with existing works. Novelty in our proposed work is extend the HLS based FPGA implantation to the onboard classification system in remote sensing. The experimental results for selected data set from different class shows that our architecture on Zynq 7000 implementation generates a delay of 11.26 µs and power consumption of 1.7 Watts, which is extremely better as compared to other Field Programmable Gate Array (FPGA) implementation using Hardware description Language (HDL)  and Central Processing Unit (CPU) implementation
    • …
    corecore