Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.
Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 2018
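
As a concrete illustration of the design space exploration step such toolflows perform, the following minimal sketch enumerates candidate channel-unroll factors for a single convolutional layer and keeps the fastest configuration that fits a DSP budget. The layer shape, the DSP budget, and the one-MAC-per-DSP cost model are all illustrative assumptions, not figures taken from any surveyed toolflow.

    # Minimal design space exploration sketch for one conv layer (Python).
    # Layer shape, DSP budget, and cost model are illustrative assumptions.
    from itertools import product

    K, C_IN, C_OUT, H_OUT, W_OUT = 3, 64, 128, 56, 56   # hypothetical 3x3 conv
    TOTAL_MACS = K * K * C_IN * C_OUT * H_OUT * W_OUT

    DSP_BUDGET = 512      # assumed DSP slices available on the target FPGA
    best = None
    # Enumerate unroll (parallelism) factors over input/output channels.
    for p_in, p_out in product([1, 2, 4, 8, 16], repeat=2):
        dsps = p_in * p_out                   # assume one MAC per DSP per cycle
        if dsps > DSP_BUDGET:
            continue                          # does not fit the resource budget
        cycles = TOTAL_MACS / (p_in * p_out)  # idealized, ignores memory stalls
        if best is None or cycles < best[0]:
            best = (cycles, p_in, p_out, dsps)

    cycles, p_in, p_out, dsps = best
    print(f"best unroll: in={p_in} out={p_out} DSPs={dsps} cycles={cycles:.0f}")

Real toolflows couple such enumeration with far richer resource and memory models, but the structure of the search is the same.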
Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey
In the modern era of technology, a paradigm shift has been witnessed in areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications such as computer vision, image and video processing, and robotics. In the context of mature digital technologies and the availability of authentic data and data-handling infrastructure, DNNs have become a credible choice for solving complex real-life problems, and their performance and accuracy can even surpass human intelligence in certain situations. However, DNNs are computationally demanding in terms of the resources and time required, and general-purpose architectures such as CPUs struggle to handle such computationally intensive algorithms. Therefore, the research community has invested considerable effort in specialized hardware architectures such as the Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) for the effective implementation of such algorithms. This paper surveys the research carried out on the development and deployment of DNNs using these specialized hardware architectures and embedded AI accelerators. The review describes in detail the specialized hardware-based accelerators used in the training and/or inference of DNNs, and presents a comparative study of the discussed accelerators based on factors such as power, area, and throughput. Finally, directions for future research and development, such as emerging trends in DNN implementation on specialized hardware accelerators, are discussed. This review article is intended to serve as a guide to hardware architectures for accelerating and improving the effectiveness of deep learning research.
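
To make the comparison metrics concrete, the short sketch below derives throughput (GOP/s) and energy efficiency (GOP/s/W) from a workload size, a measured latency, and a measured power draw. All three input figures are illustrative placeholders, not measurements from the reviewed accelerators.

    # Deriving the usual comparison metrics from hypothetical measurements.
    workload_gops = 3.9     # assumed network workload in giga-operations
    latency_s = 0.012       # assumed measured inference latency in seconds
    power_w = 9.5           # assumed measured board power in watts

    throughput = workload_gops / latency_s   # GOP/s
    efficiency = throughput / power_w        # GOP/s per watt
    print(f"{throughput:.1f} GOP/s, {efficiency:.2f} GOP/s/W")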
ONNX-to-Hardware Design Flow for the Generation of Adaptive Neural-Network Accelerators on FPGAs
Neural Networks (NNs) provide a solid and reliable way of executing different
types of applications, ranging from speech recognition to medical diagnosis,
speeding up onerous and long workloads. The challenges involved in their
implementation at the edge include providing diversity, flexibility, and
sustainability, which implies, for instance, supporting evolving applications
and algorithms energy-efficiently. Using hardware or software accelerators can
deliver fast and efficient computation of the NNs, while flexibility can
be exploited to support long-term adaptivity. Nonetheless, handcrafting an NN
for a specific device, despite potentially leading to an optimal solution,
takes time and experience, which is why frameworks for hardware
accelerators are being developed. This work-in-progress study explores
the possibility of combining the toolchain proposed by Ratto et al.,
which has the distinctive ability to favor adaptivity, with approximate
computing. The goal is to enable lightweight, adaptable NN inference on
FPGAs at the edge. Before that, the work presents a detailed review of
established frameworks that adopt a similar streaming architecture for future
comparison.
Comment: Accepted for presentation at the CPS workshop 2023
(http://www.cpsschool.eu/cps-workshop)
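
To illustrate the entry point such an ONNX-to-hardware flow consumes, the sketch below exports a small PyTorch model to an .onnx file via torch.onnx.export. The network, input shape, and file name are placeholders, and the hardware-generation stages of the toolchain are deliberately not shown.

    # Exporting a toy NN to ONNX, the interchange format an
    # ONNX-to-hardware flow ingests; model and file name are placeholders.
    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.fc = nn.Linear(8, 10)

        def forward(self, x):
            x = torch.relu(self.conv(x))
            x = self.pool(x).flatten(1)
            return self.fc(x)

    model = TinyCNN().eval()
    dummy = torch.randn(1, 3, 32, 32)   # example input fixes graph shapes

    # model.onnx is what the hardware back end would take as input.
    torch.onnx.export(model, dummy, "model.onnx", opset_version=13)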