168 research outputs found
Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey
In the modern era of technology, a paradigm shift has been witnessed in areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications, such as computer vision, image and video processing, robotics, etc. In the context of advanced digital technologies and the availability of authentic data and data-handling infrastructure, DNNs have become a credible choice for solving complex real-life problems. In certain situations, the performance and accuracy of a DNN surpass human intelligence. However, it is noteworthy that DNNs are computationally cumbersome in terms of the resources and time required to handle these computations, and general-purpose architectures like CPUs struggle with such computationally intensive algorithms. Therefore, the research community has invested considerable interest and effort in specialized hardware architectures such as the Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) for the effective implementation of computationally intensive algorithms. This paper brings forward the various research works carried out on the development and deployment of DNNs using the aforementioned specialized hardware architectures and embedded AI accelerators. The review provides a detailed description of the specialized hardware-based accelerators used in the training and/or inference of DNNs. A comparative study of the accelerators discussed, based on factors like power, area, and throughput, is also presented. Finally, future research and development directions are discussed, such as upcoming trends in DNN implementation on specialized hardware accelerators.
This review article is intended to serve as a guide to hardware architectures for accelerating and improving the effectiveness of deep learning research.
Customizable Vector Acceleration in Extreme-Edge Computing: A RISC-V Software/Hardware Architecture Study on VGG-16 Implementation
Computing in the cloud-edge continuum, as opposed to cloud computing, relies on high-performance processing at the extreme edge of the Internet of Things (IoT) hierarchy. Hardware acceleration is a mandatory solution to achieve the performance requirements, yet it can be tightly tied to particular computation kernels, even within the same application. Vector-oriented hardware acceleration has gained renewed interest in supporting artificial intelligence (AI) applications like convolutional networks or classification algorithms. We present a comprehensive investigation of the performance and power efficiency achievable by configurable vector acceleration subsystems, obtaining evidence of both the high potential of the proposed microarchitecture and the advantage of hardware customization, in full transparency to the software program.
ONNX-to-Hardware Design Flow for the Generation of Adaptive Neural-Network Accelerators on FPGAs
Neural Networks (NNs) provide a solid and reliable way of executing different types of applications, ranging from speech recognition to medical diagnosis, speeding up onerous and long workloads. The challenges involved in their implementation at the edge include providing diversity, flexibility, and sustainability. That implies, for instance, supporting evolving applications and algorithms energy-efficiently. Using hardware or software accelerators can deliver fast and efficient computation of NNs, while flexibility can be exploited to support long-term adaptivity. Nonetheless, handcrafting an NN for a specific device, despite possibly leading to an optimal solution, takes time and experience, which is why frameworks for hardware accelerators are being developed. This work-in-progress study explores the possibility of combining the toolchain proposed by Ratto et al., which has the distinctive ability to favor adaptivity, with approximate computing. The goal is to allow lightweight, adaptable NN inference on FPGAs at the edge. Beforehand, the work presents a detailed review of established frameworks that adopt a similar streaming architecture for future comparison.
Comment: Accepted for presentation at the CPS workshop 2023 (http://www.cpsschool.eu/cps-workshop