142 research outputs found
Customizable vector acceleration in extreme-edge computing. A risc-v software/hardware architecture study on VGG-16 implementation
Computing in the cloud-edge continuum, as opposed to cloud computing, relies on high performance processing on the extreme edge of the Internet of Things (IoT) hierarchy. Hardware acceleration is a mandatory solution to achieve the performance requirements, yet it can be tightly tied to particular computation kernels, even within the same application. Vector-oriented hardware acceleration has gained renewed interest to support artificial intelligence (AI) applications like convolutional networks or classification algorithms. We present a comprehensive investigation of the performance and power efficiency achievable by configurable vector acceleration subsystems, obtaining evidence of both the high potential of the proposed microarchitecture and the advantage of hardware customization in total transparency to the software program
Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation
This paper presents a structural design of the hardware-efficient module for
implementation of convolution neural network (CNN) basic operation with reduced
implementation complexity. For this purpose we utilize some modification of the
Winograd minimal filtering method as well as computation vectorization
principles. This module calculate inner products of two consecutive segments of
the original data sequence, formed by a sliding window of length 3, with the
elements of a filter impulse response. The fully parallel structure of the
module for calculating these two inner products, based on the implementation of
a naive method of calculation, requires 6 binary multipliers and 4 binary
adders. The use of the Winograd minimal filtering method allows to construct a
module structure that requires only 4 binary multipliers and 8 binary adders.
Since a high-performance convolutional neural network can contain tens or even
hundreds of such modules, such a reduction can have a significant effect.Comment: 3 pages, 5 figure
- …