Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation
This paper presents a structural design of a hardware-efficient module for
implementing the basic operation of a convolutional neural network (CNN) with
reduced implementation complexity. For this purpose we utilize a modification
of the Winograd minimal filtering method as well as computation vectorization
principles. The module calculates the inner products of two consecutive segments of
the original data sequence, formed by a sliding window of length 3, with the
elements of a filter impulse response. The fully parallel structure of the
module for calculating these two inner products, based on the implementation of
a naive method of calculation, requires 6 binary multipliers and 4 binary
adders. Applying the Winograd minimal filtering method yields a module
structure that requires only 4 binary multipliers and 8 binary adders.
Since a high-performance convolutional neural network can contain tens or even
hundreds of such modules, this reduction can have a significant effect.
Comment: 3 pages, 5 figures
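The reduction described above corresponds to the standard Winograd F(2,3) minimal filtering algorithm, which computes two sliding-window inner products of a length-3 filter with only 4 multiplications. A minimal sketch (the function name and variable names are our own, not from the paper):

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two sliding-window inner products with 4 multiplies.

    d: four consecutive input samples [d0, d1, d2, d3]
    g: three filter taps [g0, g1, g2]
    Returns (y0, y1) with
        y0 = d0*g0 + d1*g1 + d2*g2
        y1 = d1*g0 + d2*g1 + d3*g2
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform -- in a hardware module these three values are
    # precomputed once per filter, so they cost no runtime multipliers.
    G0 = g0
    G1 = (g0 + g1 + g2) / 2
    G2 = (g0 - g1 + g2) / 2
    G3 = g2
    # 4 multiplications instead of the 6 required by the naive method
    m1 = (d0 - d2) * G0
    m2 = (d1 + d2) * G1
    m3 = (d2 - d1) * G2
    m4 = (d1 - d3) * G3
    # 8 additions/subtractions in total (4 input-side, 4 output-side)
    return (m1 + m2 + m3, m2 - m3 - m4)
```

Expanding the products confirms the outputs match the naive sliding-window inner products exactly; the saving is purely in the multiplier count, which dominates hardware cost.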
PCNNA: A Photonic Convolutional Neural Network Accelerator
Convolutional Neural Networks (CNN) have been the centerpiece of many
applications including but not limited to computer vision, speech processing,
and Natural Language Processing (NLP). However, the computationally expensive
convolution operations impose many challenges to the performance and
scalability of CNNs. In parallel, photonic systems, which are traditionally
employed for data communication, have enjoyed recent popularity for data
processing due to their high bandwidth, low power consumption, and
reconfigurability. Here we propose a Photonic Convolutional Neural Network
Accelerator (PCNNA) as a proof of concept design to speedup the convolution
operation for CNNs. Our design is based on the recently introduced silicon
photonic microring weight banks, which use broadcast-and-weight protocol to
perform Multiply And Accumulate (MAC) operation and move data through layers of
a neural network. Here, we aim to exploit the synergy between the inherent
parallelism of photonics in the form of Wavelength Division Multiplexing (WDM)
and sparsity of connections between input feature maps and kernels in CNNs.
While our full system design offers more than 3 orders of magnitude
speedup in execution time, its optical core potentially offers more than 5
orders of magnitude speedup compared to state-of-the-art electronic
counterparts.
Comment: 5 pages, 6 figures, IEEE SOCC 201
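Functionally, the broadcast-and-weight MAC described above computes an inner product in a single analog step: each input rides its own WDM wavelength, a microring tuned to that wavelength applies a bounded transmission weight, and a balanced photodetector sums all weighted channels. A toy numerical model of that behavior (illustrative only; names and the weight bound are our assumptions, not the PCNNA design):

```python
import numpy as np

def broadcast_and_weight_mac(inputs, weights):
    """Toy model of a broadcast-and-weight multiply-and-accumulate step.

    inputs:  one optical power level per WDM wavelength
    weights: one microring transmission weight per wavelength, in [-1, 1]
             (negative values stand in for a balanced photodetector pair)
    The photodetector output is modeled as the sum of all weighted
    channels, i.e. the inner product of inputs and weights.
    """
    inputs = np.asarray(inputs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(np.abs(weights) > 1.0):
        raise ValueError("microring transmission weights must lie in [-1, 1]")
    # All wavelengths are summed at once -- the parallelism the abstract
    # attributes to WDM: one MAC per detector per clock, regardless of fan-in.
    return float(np.sum(inputs * weights))
```

The point of the model is the shape of the computation, not its physics: the whole dot product collapses into one detection event, which is where the claimed orders-of-magnitude speedup of the optical core comes from.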
Learning in AI Processor
An AI processor, which can run artificial intelligence algorithms, is a state-of-the-art accelerator designed, in essence, to perform specialized algorithms across various applications. In particular, four AI application areas stand out: VR/AR smartphone games, high-performance computing, Advanced Driver Assistance Systems, and IoT. Deep learning using convolutional neural networks (CNNs) embeds intelligence into applications to perform tasks and has achieved unprecedented accuracy [1]. Powerful multi-core processors and an on-chip tensor processing accelerator unit are typically the prominent hardware features of a deep learning AI processor. After data is collected by sensors, techniques such as image processing, voice recognition, and autonomous drone navigation are adopted to pre-process and analyze the data. In recent years, many technologies associated with deep learning AI processors, including cognitive spectrum sensing, computer vision, and semantic reasoning, have become a focus of current research.