200 research outputs found
An Application-Specific VLIW Processor with Vector Instruction Set for CNN Acceleration
In recent years, neural networks have surpassed classical algorithms in areas
such as object recognition, e.g. in the well-known ImageNet challenge. As a
result, great effort is being put into developing fast and efficient
accelerators, especially for Convolutional Neural Networks (CNNs). In this work
we present ConvAix, a fully C-programmable processor, which -- contrary to many
existing architectures -- does not rely on a hard-wired array of
multiply-and-accumulate (MAC) units. Instead it maps computations onto
independent vector lanes making use of a carefully designed vector instruction
set. The presented processor is targeted towards latency-sensitive applications
and is capable of executing up to 192 MAC operations per cycle. ConvAix
operates at a target clock frequency of 400 MHz in 28nm CMOS, thereby offering
state-of-the-art performance with proper flexibility within its target domain.
Simulation results for several 2D convolutional layers from well known CNNs
(AlexNet, VGG-16) show an average ALU utilization of 72.5% using vector
instructions with 16 bit fixed-point arithmetic. Compared to other well-known
designs which are less flexible, ConvAix offers competitive energy efficiency
of up to 497 GOP/s/W while even surpassing them in terms of area efficiency and
processing speed.Comment: Accepted for publication in the proceedings of the 2019 IEEE
International Symposium on Circuits and Systems (ISCAS
GPU-based implementation of real-time system for spiking neural networks
Real-time simulations of biological neural networks (BNNs) provide a natural platform for applications in a variety of fields: data classification and pattern recognition, prediction and estimation, signal processing, control and robotics, prosthetics, neurological and neuroscientific modeling. BNNs possess inherently parallel architecture and operate in continuous signal domain. Spiking neural networks (SNNs) are type of BNNs with reduced signal dynamic range: communication between neurons occurs by means of time-stamped events (spikes). SNNs allow reduction of algorithmic complexity and communication data size at a price of little loss in accuracy. Simulation of SNNs using traditional sequential computer architectures results in significant time penalty. This penalty prohibits application of SNNs in real-time systems. Graphical processing units (GPUs) are cost effective devices specifically designed to exploit parallel shared memory-based floating point operations applied not only to computer graphics, but also to scientific computations. This makes them an attractive solution for SNN simulation compared to that of FPGA, ASIC and cluster message passing computing systems. Successful implementations of GPU-based SNN simulations have been already reported. The contribution of this thesis is the development of a scalable GPU-based realtime system that provides initial framework for design and application of SNNs in various domains. The system delivers an interface that establishes communication with neurons in the network as well as visualizes the outcome produced by the network. Accuracy of the simulation is emphasized due to its importance in the systems that exploit spike time dependent plasticity, classical conditioning and learning. As a result, a small network of 3840 Izhikevich neurons implemented as a hybrid system with Parker-Sochacki numerical integration method achieves real time operation on GTX260 device. An application case study of the system modeling receptor layer of retina is reviewed
- …