
    An Application-Specific VLIW Processor with Vector Instruction Set for CNN Acceleration

    In recent years, neural networks have surpassed classical algorithms in areas such as object recognition, e.g. in the well-known ImageNet challenge. As a result, great effort is being put into developing fast and efficient accelerators, especially for Convolutional Neural Networks (CNNs). In this work we present ConvAix, a fully C-programmable processor which -- contrary to many existing architectures -- does not rely on a hard-wired array of multiply-and-accumulate (MAC) units. Instead, it maps computations onto independent vector lanes, making use of a carefully designed vector instruction set. The presented processor is targeted towards latency-sensitive applications and is capable of executing up to 192 MAC operations per cycle. ConvAix operates at a target clock frequency of 400 MHz in 28 nm CMOS, thereby offering state-of-the-art performance with proper flexibility within its target domain. Simulation results for several 2D convolutional layers from well-known CNNs (AlexNet, VGG-16) show an average ALU utilization of 72.5% using vector instructions with 16-bit fixed-point arithmetic. Compared to other well-known but less flexible designs, ConvAix offers competitive energy efficiency of up to 497 GOP/s/W while surpassing them in terms of area efficiency and processing speed.
    Comment: Accepted for publication in the proceedings of the 2019 IEEE International Symposium on Circuits and Systems (ISCAS).
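    The abstract centers on 16-bit fixed-point multiply-and-accumulate (MAC) arithmetic for 2D convolution. As a rough illustration of that arithmetic only, the following is a minimal scalar C sketch of one output pixel, assuming Q8.8 operands and a 32-bit accumulator; the function and parameter names are hypothetical, and this is not the ConvAix vector instruction set, which would spread the same work across its independent vector lanes.

```c
#include <stdint.h>

/* Minimal sketch of a 2D convolution inner loop using 16-bit fixed-point
 * multiply-and-accumulate (MAC), as a scalar stand-in for what a single
 * vector lane would compute. Assumes Q8.8 operands and a 32-bit
 * accumulator; names and data layout are illustrative, not the ConvAix ISA. */
static int16_t conv2d_pixel_q8_8(const int16_t *in, const int16_t *kernel,
                                 int in_w, int k_h, int k_w,
                                 int y, int x)
{
    int32_t acc = 0;                       /* wide accumulator avoids overflow */
    for (int ky = 0; ky < k_h; ++ky)
        for (int kx = 0; kx < k_w; ++kx)
            acc += (int32_t)in[(y + ky) * in_w + (x + kx)] * kernel[ky * k_w + kx];
    acc = (acc + (1 << 7)) >> 8;           /* rescale Q16.16 sum back to Q8.8 with rounding */
    if (acc >  32767) acc =  32767;        /* saturate to the int16_t range */
    if (acc < -32768) acc = -32768;
    return (int16_t)acc;
}
```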

    GPU-based implementation of real-time system for spiking neural networks

    Real-time simulations of biological neural networks (BNNs) provide a natural platform for applications in a variety of fields: data classification and pattern recognition, prediction and estimation, signal processing, control and robotics, prosthetics, and neurological and neuroscientific modeling. BNNs possess an inherently parallel architecture and operate in a continuous signal domain. Spiking neural networks (SNNs) are a type of BNN with a reduced signal dynamic range: communication between neurons occurs by means of time-stamped events (spikes). SNNs allow a reduction in algorithmic complexity and communication data size at the price of a small loss in accuracy. Simulating SNNs on traditional sequential computer architectures incurs a significant time penalty, which prohibits the use of SNNs in real-time systems. Graphics processing units (GPUs) are cost-effective devices specifically designed to exploit parallel, shared-memory floating-point operations, applied not only to computer graphics but also to scientific computation. This makes them an attractive platform for SNN simulation compared with FPGA, ASIC, and cluster message-passing computing systems. Successful implementations of GPU-based SNN simulations have already been reported. The contribution of this thesis is the development of a scalable GPU-based real-time system that provides an initial framework for the design and application of SNNs in various domains. The system delivers an interface that establishes communication with neurons in the network and visualizes the output produced by the network. Accuracy of the simulation is emphasized because of its importance in systems that exploit spike-timing-dependent plasticity, classical conditioning, and learning. As a result, a small network of 3840 Izhikevich neurons, implemented as a hybrid system with the Parker-Sochacki numerical integration method, achieves real-time operation on a GTX260 device. An application case study in which the system models the receptor layer of the retina is reviewed.
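    Since the abstract centers on simulating Izhikevich neurons, a short scalar C sketch of the model equations may help. It uses plain forward Euler integration rather than the Parker-Sochacki power-series method the thesis adopts, runs on the CPU rather than the GPU, and the names and the regular-spiking parameter set (a = 0.02, b = 0.2, c = -65, d = 8) are illustrative assumptions, not the thesis implementation.

```c
#include <stdio.h>

/* Minimal sketch of one Izhikevich neuron updated with forward Euler.
 * The thesis uses the Parker-Sochacki method on a GPU for accuracy;
 * this scalar CPU version only illustrates the model equations:
 *   dv/dt = 0.04 v^2 + 5 v + 140 - u + I,   du/dt = a (b v - u),
 *   spike when v >= 30 mV, then v <- c and u <- u + d.              */
typedef struct {
    double v;          /* membrane potential (mV) */
    double u;          /* recovery variable */
    double a, b, c, d; /* model parameters */
} izh_neuron;

static int izh_step(izh_neuron *n, double I, double dt)
{
    n->v += dt * (0.04 * n->v * n->v + 5.0 * n->v + 140.0 - n->u + I);
    n->u += dt * (n->a * (n->b * n->v - n->u));
    if (n->v >= 30.0) {        /* spike: reset and report the event */
        n->v = n->c;
        n->u += n->d;
        return 1;
    }
    return 0;
}

int main(void)
{
    /* regular-spiking parameters, resting potential -65 mV, u = b * v */
    izh_neuron n = { -65.0, -13.0, 0.02, 0.2, -65.0, 8.0 };
    for (int t = 0; t < 1000; ++t)          /* 1000 steps of 1 ms */
        if (izh_step(&n, 10.0, 1.0))
            printf("spike at t = %d ms\n", t);
    return 0;
}
```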