180 research outputs found

    Doctor of Philosophy

    Get PDF
    dissertationDeep Neural Networks (DNNs) are the state-of-art solution in a growing number of tasks including computer vision, speech recognition, and genomics. However, DNNs are computationally expensive as they are carefully trained to extract and abstract features from raw data using multiple layers of neurons with millions of parameters. In this dissertation, we primarily focus on inference, e.g., using a DNN to classify an input image. This is an operation that will be repeatedly performed on billions of devices in the datacenter, in self-driving cars, in drones, etc. We observe that DNNs spend a vast majority of their runtime to runtime performing matrix-by-vector multiplications (MVM). MVMs have two major bottlenecks: fetching the matrix and performing sum-of-product operations. To address these bottlenecks, we use in-situ computing, where the matrix is stored in programmable resistor arrays, called crossbars, and sum-of-product operations are performed using analog computing. In this dissertation, we propose two hardware units, ISAAC and Newton.In ISAAC, we show that in-situ computing designs can outperform DNN digital accelerators, if they leverage pipelining, smart encodings, and can distribute a computation in time and space, within crossbars, and across crossbars. In the ISAAC design, roughly half the chip area/power can be attributed to the analog-to-digital conversion (ADC), i.e., it remains the key design challenge in mixed-signal accelerators for deep networks. In spite of the ADC bottleneck, ISAAC is able to out-perform the computational efficiency of the state-of-the-art design (DaDianNao) by 8x. In Newton, we take advantage of a number of techniques to address ADC inefficiency. These techniques exploit matrix transformations, heterogeneity, and smart mapping of computation to the analog substrate. We show that Newton can increase the efficiency of in-situ computing by an additional 2x. Finally, we show that in-situ computing, unfortunately, cannot be easily adapted to handle training of deep networks, i.e., it is only suitable for inference of already-trained networks. By improving the efficiency of DNN inference with ISAAC and Newton, we move closer to low-cost deep learning that in turn will have societal impact through self-driving cars, assistive systems for the disabled, and precision medicine

    Towards Accurate and High-Speed Spiking Neuromorphic Systems with Data Quantization-Aware Deep Networks

    Full text link
    Deep Neural Networks (DNNs) have gained immense success in cognitive applications and greatly pushed today's artificial intelligence forward. The biggest challenge in executing DNNs is their extremely data-extensive computations. The computing efficiency in speed and energy is constrained when traditional computing platforms are employed in such computational hungry executions. Spiking neuromorphic computing (SNC) has been widely investigated in deep networks implementation own to their high efficiency in computation and communication. However, weights and signals of DNNs are required to be quantized when deploying the DNNs on the SNC, which results in unacceptable accuracy loss. %However, the system accuracy is limited by quantizing data directly in deep networks deployment. Previous works mainly focus on weights discretize while inter-layer signals are mainly neglected. In this work, we propose to represent DNNs with fixed integer inter-layer signals and fixed-point weights while holding good accuracy. We implement the proposed DNNs on the memristor-based SNC system as a deployment example. With 4-bit data representation, our results show that the accuracy loss can be controlled within 0.02% (2.3%) on MNIST (CIFAR-10). Compared with the 8-bit dynamic fixed-point DNNs, our system can achieve more than 9.8x speedup, 89.1% energy saving, and 30% area saving.Comment: 6 pages, 4 figure

    Analog Weights in ReRAM DNN Accelerators

    Full text link
    Artificial neural networks have become ubiquitous in modern life, which has triggered the emergence of a new class of application specific integrated circuits for their acceleration. ReRAM-based accelerators have gained significant traction due to their ability to leverage in-memory computations. In a crossbar structure, they can perform multiply-and-accumulate operations more efficiently than standard CMOS logic. By virtue of being resistive switches, ReRAM switches can only reliably store one of two states. This is a severe limitation on the range of values in a computational kernel. This paper presents a novel scheme in alleviating the single-bit-per-device restriction by exploiting frequency dependence of v-i plane hysteresis, and assigning kernel information not only to the device conductance but also partially distributing it to the frequency of a time-varying input. We show this approach reduces average power consumption for a single crossbar convolution by up to a factor of x16 for an unsigned 8-bit input image, where each convolutional process consumes a worst-case of 1.1mW, and reduces area by a factor of x8, without reducing accuracy to the level of binarized neural networks. This presents a massive saving in computing cost when there are many simultaneous in-situ multiply-and-accumulate processes occurring across different crossbars.Comment: 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems, 5 pages, 4 figure
    • …
    corecore