
    Improving Model Capacity of Quantized Networks with Conditional Computation

    No full text
    Network quantization has become a crucial step when deploying deep models to edge devices: it is hardware-friendly and offers memory and computational advantages, but it also suffers performance degradation as a result of limited representation capability. We address this issue by introducing conditional computation to low-bit quantized networks. Instead of using a single, fixed kernel for each layer, which usually does not generalize well across all input data, our proposed method dynamically employs multiple parallel kernels in conjunction with a winner-takes-all gating mechanism that selects the best one to propagate information. Overall, our proposed method improves upon prior work without adding much computational overhead and achieves better classification performance on the CIFAR-10 and CIFAR-100 datasets.
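
    As a concrete illustration of the conditional-computation idea, the minimal PyTorch sketch below keeps several parallel kernels per layer and a small gate that picks one winner per input via winner-takes-all. The layer and gate names are assumptions for illustration, not the authors' released code, and the kernels are shown in full precision rather than low-bit form.

        import torch
        import torch.nn as nn

        class WTAConv2d(nn.Module):
            """One layer with multiple parallel kernels and winner-takes-all gating (illustrative)."""
            def __init__(self, in_ch, out_ch, k=3, num_kernels=4):
                super().__init__()
                # in the paper each kernel would be low-bit quantized; plain convs are used here
                self.kernels = nn.ModuleList(
                    [nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for _ in range(num_kernels)]
                )
                # lightweight gate: global average pool -> linear -> one score per kernel
                self.gate = nn.Linear(in_ch, num_kernels)

            def forward(self, x):
                scores = self.gate(x.mean(dim=(2, 3)))             # (batch, num_kernels)
                winner = scores.argmax(dim=1)                      # winner-takes-all per sample
                outs = torch.stack([k(x) for k in self.kernels])   # (kernels, batch, C, H, W)
                idx = winner.view(1, -1, 1, 1, 1).expand(1, *outs.shape[1:])
                return outs.gather(0, idx).squeeze(0)              # keep the winning kernel's output

    For clarity the sketch evaluates every kernel and then discards all but the winner; an efficient implementation would dispatch only the selected kernel.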

    Exploiting Sparse Activation for Low-Power Design of Synchronous Neuromorphic Systems

    No full text

    An Improved K-Spare Decomposing Algorithm for Mapping Neural Networks onto Crossbar-Based Neuromorphic Computing Systems

    No full text
    Mapping deep neural network (DNN) models onto crossbar-based neuromorphic computing systems (NCS) has recently become more popular, since it allows us to realize the advantages of DNNs on small computing systems. However, due to the physical limitations of NCS, such as limited programmability or the fixed, small number of neurons and synapses of the memristor crossbars (the most important component of an NCS), we have to quantize and decompose a DNN model into many partitions before the mapping. Each weight parameter in the original network has its own scaling factor, while a crossbar cell in hardware has only one scaling factor; this causes a significant error and reduces the performance of the system. To mitigate this issue, the K-spare neuron approach has been proposed, which uses K additional spare neurons to capture more scaling factors. Unfortunately, this approach typically incurs a large neuron overhead. This paper therefore proposes an improved version of the K-spare neuron method that uses a decomposition algorithm to minimize the neuron overhead while maintaining the accuracy of the DNN model. We achieve this goal by using the mean squared quantization error (MSQE) to evaluate which crossbar units are more important and therefore deserve more scaling factors than others, instead of assigning the same k spare neurons to all crossbar cells as previous work does. Our experimental results are demonstrated on the ImageNet dataset (ILSVRC2012) and three typical and popular deep convolutional neural networks: VGG16, ResNet152, and MobileNet v2. Our proposed method uses only 0.1%, 3.12%, and 2.4% neuron overhead for VGG16, ResNet152, and MobileNet v2 to keep their accuracy loss at 0.44%, 0.63%, and 1.24%, respectively, while other methods use about 10–20% neuron overhead for the same accuracy loss.
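
    The allocation idea can be sketched as follows: score each crossbar block by its MSQE under a single per-block scaling factor and spend the fixed budget of spare neurons on the blocks with the worst error, rather than giving every block the same k spares. The function and parameter names below are assumptions for illustration, not the paper's implementation.

        import numpy as np

        def msqe(block, scale, bits=8):
            """Mean squared error between weights and their quantized values at one scale."""
            qmax = 2 ** (bits - 1) - 1
            q = np.clip(np.round(block / scale), -qmax, qmax) * scale
            return np.mean((block - q) ** 2)

        def allocate_spares(blocks, total_spares, bits=8):
            """Distribute a fixed budget of spare neurons, favoring blocks with high MSQE."""
            qmax = 2 ** (bits - 1) - 1
            err = np.array([msqe(b, np.abs(b).max() / qmax, bits) for b in blocks])
            spares = np.floor(err / err.sum() * total_spares).astype(int)
            # hand any remaining spares to the blocks with the largest error
            for i in np.argsort(err)[::-1][: total_spares - spares.sum()]:
                spares[i] += 1
            return spares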

    Training Multi-Bit Quantized and Binarized Networks with a Learnable Symmetric Quantizer

    No full text
    Quantizing the weights and activations of deep neural networks is essential for deploying them on resource-constrained devices or on cloud platforms for at-scale services. While binarization is a special case of quantization, this extreme case often leads to several training difficulties and necessitates specialized models and training methods. As a result, recent quantization methods do not provide binarization, thus losing the most resource-efficient option, and quantized and binarized networks have remained distinct research areas. We examine the binarization difficulties in a quantization framework and find that all we need to enable binary training is a symmetric quantizer, good initialization, and careful hyperparameter selection. These techniques also lead to substantial improvements in multi-bit quantization. We demonstrate our unified quantization framework, denoted UniQ, on the ImageNet dataset with various architectures such as ResNet-18/34 and MobileNetV2. For multi-bit quantization, UniQ outperforms existing methods and achieves state-of-the-art accuracy. In binarization, the achieved accuracy is comparable to existing state-of-the-art methods even without modifying the original architectures.
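
    A minimal sketch of such a learnable symmetric quantizer, in the spirit of learned-step-size quantization, is shown below: one trainable step size, symmetric levels around zero, and a straight-through estimator so gradients flow through the rounding. The class name and details are assumptions, not the released UniQ code.

        import torch
        import torch.nn as nn

        class LearnableSymmetricQuantizer(nn.Module):
            """Symmetric quantizer with a trainable step size and straight-through rounding."""
            def __init__(self, bits=4, init_step=0.1):
                super().__init__()
                self.levels = 2 ** (bits - 1) - 1             # symmetric range [-levels, levels]
                self.step = nn.Parameter(torch.tensor(init_step))

            def forward(self, x):
                step = self.step.abs() + 1e-8                 # keep the step size positive
                x_s = torch.clamp(x / step, -self.levels, self.levels)
                x_q = x_s + (torch.round(x_s) - x_s).detach()  # straight-through estimator
                return x_q * step

    In the binary case the same trainable step would be reused with sign-based rounding, which is one way a single framework can cover both multi-bit quantization and binarization.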

    READ: Reliability Enhancement in 3D-Memory Exploiting Asymmetric SER Distribution

    No full text

    Delay Defect Diagnosis Methodology Using Path Delay Measurements

    No full text

    Simplifying Deep Neural Networks for FPGA-Like Neuromorphic Systems

    No full text

    Weight Partitioning for Dynamic Fixed-Point Neuromorphic Computing Systems

    No full text

    A Neural Network Decomposition Algorithm for Mapping on Crossbar-Based Computing Systems

    No full text
    Crossbar-based neuromorphic computing, which accelerates neural networks, is a popular alternative to conventional von Neumann computing systems; it is also referred to as processing-in-memory or in-situ analog computing. The crossbars have a fixed number of synapses per neuron, so it is necessary to decompose neurons in order to map networks onto the crossbars. This paper proposes the k-spare decomposition algorithm, which can trade off predictive performance against neuron usage during the mapping. The proposed algorithm performs a two-level hierarchical decomposition. In the first, global decomposition, it decomposes the neural network such that each crossbar has k spare neurons. These neurons are then used to improve the accuracy of the partially mapped network in the subsequent local decomposition. Our experimental results using modern convolutional neural networks show that the proposed method can improve accuracy substantially with only about 10% extra neurons.
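
    The first, global decomposition can be sketched as tiling a layer's weight matrix into crossbar-sized blocks while reserving k output columns (spare neurons) in every crossbar for the later local decomposition to use. Crossbar sizes and names below are illustrative assumptions, not the paper's code.

        import numpy as np

        def global_decompose(weights, crossbar_rows=128, crossbar_cols=128, k_spare=8):
            """Tile a (fan_in, fan_out) weight matrix onto crossbars, leaving k_spare free columns each."""
            usable_cols = crossbar_cols - k_spare          # columns left for regular neurons
            fan_in, fan_out = weights.shape
            tiles = []
            for r in range(0, fan_in, crossbar_rows):
                for c in range(0, fan_out, usable_cols):
                    block = weights[r:r + crossbar_rows, c:c + usable_cols]
                    tile = np.zeros((crossbar_rows, crossbar_cols))
                    tile[:block.shape[0], :block.shape[1]] = block
                    tiles.append(tile)                     # last k_spare columns stay free
            return tiles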