129 research outputs found

    Learning Sparse & Ternary Neural Networks with Entropy-Constrained Trained Ternarization (EC2T)

    Full text link
    Deep neural networks (DNN) have shown remarkable success in a variety of machine learning applications. The capacity of these models (i.e., number of parameters), endows them with expressive power and allows them to reach the desired performance. In recent years, there is an increasing interest in deploying DNNs to resource-constrained devices (i.e., mobile devices) with limited energy, memory, and computational budget. To address this problem, we propose Entropy-Constrained Trained Ternarization (EC2T), a general framework to create sparse and ternary neural networks which are efficient in terms of storage (e.g., at most two binary-masks and two full-precision values are required to save a weight matrix) and computation (e.g., MAC operations are reduced to a few accumulations plus two multiplications). This approach consists of two steps. First, a super-network is created by scaling the dimensions of a pre-trained model (i.e., its width and depth). Subsequently, this super-network is simultaneously pruned (using an entropy constraint) and quantized (that is, ternary values are assigned layer-wise) in a training process, resulting in a sparse and ternary network representation. We validate the proposed approach in CIFAR-10, CIFAR-100, and ImageNet datasets, showing its effectiveness in image classification tasks.Comment: Proceedings of the CVPR'20 Joint Workshop on Efficient Deep Learning in Computer Vision. Code is available at https://github.com/d-becking/efficientCNN

    One-Cycle Pruning: Pruning ConvNets Under a Tight Training Budget

    Full text link
    Introducing sparsity in a neural network has been an efficient way to reduce its complexity while keeping its performance almost intact. Most of the time, sparsity is introduced using a three-stage pipeline: 1) train the model to convergence, 2) prune the model according to some criterion, 3) fine-tune the pruned model to recover performance. The last two steps are often performed iteratively, leading to reasonable results but also to a time-consuming and complex process. In our work, we propose to get rid of the first step of the pipeline and to combine the two other steps in a single pruning-training cycle, allowing the model to jointly learn for the optimal weights while being pruned. We do this by introducing a novel pruning schedule, named One-Cycle Pruning, which starts pruning from the beginning of the training, and until its very end. Adopting such a schedule not only leads to better performing pruned models but also drastically reduces the training budget required to prune a model. Experiments are conducted on a variety of architectures (VGG-16 and ResNet-18) and datasets (CIFAR-10, CIFAR-100 and Caltech-101), and for relatively high sparsity values (80%, 90%, 95% of weights removed). Our results show that One-Cycle Pruning consistently outperforms commonly used pruning schedules such as One-Shot Pruning, Iterative Pruning and Automated Gradual Pruning, on a fixed training budget.Comment: Accepted at Sparsity in Neural Networks (SNN 2021

    Neural Architecture Codesign for Fast Bragg Peak Analysis

    Full text link
    We develop an automated pipeline to streamline neural architecture codesign for fast, real-time Bragg peak analysis in high-energy diffraction microscopy. Traditional approaches, notably pseudo-Voigt fitting, demand significant computational resources, prompting interest in deep learning models for more efficient solutions. Our method employs neural architecture search and AutoML to enhance these models, including hardware costs, leading to the discovery of more hardware-efficient neural architectures. Our results match the performance, while achieving a 13×\times reduction in bit operations compared to the previous state-of-the-art. We show further speedup through model compression techniques such as quantization-aware-training and neural network pruning. Additionally, our hierarchical search space provides greater flexibility in optimization, which can easily extend to other tasks and domains.Comment: To appear in 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE

    EFFICIENCY COMPARISON OF NETWORKS IN HANDWRITTEN LATIN CHARACTERS RECOGNITION WITH DIACRITICS

    Get PDF
    The aim of the article is to analyze and compare the performance and accuracy of architectures with a different number of parameters on the example of a set of handwritten Latin characters from the Polish Handwritten Characters Database (PHCD). It is a database of handwriting scans containing letters of the Latin alphabet as well as diacritics characteristic of the Polish language. Each class in the PHCD dataset contains 6,000 scans for each character. The research was carried out on six proposed architectures and compared with the architecture from the literature. Each of the models was trained for 50 epochs, and then the accuracy of prediction was measured on a separate test set. The experiment thus constructed was repeated 20 times for each model. Accuracy, number of parameters and number of floating-point operations performed by the network were compared. The research was conducted on subsets such as uppercase letters, lowercase letters, lowercase letters with diacritics, and a subset of all available characters. The relationship between the number of parameters and the accuracy of the model was indicated. Among the examined architectures, those that significantly improved the prediction accuracy at the expense of a larger network size were selected, and a network with a similar prediction accuracy as the base one, but with twice as many model parameters was selected

    BrainTTA: A 35 fJ/op Compiler Programmable Mixed-Precision Transport-Triggered NN SoC

    Full text link
    Recently, accelerators for extremely quantized deep neural network (DNN) inference with operand widths as low as 1-bit have gained popularity due to their ability to largely cut down energy cost per inference. In this paper, a flexible SoC with mixed-precision support is presented. Contrary to the current trend of fixed-datapath accelerators, this architecture makes use of a flexible datapath based on a Transport-Triggered Architecture (TTA). The architecture is fully programmable using C. The accelerator has a peak energy efficiency of 35/67/405 fJ/op (binary, ternary, and 8-bit precision) and a throughput of 614/307/77 GOPS, which is unprecedented for a programmable architecture

    BrainTTA: A 35 fJ/op Compiler Programmable Mixed-Precision Transport-Triggered NN SoC

    Get PDF
    Recently, accelerators for extremely quantized deep neural network (DNN) inference with operand widths as low as 1-bit have gained popularity due to their ability to largely cut down energy cost per inference. In this paper, a flexible SoC with mixed-precision support is presented. Contrary to the current trend of fixed-datapath accelerators, this architecture makes use of a flexible datapath based on a Transport-Triggered Architecture (TTA). The architecture is fully programmable using C. The accelerator has a peak energy efficiency of 35/67/405 fJ/op (binary, ternary, and 8-bit precision) and a throughput of 614/307/77 GOPS, which is unprecedented for a programmable architecture
    corecore