Learning Sparse & Ternary Neural Networks with Entropy-Constrained Trained Ternarization (EC2T)
Deep neural networks (DNNs) have shown remarkable success in a variety of
machine learning applications. The capacity of these models (i.e., their number
of parameters) endows them with expressive power and allows them to reach the
desired performance. In recent years, there has been increasing interest in
deploying DNNs on resource-constrained devices (e.g., mobile devices) with
limited energy, memory, and computational budgets. To address this problem, we
propose Entropy-Constrained Trained Ternarization (EC2T), a general framework
to create sparse and ternary neural networks which are efficient in terms of
storage (e.g., at most two binary masks and two full-precision values are
required to store a weight matrix) and computation (e.g., MAC operations are
reduced to a few accumulations plus two multiplications). This approach
consists of two steps. First, a super-network is created by scaling the
dimensions of a pre-trained model (i.e., its width and depth). Subsequently,
this super-network is simultaneously pruned (using an entropy constraint) and
quantized (that is, ternary values are assigned layer-wise) in a training
process, resulting in a sparse and ternary network representation. We validate
the proposed approach on the CIFAR-10, CIFAR-100, and ImageNet datasets, showing
its effectiveness in image classification tasks.
Comment: Proceedings of the CVPR'20 Joint Workshop on Efficient Deep Learning in Computer Vision. Code is available at https://github.com/d-becking/efficientCNN
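The storage and compute claims above can be made concrete with a small sketch. This is a minimal illustration, not the paper's implementation: the toy magnitude threshold stands in for EC2T's entropy-constrained training, but the storage format (two binary masks plus two full-precision values) and the reduced MAC pattern follow the description in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))  # a full-precision weight matrix

# Toy ternarization criterion (EC2T instead learns this during training):
# large positive weights -> w_pos, large negative weights -> w_neg
# (a negative value), everything else -> 0.
threshold = 0.7 * np.abs(W).mean()
mask_pos = W > threshold          # binary mask no. 1
mask_neg = W < -threshold         # binary mask no. 2
w_pos = W[mask_pos].mean()        # full-precision value no. 1
w_neg = W[mask_neg].mean()        # full-precision value no. 2

# A matrix-vector product now needs only accumulations plus two
# multiplications per output element: sum the inputs each mask selects,
# then scale the two sums once each.
x = rng.normal(size=8)
acc_pos = (mask_pos * x).sum(axis=1)   # accumulations only
acc_neg = (mask_neg * x).sum(axis=1)   # accumulations only
y = w_pos * acc_pos + w_neg * acc_neg  # the two multiplications

# Same result as multiplying by the dense ternary matrix.
W_ternary = w_pos * mask_pos + w_neg * mask_neg
assert np.allclose(y, W_ternary @ x)
```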
One-Cycle Pruning: Pruning ConvNets Under a Tight Training Budget
Introducing sparsity in a neural network has been an efficient way to reduce
its complexity while keeping its performance almost intact. Most of the time,
sparsity is introduced using a three-stage pipeline: 1) train the model to
convergence, 2) prune the model according to some criterion, 3) fine-tune the
pruned model to recover performance. The last two steps are often performed
iteratively, leading to reasonable results but also to a time-consuming and
complex process. In our work, we propose to remove the first step of the
pipeline and to combine the other two steps in a single pruning-training cycle,
allowing the model to jointly learn the optimal weights while being pruned.
We do this by introducing a novel pruning schedule, named One-Cycle Pruning,
which starts pruning at the very beginning of training and continues until its
end. Adopting such a schedule not only leads to better-performing pruned models
but also drastically reduces the training budget required to prune a model.
Experiments are conducted on a variety of architectures (VGG-16 and ResNet-18)
and datasets (CIFAR-10, CIFAR-100 and Caltech-101), and for relatively high
sparsity values (80%, 90%, 95% of weights removed). Our results show that
One-Cycle Pruning consistently outperforms commonly used pruning schedules such
as One-Shot Pruning, Iterative Pruning and Automated Gradual Pruning, on a
fixed training budget.
Comment: Accepted at Sparsity in Neural Networks (SNN 2021).
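The abstract does not give the schedule's functional form, so the sketch below is only a stand-in assumption: a smooth cosine ramp with the property the text describes, namely that pruning starts at the first training step and reaches the target sparsity exactly at the last one.

```python
import math

def one_cycle_sparsity(step: int, total_steps: int, final_sparsity: float) -> float:
    """Scheduled sparsity at a given training step.

    Rises smoothly from 0 at step 0 to final_sparsity at the last step,
    so pruning and training happen in one cycle. The cosine shape is an
    assumption; the paper defines its own schedule.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity * 0.5 * (1.0 - math.cos(math.pi * progress))

# A training loop would prune the smallest-magnitude weights down to the
# scheduled sparsity at each step; here we just print the ramp.
for step in (0, 2_500, 5_000, 7_500, 10_000):
    print(step, round(one_cycle_sparsity(step, 10_000, 0.95), 3))
```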
Neural Architecture Codesign for Fast Bragg Peak Analysis
We develop an automated pipeline to streamline neural architecture codesign
for fast, real-time Bragg peak analysis in high-energy diffraction microscopy.
Traditional approaches, notably pseudo-Voigt fitting, demand significant
computational resources, prompting interest in deep learning models for more
efficient solutions. Our method employs neural architecture search and AutoML,
taking hardware costs into account, to enhance these models, leading to the
discovery of more hardware-efficient neural architectures. Our results match
the performance of the previous state-of-the-art while achieving a 13×
reduction in bit operations. We show further speedup through model
compression techniques such as quantization-aware training and neural network
pruning. Additionally, our hierarchical search space provides greater
flexibility in optimization, which can easily extend to other tasks and
domains.
Comment: To appear in the 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE).
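For context on the baseline being replaced: the pseudo-Voigt profile is the standard approximation of a Voigt peak as a linear mix of a Lorentzian and a Gaussian, and fitting its parameters per peak is the computationally expensive step. A minimal sketch of the standard profile (not code from the paper):

```python
import numpy as np

def pseudo_voigt(x, x0, fwhm, eta, amplitude=1.0):
    """Standard pseudo-Voigt profile: a linear mix of a Lorentzian and a
    Gaussian sharing one full width at half maximum (fwhm).
    eta in [0, 1] is the Lorentzian fraction."""
    gamma = fwhm / 2.0                               # Lorentzian half-width
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2)))  # Gaussian std. dev.
    lorentzian = gamma**2 / ((x - x0) ** 2 + gamma**2)
    gaussian = np.exp(-((x - x0) ** 2) / (2.0 * sigma**2))
    return amplitude * (eta * lorentzian + (1.0 - eta) * gaussian)

x = np.linspace(-5.0, 5.0, 201)
y = pseudo_voigt(x, x0=0.0, fwhm=1.2, eta=0.5)
# Fitting x0, fwhm, eta, and amplitude for every detected peak (e.g., with
# scipy.optimize.curve_fit) is the step the learned models replace.
```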
EFFICIENCY COMPARISON OF NETWORKS IN HANDWRITTEN LATIN CHARACTERS RECOGNITION WITH DIACRITICS
The aim of the article is to analyze and compare the performance and accuracy of architectures with different numbers of parameters, using a set of handwritten Latin characters from the Polish Handwritten Characters Database (PHCD) as an example. It is a database of handwriting scans containing letters of the Latin alphabet as well as the diacritics characteristic of the Polish language. Each class in the PHCD dataset contains 6,000 scans per character. The research was carried out on six proposed architectures, which were compared with an architecture from the literature. Each model was trained for 50 epochs, and prediction accuracy was then measured on a separate test set. The experiment thus constructed was repeated 20 times for each model. Accuracy, the number of parameters, and the number of floating-point operations performed by the network were compared. The research was conducted on subsets such as uppercase letters, lowercase letters, lowercase letters with diacritics, and a subset of all available characters. The relationship between the number of parameters and the accuracy of the model was indicated. Among the examined architectures, those that significantly improved prediction accuracy at the expense of a larger network size were identified, as was a network with prediction accuracy similar to the base one but with twice as many model parameters.
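As a sketch of the size metric used in the comparison, the snippet below counts trainable parameters for a toy character-classifier CNN in PyTorch; the layer sizes, the 32×32 input, and the class count are illustrative assumptions, not the architectures evaluated in the article.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Trainable-parameter count, one of the metrics compared above."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

num_classes = 89  # assumed; set to the actual number of PHCD classes

# Toy CNN for 32x32 grayscale scans (sizes are illustrative only).
toy_cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, num_classes),
)
print(count_parameters(toy_cnn))  # compare against measured accuracy
```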
BrainTTA: A 35 fJ/op Compiler Programmable Mixed-Precision Transport-Triggered NN SoC
Recently, accelerators for extremely quantized deep neural network (DNN)
inference with operand widths as low as 1-bit have gained popularity due to
their ability to largely cut down energy cost per inference. In this paper, a
flexible SoC with mixed-precision support is presented. Contrary to the current
trend of fixed-datapath accelerators, this architecture makes use of a flexible
datapath based on a Transport-Triggered Architecture (TTA). The architecture is
fully programmable using C. The accelerator has a peak energy efficiency of
35/67/405 fJ/op (binary, ternary, and 8-bit precision) and a throughput of
614/307/77 GOPS, which is unprecedented for a programmable architecture.
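The headline numbers imply the peak compute power of the datapath. Assuming peak throughput and peak energy efficiency are reached together (which the abstract does not state), throughput times energy per op gives:

```python
# Sanity arithmetic on the headline figures: peak throughput times peak
# energy per op yields the implied compute power at peak.
for mode, fj_per_op, gops in [("binary", 35, 614), ("ternary", 67, 307), ("8-bit", 405, 77)]:
    watts = fj_per_op * 1e-15 * gops * 1e9
    print(f"{mode}: {watts * 1e3:.1f} mW")
# binary: 21.5 mW, ternary: 20.6 mW, 8-bit: 31.2 mW
```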