40 research outputs found
Artificial neural networks condensation: A strategy to facilitate adaption of machine learning in medical settings by reducing computational burden
Machine Learning (ML) applications in healthcare can have a great impact on
people's lives by helping deliver better and more timely treatment to those in
need. At the same time, medical data is usually large and sparse, requiring
substantial computational resources. While this may not be an obstacle to the
wide adoption of ML tools in developed nations, computational resources can
very well be limited in developing nations. This can prevent less favored
people from benefiting from advances in ML applications for healthcare. In
this project we explored methods to increase the computational efficiency of ML
algorithms, in particular Artificial Neural Networks (NNs), without compromising
the accuracy of the predicted results. We used in-hospital mortality prediction
as our case study, based on the publicly available MIMIC-III dataset. We
explored three methods on two different NN architectures. We reduced the size
of a recurrent neural network (RNN) and a dense neural network (DNN) by pruning
"unused" neurons. Additionally, we modified the RNN structure by adding a
hidden layer to the LSTM cell, allowing the model to use fewer recurrent
layers. Finally, we implemented quantization on the DNN, forcing the weights to
be 8 bits instead of 32 bits. We found that all our methods increased
computational efficiency without compromising accuracy, and some of them even
achieved higher accuracy than the pre-condensed baseline models.
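The 8-bit quantization step described above can be sketched as simple affine (scale-and-offset) fake quantization of the weights. The snippet below is a minimal PyTorch illustration under that assumption, not the project's actual code:

```python
import torch

def quantize_weights_8bit(model: torch.nn.Module) -> None:
    """Fake-quantize all parameters to 8 bits in place (affine scheme).

    Each tensor is mapped to 256 integer levels and back, emulating the
    storage savings of int8 while keeping float arithmetic.
    """
    with torch.no_grad():
        for param in model.parameters():
            w = param.data
            w_min, w_max = w.min(), w.max()
            scale = (w_max - w_min) / 255.0
            if scale == 0:  # constant tensor, nothing to quantize
                continue
            q = torch.round((w - w_min) / scale)  # integers in [0, 255]
            param.data = q * scale + w_min        # dequantized values

model = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 2))
quantize_weights_8bit(model)
```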
CPOT: Channel Pruning via Optimal Transport
Recent advances in deep neural networks (DNNs) have led to tremendous growth
in network parameters, making the deployment of DNNs on platforms with limited
resources extremely difficult. Therefore, various pruning methods have been
developed to compress the deep network architectures and accelerate the
inference process. Most of the existing channel pruning methods discard the
less important filters according to well-designed filter ranking criteria.
However, due to the limited interpretability of deep learning models, designing
an appropriate ranking criterion to distinguish redundant filters is difficult.
To address this challenging issue, we propose a new technique, Channel
Pruning via Optimal Transport, dubbed CPOT. Specifically, we locate the
Wasserstein barycenter for channels of each layer in the deep models, which is
the mean of a set of probability distributions under the optimal transport
metric. Then, we prune the redundant information located by Wasserstein
barycenters. Finally, we empirically demonstrate that, for classification
tasks, CPOT outperforms the state-of-the-art methods on pruning ResNet-20,
ResNet-32, ResNet-56, and ResNet-110. Furthermore, we show that the proposed
CPOT technique is also effective at compressing StarGAN models in the more
difficult case of image-to-image translation tasks.
Comment: 11 pages
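For equally sized 1-D empirical distributions, the 2-Wasserstein barycenter reduces to an average of sorted samples, which allows a compact sketch of barycenter-based redundancy scoring. The following is one plausible reading of the idea, not the paper's implementation:

```python
import torch

def prune_by_w2_barycenter(conv_weight: torch.Tensor, n_prune: int):
    """Score filters by 1-D 2-Wasserstein distance to their barycenter.

    Treats each filter's weights as an empirical distribution. For
    equally weighted 1-D empirical measures of the same size, the W2
    barycenter is the elementwise mean of the sorted samples, and W2
    itself is the L2 distance between sorted samples.
    """
    n_filters = conv_weight.shape[0]
    flat = conv_weight.reshape(n_filters, -1)
    sorted_w, _ = torch.sort(flat, dim=1)            # quantile representation
    barycenter = sorted_w.mean(dim=0)                # W2 barycenter (sorted form)
    dist = torch.norm(sorted_w - barycenter, dim=1)  # W2 to the barycenter
    # Filters nearest the barycenter carry the most "shared" information
    # and are treated as redundant here (one reading of the CPOT idea).
    return torch.argsort(dist)[:n_prune]

w = torch.randn(64, 16, 3, 3)                        # a conv layer's weights
redundant = prune_by_w2_barycenter(w, n_prune=16)
```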
Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration
Previous works utilized the "smaller-norm-less-important" criterion to prune
filters with smaller norm values in a convolutional neural network. In this
paper, we analyze this norm-based criterion and point out that its
effectiveness depends on two requirements that are not always met: (1) the norm
deviation of the filters should be large; (2) the minimum norm of the filters
should be small. To solve this problem, we propose a novel filter pruning
method, namely Filter Pruning via Geometric Median (FPGM), to compress the
model regardless of those two requirements. Unlike previous methods, FPGM
compresses CNN models by pruning filters with redundancy, rather than those
with "relatively less" importance. When applied to two image classification
benchmarks, our method validates its usefulness and strengths. Notably, on
CIFAR-10, FPGM reduces more than 52% FLOPs on ResNet-110 with even 2.69%
relative accuracy improvement. Moreover, on ILSVRC-2012, FPGM reduces more than
42% FLOPs on ResNet-101 without top-5 accuracy drop, which has advanced the
state-of-the-art. Code is publicly available on GitHub:
https://github.com/he-y/filter-pruning-geometric-median
Comment: Accepted to CVPR 2019 (Oral)
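The core FPGM selection rule admits a short sketch: approximate the geometric median by the filter minimizing the summed distance to all other filters, and treat the filters nearest that point as redundant. A minimal PyTorch version, assuming a standard 4-D convolution weight tensor:

```python
import torch

def fpgm_select(conv_weight: torch.Tensor, n_prune: int) -> torch.Tensor:
    """Pick the filters closest to the geometric median of a conv layer.

    Following the FPGM idea, a filter near the geometric median of all
    filters is the most replaceable by the others, hence redundant. The
    median is approximated by minimizing the summed pairwise distance.
    """
    flat = conv_weight.reshape(conv_weight.shape[0], -1)
    pairwise = torch.cdist(flat, flat)          # Euclidean distances
    scores = pairwise.sum(dim=1)                # distance to all other filters
    return torch.argsort(scores)[:n_prune]      # most redundant filters first

w = torch.randn(64, 32, 3, 3)
to_prune = fpgm_select(w, n_prune=13)           # prune ~20% of the filters
```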
PruneNet: Channel Pruning via Global Importance
Channel pruning is one of the predominant approaches for accelerating deep
neural networks. Most existing pruning methods either train from scratch with a
sparsity-inducing term such as group lasso, or prune redundant channels in a
pretrained network and then fine-tune the network. Both strategies suffer from
limitations: the use of group lasso is computationally expensive, slow to
converge, and often behaves worse due to regularization bias. Methods that
start with a pretrained network either
prune channels uniformly across the layers or prune channels based on the basic
statistics of the network parameters. These approaches either ignore the fact
that some CNN layers are more redundant than others or fail to adequately
identify the level of redundancy in different layers. In this work, we
investigate a simple yet effective method for pruning channels, based on a
computationally lightweight, data-driven optimization step that discovers the
necessary width per layer. Experiments conducted on ILSVRC- confirm the
effectiveness of our approach. With non-uniform pruning across the layers on
ResNet-, we are able to match the FLOP reduction of state-of-the-art channel
pruning results while achieving higher accuracy. Further, we show that our
pruned ResNet- network outperforms ResNet- and ResNet- networks, and that our
pruned ResNet- outperforms ResNet-.
Comment: 12 pages, 3 figures, published at the ICLR 2020 NAS Workshop
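The abstract does not spell out the optimization step, so the sketch below illustrates only the general idea of deriving non-uniform per-layer widths from a single global channel ranking (here, hypothetically, by BatchNorm scale magnitude, in the spirit of network slimming rather than PruneNet's actual method):

```python
import torch
import torch.nn as nn

def widths_from_global_ranking(model: nn.Module, keep_ratio: float):
    """Derive non-uniform per-layer widths from one global ranking.

    Illustration only (not PruneNet's actual optimization step): each
    channel is scored by the magnitude of its BatchNorm scale, and one
    global threshold decides how many channels survive in each layer,
    so more-redundant layers naturally shrink more.
    """
    layer_scores = {name: m.weight.detach().abs()
                    for name, m in model.named_modules()
                    if isinstance(m, nn.BatchNorm2d)}
    all_scores = torch.cat(list(layer_scores.values()))
    k = max(1, int(all_scores.numel() * keep_ratio))
    threshold = torch.topk(all_scores, k).values.min()   # global cutoff
    return {name: int((s >= threshold).sum().item())
            for name, s in layer_scores.items()}

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU(),
                      nn.Conv2d(16, 32, 3), nn.BatchNorm2d(32), nn.ReLU())
print(widths_from_global_ranking(model, keep_ratio=0.5))
```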
A flexible, extensible software framework for model compression based on the LC algorithm
We propose a software framework based on the ideas of the
Learning-Compression (LC) algorithm that allows a user to compress a neural
network or other machine learning model using different compression schemes
with minimal effort. Currently, the supported compression schemes include pruning,
quantization, low-rank methods (including automatically learning the layer
ranks), and combinations of those, and the user can choose different
compression types for different parts of a neural network.
The LC algorithm alternates two types of steps until convergence: a learning
(L) step, which trains a model on a dataset (using an algorithm such as SGD);
and a compression (C) step, which compresses the model parameters (using a
compression scheme such as low-rank or quantization). This decoupling of the
"machine learning" aspect from the "signal compression" aspect means that
changing the model or the compression type amounts to calling the corresponding
subroutine in the L or C step, respectively. The library fully supports this by
design, which makes it flexible and extensible. This does not come at the
expense of performance: the runtime needed to compress a model is comparable to
that of training the model in the first place; and the compressed model is
competitive in terms of prediction accuracy and compression ratio with other
algorithms (which are often specialized for specific models or compression
schemes). The library is written in Python and PyTorch and is available on GitHub.
Comment: 15 pages, 4 figures, 2 tables
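A stripped-down version of the L/C alternation, using magnitude pruning as the compression step and omitting the quadratic-penalty coupling and multipliers that the full LC algorithm employs, might look like this (a sketch, not the library's API):

```python
import torch

def lc_compress(model, loss_fn, data_loader, n_iters=10, prune_frac=0.5):
    """Minimal sketch of the LC alternation with magnitude pruning as
    the C step (the real library supports many compression schemes).

    L step: a pass of SGD on the task loss.
    C step: project the weights onto the compressed set, here the set
    of tensors with a fixed fraction of zeros.
    """
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(n_iters):
        # --- L step: ordinary training ---
        for x, y in data_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        # --- C step: closed-form projection (magnitude pruning) ---
        with torch.no_grad():
            for p in model.parameters():
                k = int(p.numel() * prune_frac)
                if k == 0:
                    continue
                cutoff = p.abs().flatten().kthvalue(k).values
                p.mul_((p.abs() > cutoff).float())  # zero the smallest weights
    return model
```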
Meta Filter Pruning to Accelerate Deep Convolutional Neural Networks
Existing methods usually utilize pre-defined criteria, such as the p-norm, to
prune unimportant filters. There are two major limitations in these methods.
First, the relations among the filters are largely ignored. Filters usually
work collaboratively to make an accurate prediction. Similar
filters will have equivalent effects on the network prediction, and the
redundant filters can be further pruned. Second, the pruning criterion remains
unchanged during training. As the network is updated at each iteration, the
filter distribution also changes continuously, so the pruning criterion should
be switched adaptively. In this paper, we propose Meta Filter Pruning (MFP) to
solve the above problems. First, as a complement to the existing p-norm
criterion, we introduce a new pruning criterion that considers filter relations
via filter distance. Additionally, we build a meta pruning framework for filter
pruning, so that our method could adaptively select the most appropriate
pruning criterion as the filter distribution changes. Experiments validate our
approach on two image classification benchmarks. Notably, on ILSVRC-2012, our
MFP reduces more than 50% FLOPs on ResNet-50 with only 0.44% top-5 accuracy
loss.
Comment: 10 pages
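A toy rendering of the two ingredients: the distance-based criterion as a complement to the p-norm, plus a stand-in for the meta selection step (the switching rule below is a hypothetical heuristic, not the paper's learned policy):

```python
import torch

def norm_scores(flat: torch.Tensor) -> torch.Tensor:
    """Classic p-norm criterion (here p = 2): small norm = unimportant."""
    return flat.norm(dim=1)

def distance_scores(flat: torch.Tensor) -> torch.Tensor:
    """Relation-aware criterion: a filter close to all others is redundant."""
    return torch.cdist(flat, flat).sum(dim=1)

def mfp_style_select(conv_weight: torch.Tensor, n_prune: int) -> torch.Tensor:
    """Pick filters to prune, switching criteria by filter distribution.

    The switch below is a hypothetical heuristic, not the paper's meta
    policy: when the norm deviation is small, the norm criterion is
    uninformative, so fall back to the filter-distance criterion.
    """
    flat = conv_weight.reshape(conv_weight.shape[0], -1)
    norms = flat.norm(dim=1)
    norm_informative = (norms.std() / norms.mean()) > 0.1  # assumed threshold
    scores = norm_scores(flat) if norm_informative else distance_scores(flat)
    return torch.argsort(scores)[:n_prune]   # lowest-scored filters pruned

w = torch.randn(64, 32, 3, 3)
print(mfp_style_select(w, n_prune=16))
```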
SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers
The vast majority of processors in the world are actually microcontroller
units (MCUs), which find widespread use performing simple control tasks in
applications ranging from automobiles to medical devices and office equipment.
The Internet of Things (IoT) promises to inject machine learning into many of
these everyday objects via tiny, cheap MCUs. However, these
resource-impoverished hardware platforms severely limit the complexity of
machine learning models that can be deployed. For example, although
convolutional neural networks (CNNs) achieve state-of-the-art results on many
visual recognition tasks, CNN inference on MCUs is challenging due to severe
memory limitations. To circumvent the memory challenge associated with
CNNs, various alternatives have been proposed that do fit within the memory
budget of an MCU, albeit at the cost of prediction accuracy. This paper
challenges the idea that CNNs are not suitable for deployment on MCUs. We
demonstrate that it is possible to automatically design CNNs which generalize
well, while also being small enough to fit onto memory-limited MCUs. Our Sparse
Architecture Search method combines neural architecture search with pruning in
a single, unified approach, which learns superior models on four popular IoT
datasets. The CNNs we find are more accurate and up to smaller
than previous approaches, while meeting the strict MCU working memory
constraint.
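The working-memory constraint mentioned above can be made concrete with a rough estimator of peak activation RAM for sequential CNN inference (a simplification; real MCU deployments also account for scratch buffers and weight storage):

```python
import torch
import torch.nn as nn

def peak_activation_bytes(model: nn.Sequential, input_shape) -> int:
    """Estimate the working-memory peak of sequential inference.

    At each layer the input and output activations coexist, so the peak
    is the maximum of their combined sizes, here assuming float32.
    """
    x = torch.zeros(1, *input_shape)
    peak = 0
    for layer in model:
        y = layer(x)
        peak = max(peak, (x.numel() + y.numel()) * 4)  # float32 bytes
        x = y
    return peak

cnn = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(8, 16, 3), nn.ReLU(), nn.Flatten(),
                    nn.LazyLinear(10))
print(peak_activation_bytes(cnn, (1, 28, 28)), "bytes of activation RAM")
```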
ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks
In this paper, we introduce an approach to training a given compact network.
To this end, we leverage over-parameterization, which typically improves both
optimization and generalization in neural network training, while being
unnecessary at inference time. We propose to expand each linear layer, both
fully-connected and convolutional, of the compact network into multiple linear
layers, without adding any nonlinearity. As such, the resulting expanded
network can benefit from over-parameterization during training but can be
compressed back to the compact one algebraically at inference. We introduce
several expansion strategies, together with an initialization scheme, and
demonstrate the benefits of our ExpandNets on several tasks, including image
classification, object detection, and semantic segmentation. As evidenced by
our experiments, our approach outperforms both training the compact network
from scratch and performing knowledge distillation from a teacher.
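The algebraic collapse at the heart of ExpandNets is easy to verify: two stacked linear layers with no nonlinearity between them are exactly one linear layer. A minimal sketch with illustrative dimensions:

```python
import torch
import torch.nn as nn

# Expand one fully connected layer into two purely linear layers for
# training, then collapse them back into a single layer for inference.
# Dimensions and the expansion factor are illustrative choices.
d_in, d_hidden, d_out = 128, 512, 10        # d_hidden > d_in: over-param.

expanded = nn.Sequential(nn.Linear(d_in, d_hidden, bias=False),
                         nn.Linear(d_hidden, d_out))   # no nonlinearity!

# ... train `expanded` as usual ...

# Algebraic collapse: W = W2 @ W1, b = b2 (first layer has no bias).
compact = nn.Linear(d_in, d_out)
with torch.no_grad():
    compact.weight.copy_(expanded[1].weight @ expanded[0].weight)
    compact.bias.copy_(expanded[1].bias)

x = torch.randn(4, d_in)
assert torch.allclose(expanded(x), compact(x), atol=1e-4)
```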
Utilizing Explainable AI for Quantization and Pruning of Deep Neural Networks
For many applications, utilizing DNNs (Deep Neural Networks) requires
implementing them on a target architecture in a manner optimized for
energy consumption, memory requirements, throughput, etc. DNN compression is
used to reduce the memory footprint and complexity of a DNN before its
deployment on hardware. Recent efforts to understand and explain AI (Artificial
Intelligence) methods have led to a new research area, termed explainable
AI. Explainable AI methods allow us to better understand the inner workings of
DNNs, such as the importance of different neurons and features. The concepts
from explainable AI provide an opportunity to improve DNN compression methods
such as quantization and pruning in several ways that have not been
sufficiently explored so far. In this paper, we utilize explainable AI
methods, mainly the DeepLIFT method, for (1) pruning of DNNs, including
structured and unstructured pruning of CNN filters as well as pruning of
fully connected layer weights; (2) non-uniform quantization of DNN weights
using a clustering algorithm, also referred to as weight sharing; and (3)
integer-based mixed-precision quantization, where each layer of a DNN may use
a different number of integer bits. We use typical image
classification datasets with common deep learning image classification models
for evaluation. In all three cases, we demonstrate significant improvements,
as well as new insights and opportunities from the use of explainable AI in
DNN compression.
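A hedged sketch of attribution-guided channel scoring using DeepLIFT as implemented in the Captum library (the paper's exact procedure differs; the layer name and usage below are hypothetical):

```python
import torch
from captum.attr import LayerDeepLift  # pip install captum

def deeplift_channel_scores(model, layer, inputs, targets):
    """Score a conv layer's channels by mean |DeepLIFT attribution|.

    Sketch of attribution-guided pruning (details differ from the
    paper): channels whose activations contribute least to the
    predictions, as measured by DeepLIFT, are pruning candidates.
    """
    attributions = LayerDeepLift(model, layer).attribute(
        inputs, baselines=torch.zeros_like(inputs), target=targets)
    # attributions: (batch, channels, H, W) -> one score per channel
    return attributions.abs().mean(dim=(0, 2, 3))

# Hypothetical usage with a small CNN and a batch (x, y):
# scores = deeplift_channel_scores(model, model.conv2, x, y)
# prune_idx = torch.argsort(scores)[:n_prune]  # least-contributing channels
```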
One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation
Recent advances in the sparse neural network literature have made it possible
to prune many large feedforward and convolutional networks with only a small
quantity of data. Yet, these same techniques often falter when applied to the
problem of recovering sparse recurrent networks. These failures are
quantitative: when pruned with recent techniques, RNNs typically obtain worse
performance than they do under a simple random pruning scheme. The failures are
also qualitative: the distribution of active weights in a pruned LSTM or GRU
network tends to be concentrated in specific neurons and gates rather than
well dispersed across the entire architecture. We seek to rectify both the
quantitative and qualitative issues with recurrent network pruning by
introducing a new recurrent pruning objective derived from the spectrum of the
recurrent Jacobian. Our objective is data-efficient (requiring only 64 data
points to prune the network), easy to implement, and produces 95% sparse GRUs
that significantly improve on existing baselines. We evaluate on sequential
MNIST, Billion Words, and Wikitext.
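A sketch of the underlying quantity: the recurrent Jacobian dh_t/dh_{t-1} of a GRU cell and a saliency score of each recurrent weight with respect to its spectrum, here using the spectral norm as a stand-in for the paper's actual spectral objective:

```python
import torch

def jacobian_spectrum_scores(cell, h, x):
    """Score a GRU cell's recurrent weights by how much each one
    influences the spectrum of the recurrent Jacobian dh_t/dh_{t-1}.

    The spectral norm stands in for the paper's spectral objective;
    this is an illustrative saliency, not the published criterion.
    """
    h = h.detach().requires_grad_(True)
    h_next = cell(x, h)                       # one recurrent step
    # Build the Jacobian row by row with autograd.
    rows = [torch.autograd.grad(h_next[0, i], h, retain_graph=True,
                                create_graph=True)[0][0]
            for i in range(h_next.shape[1])]
    jac = torch.stack(rows)                   # (hidden, hidden) Jacobian
    spec = torch.linalg.matrix_norm(jac, ord=2)   # largest singular value
    g, = torch.autograd.grad(spec, cell.weight_hh)
    return (g * cell.weight_hh).abs()         # saliency of each weight

cell = torch.nn.GRUCell(8, 16)
scores = jacobian_spectrum_scores(cell, torch.zeros(1, 16), torch.randn(1, 8))
```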