Model compression as constrained optimization, with application to neural nets. Part I: general framework
Compressing neural nets is an active research problem, given the large size
of state-of-the-art nets for tasks such as object recognition, and the
computational limits imposed by mobile devices. We give a general formulation
of model compression as constrained optimization. This includes many types of
compression: quantization, low-rank decomposition, pruning, lossless
compression and others. Then, we give a general algorithm to optimize this
nonconvex problem based on the augmented Lagrangian and alternating
optimization. This results in a "learning-compression" algorithm, which
alternates a learning step of the uncompressed model, independent of the
compression type, with a compression step of the model parameters, independent
of the learning task. This simple, efficient algorithm is guaranteed to find
the best compressed model for the task in a local sense under standard
assumptions.
We present separately in several companion papers the development of this
general framework into specific algorithms for model compression based on
quantization, pruning and other variations, including experimental results on
compressing neural nets and other models. Comment: 23 pages, 2 figures.
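A minimal sketch of the learning-compression alternation described above, written in the augmented-Lagrangian form; the callbacks `train_with_penalty` and `compress` are hypothetical placeholders for the task-dependent learning step and the task-independent compression step, not the authors' released code.

```python
import numpy as np

def lc_algorithm(w0, train_with_penalty, compress, mu_schedule):
    """Illustrative learning-compression (LC) loop, augmented-Lagrangian form.

    w0                : initial uncompressed weights (flat numpy array)
    train_with_penalty: (w_init, target, mu) -> w minimizing
                        loss(w) + mu/2 * ||w - target||^2              (L step)
    compress          : v -> (theta, decompress), where decompress(theta)
                        best approximates v under the chosen scheme     (C step)
    mu_schedule       : increasing sequence of penalty parameters
    """
    w = w0.copy()
    theta, decompress = compress(w)          # direct compression as initialization
    lam = np.zeros_like(w)                   # Lagrange multiplier estimates
    for mu in mu_schedule:
        # L step: learn the uncompressed model, pulled toward the compressed one
        w = train_with_penalty(w, decompress(theta) + lam / mu, mu)
        # C step: compress the shifted weights, independent of the learning task
        theta, decompress = compress(w - lam / mu)
        # multiplier update
        lam = lam - mu * (w - decompress(theta))
    return theta, decompress
```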
Model compression as constrained optimization, with application to neural nets. Part II: quantization
We consider the problem of deep neural net compression by quantization: given
a large, reference net, we want to quantize its real-valued weights using a
codebook with K entries so that the training loss of the quantized net is
minimal. The codebook can be optimally learned jointly with the net, or fixed,
as for binarization or ternarization approaches. Previous work has quantized
the weights of the reference net, or incorporated rounding operations in the
backpropagation algorithm, but this has no guarantee of converging to a
loss-optimal, quantized net. We describe a new approach based on the recently
proposed framework of model compression as constrained optimization
(Carreira-Perpiñán, 2017). This results in a simple iterative "learning-compression"
algorithm, which alternates a step that learns a net of continuous weights with
a step that quantizes (or binarizes/ternarizes) the weights, and is guaranteed
to converge to a local optimum of the loss for quantized nets. We develop
algorithms for an adaptive codebook or a (partially) fixed codebook. The latter
includes binarization, ternarization, powers-of-two and other important
particular cases. We show experimentally that we can achieve much higher
compression rates than previous quantization work (even using just 1 bit per
weight) with negligible loss degradation. Comment: 33 pages, 15 figures, 3 tables.
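As an illustration of the compression (C) step for an adaptive codebook, the sketch below runs scalar k-means on the weights, which is one standard way to fit a codebook with K entries; the function name and defaults are assumptions, not the paper's implementation.

```python
import numpy as np

def quantize_weights(w, K, iters=20):
    """C-step sketch for an adaptive codebook: scalar k-means (Lloyd's algorithm).

    Assigns every weight to one of K codebook values chosen to minimize the
    squared error to the original weights.  For a fixed codebook (e.g. {-1, +1}
    for binarization), skip the centroid update and keep `codebook` fixed.
    """
    w = w.ravel()
    # initialize the codebook with K quantiles of the weight distribution
    codebook = np.quantile(w, np.linspace(0.0, 1.0, K))
    for _ in range(iters):
        # assignment step: nearest codebook entry for every weight
        labels = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
        # update step: each entry becomes the mean of its assigned weights
        for k in range(K):
            if np.any(labels == k):
                codebook[k] = w[labels == k].mean()
    return codebook, labels

# usage: quantized weights are codebook[labels].reshape(original_shape)
```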
Multi-task Neural Networks for QSAR Predictions
Although artificial neural networks have occasionally been used for
Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) studies in
the past, the literature has of late been dominated by other machine learning
techniques such as random forests. However, a variety of new neural net
techniques along with successful applications in other domains have renewed
interest in network approaches. In this work, inspired by the winning team's
use of neural networks in a recent QSAR competition, we used an artificial
neural network to learn a function that predicts activities of compounds for
multiple assays at the same time. We conducted experiments leveraging recent
methods for dealing with overfitting in neural networks as well as other tricks
from the neural networks literature. We compared our methods to alternative
methods reported to perform well on these tasks and found that our neural net
methods provided superior performance.
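A hedged sketch of the multi-task setup described above: a feed-forward net with shared hidden layers, one output per assay, dropout against overfitting, and a masked loss so that compounds not measured in an assay do not contribute to it. Layer sizes and the dropout rate are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class MultiTaskQSARNet(nn.Module):
    """Multi-task QSAR sketch: shared hidden layers, one output head per assay."""
    def __init__(self, n_descriptors, n_assays, hidden=1024, p_drop=0.5):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(n_descriptors, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
        )
        self.heads = nn.Linear(hidden, n_assays)  # one predicted activity per assay

    def forward(self, x):
        return self.heads(self.shared(x))

def masked_mse(pred, target, mask):
    # only compounds actually measured in an assay contribute to its loss
    return ((pred - target) ** 2 * mask).sum() / mask.sum().clamp(min=1)
```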
A Convex Duality Framework for GANs
A generative adversarial network (GAN) is a minimax game between a generator
mimicking the true model and a discriminator distinguishing the samples
produced by the generator from the real training samples. Given an
unconstrained discriminator able to approximate any function, this game reduces
to finding the generative model minimizing a divergence measure, e.g. the
Jensen-Shannon (JS) divergence, to the data distribution. However, in practice
the discriminator is constrained to be in a smaller class $\mathcal{F}$ such as
neural nets. Then, a natural question is how the divergence minimization
interpretation changes as we constrain $\mathcal{F}$. In this work, we address
this question by developing a convex duality framework for analyzing GANs. For
a convex set $\mathcal{F}$, this duality framework interprets the original GAN
formulation as finding the generative model with minimum JS-divergence to the
distributions penalized to match the moments of the data distribution, with the
moments specified by the discriminators in $\mathcal{F}$. We show that this
interpretation more generally holds for f-GAN and Wasserstein GAN. As a
byproduct, we apply the duality framework to a hybrid of f-divergence and
Wasserstein distance. Unlike the f-divergence, we prove that the proposed
hybrid divergence changes continuously with the generative model, which
suggests regularizing the discriminator's Lipschitz constant in f-GAN and
vanilla GAN. We numerically evaluate the power of the suggested regularization
schemes for improving GAN's training performance.
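One common way to realize the suggested Lipschitz regularization of the discriminator is a gradient penalty on points interpolated between real and generated samples (as in WGAN-GP); the sketch below illustrates that idea and is not necessarily the exact scheme evaluated in the paper.

```python
import torch

def gradient_penalty(discriminator, real, fake, weight=10.0):
    """Lipschitz regularization via a gradient penalty (WGAN-GP style).

    Penalizes discriminator gradient norms away from 1 on random points between
    real and generated samples; `weight` is an illustrative default.
    """
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(outputs=discriminator(x_hat).sum(),
                                inputs=x_hat, create_graph=True)[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return weight * ((grad_norm - 1.0) ** 2).mean()

# usage: add gradient_penalty(D, real_batch, fake_batch) to the discriminator loss
```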
Deep Structured Prediction with Nonlinear Output Transformations
Deep structured models are widely used for tasks like semantic segmentation,
where explicit correlations between variables provide important prior
information which generally helps to reduce the data needs of deep nets.
However, current deep structured models are restricted by oftentimes very local
neighborhood structure, which cannot be increased for computational complexity
reasons, and by the fact that the output configuration, or a representation
thereof, cannot be transformed further. Very recent approaches which address
those issues include graphical model inference inside deep nets so as to permit
subsequent non-linear output space transformations. However, optimization of
those formulations is challenging and not well understood. Here, we develop a
novel model which generalizes existing approaches, such as structured
prediction energy networks, and discuss a formulation which maintains
applicability of existing inference techniques. Comment: Appearing in NIPS 2018.
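For orientation, a minimal SPEN-style inference sketch: relax the discrete output to a continuous tensor and minimize a learned energy E(x, y) by gradient descent. The `energy_net` callable, the softmax relaxation, and the step counts are assumptions; the paper's model generalizes this family rather than being this code.

```python
import torch

def spen_inference(energy_net, x, y_shape, steps=50, lr=0.1):
    """SPEN-style inference sketch: minimize a learned energy over a relaxed output.

    energy_net(x, y) is assumed to return a scalar energy (lower is better);
    y is a relaxed (softmax) representation of the discrete output configuration.
    """
    y_logits = torch.zeros(y_shape, requires_grad=True)
    opt = torch.optim.SGD([y_logits], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        y = torch.softmax(y_logits, dim=-1)   # relaxed output configuration
        energy_net(x, y).backward()           # descend on the energy surface
        opt.step()
    return torch.softmax(y_logits, dim=-1).detach()
```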
A flexible, extensible software framework for model compression based on the LC algorithm
We propose a software framework based on the ideas of the
Learning-Compression (LC) algorithm, which allows a user to compress a neural
network or other machine learning model using different compression schemes
with minimal effort. Currently, the supported compressions include pruning,
quantization, low-rank methods (including automatically learning the layer
ranks), and combinations of those, and the user can choose different
compression types for different parts of a neural network.
The LC algorithm alternates two types of steps until convergence: a learning
(L) step, which trains a model on a dataset (using an algorithm such as SGD);
and a compression (C) step, which compresses the model parameters (using a
compression scheme such as low-rank or quantization). This decoupling of the
"machine learning" aspect from the "signal compression" aspect means that
changing the model or the compression type amounts to calling the corresponding
subroutine in the L or C step, respectively. The library fully supports this by
design, which makes it flexible and extensible. This does not come at the
expense of performance: the runtime needed to compress a model is comparable to
that of training the model in the first place; and the compressed model is
competitive in terms of prediction accuracy and compression ratio with other
algorithms (which are often specialized for specific models or compression
schemes). The library is written in Python and PyTorch and is available on GitHub. Comment: 15 pages, 4 figures, 2 tables.
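A self-contained sketch of the per-part compression idea: each named part of the network gets its own C-step routine (binarization, low rank, pruning), and the C step simply dispatches on the part name. The function names, layer names, and defaults are hypothetical illustrations, not the library's actual API.

```python
import numpy as np

def prune(w, kept_fraction=0.1):
    """C step for pruning: keep the largest-magnitude weights, zero the rest."""
    k = max(1, int(kept_fraction * w.size))
    thresh = np.sort(np.abs(w).ravel())[-k]
    return np.where(np.abs(w) >= thresh, w, 0.0)

def binarize(w):
    """C step for binarization: {-a, +a} with the optimal scale a = mean(|w|)."""
    return np.sign(w) * np.abs(w).mean()

def low_rank(w, rank=4):
    """C step for low-rank compression: truncated SVD of a weight matrix."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

# one compression scheme per part of the network (hypothetical layer names)
compression_plan = {"conv1": binarize, "fc1": low_rank, "fc2": prune}

def c_step(params):
    """Apply each part's chosen scheme to its parameters; the alternating L step
    is ordinary training and is unchanged by the choice of plan."""
    return {name: compression_plan[name](w) for name, w in params.items()}
```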
Trading-off Accuracy and Energy of Deep Inference on Embedded Systems: A Co-Design Approach
Deep neural networks have seen tremendous success for different modalities of
data including images, videos, and speech. This success has led to their
deployment in mobile and embedded systems for real-time applications. However,
making repeated inferences using deep networks on embedded systems poses
significant challenges due to constrained resources (e.g., energy and computing
power). To address these challenges, we develop a principled co-design
approach. Building on prior work, we develop a formalism referred to as
Coarse-to-Fine Networks (C2F Nets) that allow us to employ classifiers of
varying complexity to make predictions. We propose a principled optimization
algorithm to automatically configure C2F Nets for a specified trade-off between
accuracy and energy consumption for inference. The key idea is to select a
classifier on-the-fly whose complexity is proportional to the hardness of the
input example: simple classifiers for easy inputs and complex classifiers for
hard inputs. We perform comprehensive experimental evaluation using four
different C2F Net architectures on multiple real-world image classification
tasks. Our results show that the optimized C2F Net can reduce the Energy Delay
Product (EDP) by 27 to 60 percent with no loss in accuracy when compared to the
baseline solution, where all predictions are made using the most complex
classifier in C2F Net. Comment: Published in IEEE Trans. on CAD of Integrated Circuits and Systems.
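A minimal sketch of the on-the-fly classifier selection described above, for a single input example: try classifiers from cheapest to most complex and stop as soon as the prediction is confident enough. The confidence thresholds would come from the configuration step; the values and names here are placeholders, not the paper's implementation.

```python
import torch

def c2f_predict(classifiers, thresholds, x):
    """Coarse-to-fine inference sketch for a single input x (batch of one).

    `classifiers` are ordered from cheapest to most complex; `thresholds` are
    the confidence levels chosen by the configuration step (placeholders here).
    """
    for clf, tau in zip(classifiers[:-1], thresholds):
        probs = torch.softmax(clf(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= tau:       # easy input: stop early and save energy
            return pred
    # hard input: fall back to the most complex classifier
    return classifiers[-1](x).argmax(dim=-1)
```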
An Essay on Optimization Mystery of Deep Learning
Despite the huge empirical success of deep learning, theoretical
understanding of the neural network learning process is still lacking. This is
why some of its features seem "mysterious". We emphasize two mysteries of deep
learning: the generalization mystery and the optimization mystery. In this
essay we review and draw connections between several selected works concerning
the latter.
Imposing Hard Constraints on Deep Networks: Promises and Limitations
Imposing constraints on the output of a Deep Neural Net is one way to improve
the quality of its predictions while loosening the requirements for labeled
training data. Such constraints are usually imposed as soft constraints by
adding new terms to the loss function that is minimized during training. An
alternative is to impose them as hard constraints, which has a number of
theoretical benefits but has not been explored so far due to the perceived
intractability of the problem.
In this paper, we show that imposing hard constraints can in fact be done in
a computationally feasible way and delivers reasonable results. However, the
theoretical benefits do not materialize and the resulting technique is no
better than existing ones relying on soft constraints. We analyze the reasons
for this and hope to spur other researchers into proposing better solutions.
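To make the soft-versus-hard distinction concrete, the sketch below shows a soft constraint as an added penalty term and a hard constraint enforced exactly by projecting the output onto its feasible set, illustrated for the probability simplex. Function names and the choice of constraint are assumptions; the paper's hard-constraint machinery is more general.

```python
import torch
import torch.nn.functional as F

def soft_constraint_loss(pred, target, constraint_violation, weight=1.0):
    """Soft constraints: add a penalty for constraint violations to the training
    loss.  `constraint_violation` is a placeholder callable returning a
    non-negative scalar measuring how badly `pred` violates the constraint."""
    return F.mse_loss(pred, target) + weight * constraint_violation(pred)

def project_to_simplex(y):
    """Hard constraints, one simple case: project a 1-D output onto the
    probability simplex (non-negative entries summing to 1), so the constraint
    holds exactly after the projection."""
    u, _ = torch.sort(y, descending=True)
    css = torch.cumsum(u, dim=0) - 1.0
    ks = torch.arange(1, y.numel() + 1, dtype=y.dtype, device=y.device)
    rho = torch.nonzero(u - css / ks > 0).max()
    theta = css[rho] / (rho + 1).to(y.dtype)
    return torch.clamp(y - theta, min=0.0)
```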
Applying Spiking Neural Nets to Noise Shaping
In recent years, there has been an increased focus on the mechanics of
information transmission in spiking neural networks. In particular, the Noise
Shaping properties of these networks and their similarity to Delta-Sigma
Modulators have received a lot of attention. However, very little of the
research done in this area has focused on the effect the weights in these
networks have on the Noise Shaping properties and on post-processing of the
network output signal. This paper concerns itself with the various modes of
network operation and with the beneficial as well as detrimental effects that
the systematic generation of network weights can have. Also, a method for
post-processing of the spiking output signal is introduced, bringing the output
signal more in line with conventional Delta-Sigma Modulators. The relevance of this
research to the industrial application of neural nets as building blocks of
oversampled A/D converters is shown. Finally, further open questions are
listed that must be thoroughly researched to strengthen the above-mentioned
applicability of spiking neural nets.
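An illustrative single-neuron sketch of the noise-shaping behavior and the post-processing discussed above: an integrate-and-fire loop that acts like a first-order Delta-Sigma modulator on a non-negative oversampled signal, followed by a moving-average low-pass filter to recover the signal from the spike train. Parameters and function names are assumptions, not the paper's networks.

```python
import numpy as np

def integrate_and_fire_encode(signal, threshold=1.0):
    """First-order Delta-Sigma-style spike encoding (single neuron, illustrative).

    Integrates the input and fires a spike whenever the accumulator crosses the
    threshold, subtracting the emitted amount; this pushes quantization noise to
    high frequencies (noise shaping).  Assumes a non-negative, oversampled signal.
    """
    acc = 0.0
    spikes = np.zeros(len(signal))
    for t, x in enumerate(signal):
        acc += x                      # integrate the input
        if acc >= threshold:          # fire and remove the emitted charge
            spikes[t] = 1.0
            acc -= threshold
    return spikes

def reconstruct(spikes, threshold=1.0, window=32):
    """Post-processing: moving-average low-pass filter over the spike train to
    recover an estimate of the original oversampled signal."""
    kernel = np.ones(window) / window
    return threshold * np.convolve(spikes, kernel, mode="same")
```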