Optimal Brain Surgeon and general network pruning
The use of information from all second-order derivatives of the error function to perform network pruning (i.e., removing unimportant weights from a trained network) in order to improve generalization, simplify networks, reduce hardware or storage requirements, increase the
speed of further training, and, in some cases, enable rule extraction, is investigated. The method, Optimal Brain Surgeon (OBS), is significantly better than magnitude-based methods and Optimal Brain Damage, which often remove the wrong weights. OBS permits pruning of
more weights than other methods (for the same error on the training set), and thus yields better generalization on test data. Crucial to OBS is a recursion relation for calculating the inverse Hessian matrix
H^-1 from training data and structural information of the net. OBS permits a 76%, a 62%, and a 90% reduction in weights over backpropagation with weight decay on
three benchmark MONK's problems. Of OBS, Optimal Brain Damage, and a magnitude-based method, only OBS deletes the correct weights from a trained XOR network in every case.
Finally, whereas Sejnowski and Rosenberg used 18,000
weights in their NETtalk network, we used OBS to prune
a network to just 1,560 weights, yielding better generalization.
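
The pruning step and the inverse-Hessian recursion described in this abstract can be illustrated with a small sketch. It uses the standard OBS quantities: the saliency of weight q is w_q^2 / (2 [H^-1]_qq), and deleting that weight changes the remaining weights by -(w_q / [H^-1]_qq) times column q of H^-1. The toy setting is an assumption made for illustration only: a linear model with squared error, so that H is an average of outer products x x^T plus a small alpha*I term, and the function names `inverse_hessian` and `obs_step` are hypothetical.

```python
# Toy sketch of one Optimal Brain Surgeon (OBS) step. Assumptions that are NOT
# in the abstract: a linear model with squared error, and the helper names.

def inverse_hessian(inputs, alpha=1e-6):
    """Accumulate H^-1 one sample at a time with a rank-one recursion.

    Here H = alpha*I + (1/P) * sum_k x_k x_k^T, so the recursion starts from
    (1/alpha)*I and never inverts a matrix explicitly.
    """
    n, p = len(inputs[0]), len(inputs)
    h_inv = [[1.0 / alpha if i == j else 0.0 for j in range(n)] for i in range(n)]
    for x in inputs:
        hx = [sum(h_inv[i][j] * x[j] for j in range(n)) for i in range(n)]  # H^-1 x
        denom = p + sum(x[i] * hx[i] for i in range(n))  # P + x^T H^-1 x
        h_inv = [[h_inv[i][j] - hx[i] * hx[j] / denom for j in range(n)]
                 for i in range(n)]
    return h_inv


def obs_step(weights, h_inv):
    """Zero the weight with the smallest saliency and re-adjust the others."""
    saliency = [w * w / (2.0 * h_inv[q][q]) for q, w in enumerate(weights)]
    q = min(range(len(weights)), key=lambda k: saliency[k])
    scale = weights[q] / h_inv[q][q]
    new_weights = [w - scale * h_inv[i][q] for i, w in enumerate(weights)]
    new_weights[q] = 0.0  # set exactly to zero; the update gives ~0 up to rounding
    return q, saliency[q], new_weights


# Four training inputs for a three-weight linear model (made-up data).
inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
weights = [1.0, 0.9, 2.0]
h_inv = inverse_hessian(inputs)
pruned_index, pruned_saliency, new_weights = obs_step(weights, h_inv)
```

In this toy case the deleted weight shares inputs with the first weight, so the first weight is adjusted to compensate. A purely magnitude-based rule would zero a weight and leave the others untouched.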
Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon
How to develop slim and accurate deep neural networks has become crucial for
real-world applications, especially for those employed in embedded systems.
Though previous work along this research line has shown some promising results,
most existing methods either fail to significantly compress a well-trained deep
network or require a heavy retraining process for the pruned deep network to
re-boost its prediction performance. In this paper, we propose a new layer-wise
pruning method for deep neural networks. In our proposed method, parameters of
each individual layer are pruned independently based on second order
derivatives of a layer-wise error function with respect to the corresponding
parameters. We prove that the final prediction performance drop after pruning
is bounded by a linear combination of the reconstructed errors caused at each
layer. Therefore, there is a guarantee that one only needs to perform a light
retraining process on the pruned network to resume its original prediction
performance. We conduct extensive experiments on benchmark datasets to
demonstrate the effectiveness of our pruning method compared with several
state-of-the-art baseline methods.
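
As a rough illustration of what a layer-wise error can look like, the sketch below measures how much the outputs of a single layer change when one of its weights is removed. The specific choices are assumptions for illustration, not details taken from the abstract: the layer is linear without bias (z = W x), the error is the mean over samples of half the squared difference between original and pruned outputs, and the names `layer_outputs` and `layerwise_error` are hypothetical.

```python
# Sketch of a layer-wise reconstruction error for one linear layer.
# Assumptions: z = W x with no bias, and a mean squared-difference error.

def layer_outputs(weights, x):
    """Pre-activations z = W x for a weight matrix stored as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weights]


def layerwise_error(weights, pruned_weights, inputs):
    """Mean over samples of 0.5 * ||z_pruned - z||^2 for this layer only."""
    total = 0.0
    for x in inputs:
        z = layer_outputs(weights, x)
        z_hat = layer_outputs(pruned_weights, x)
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(z_hat, z))
    return total / len(inputs)


W = [[1.0, 0.9], [0.5, -2.0]]
W_pruned = [[1.0, 0.0], [0.5, -2.0]]  # one weight zeroed by hand
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up layer inputs
err = layerwise_error(W, W_pruned, X)
```

Because the error depends only on the inputs and outputs of one layer, it can be evaluated, and its second derivatives taken, independently for each layer.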
Optimal Brain Surgeon: Extensions and performance comparison.
We extend Optimal Brain Surgeon (OBS) - a second-order method for pruning networks - to allow for general error measures, and explore a reduced computational and storage implementation via a dominant eigenspace decomposition. Simulations on nonlinear, noisy pattern classification problems reveal that OBS does lead to improved generalization, and performs favorably in comparison with Optimal Brain Damage (OBD). We find that the required retraining steps in OBD may lead to inferior generalization, a result that can be interpreted as due to injecting noise back into the system. A common technique is to stop training of a large network at the minimum validation error. We found that the test error could be reduced even further by means of OBS (but not OBD) pruning. Our results justify the t → o approximation used in OBS and indicate why retraining in a highly pruned network may lead to inferior performance.
Power scalable implementation of artificial neural networks
As the use of Artificial Neural Networks (ANNs) in mobile embedded devices becomes more pervasive, the power consumption of ANN hardware is becoming a major limiting factor. Although considerable research efforts are now directed towards low-power implementations of ANNs, the issue of dynamic power scalability of the implemented design has been largely overlooked. In this paper, we discuss the motivation and basic principles for implementing power scaling in ANN hardware. With the help of a simple example, we demonstrate how power scaling can be achieved with dynamic pruning techniques.
Automated Pruning for Deep Neural Network Compression
In this work we present a method to improve the pruning step of the current
state-of-the-art methodology to compress neural networks. The novelty of the
proposed pruning technique is in its differentiability, which allows pruning to
be performed during the backpropagation phase of the network training. This
enables an end-to-end learning and strongly reduces the training time. The
technique is based on a family of differentiable pruning functions and a new
regularizer specifically designed to enforce pruning. The experimental results
show that the joint optimization of both the thresholds and the network weights
makes it possible to reach a higher compression rate, reducing the number of weights of
the pruned network by a further 14% to 33% compared to the current
state-of-the-art. Furthermore, we believe that this is the first study where
the generalization capabilities in transfer learning tasks of the features
extracted by a pruned network are analyzed. To achieve this goal, we show that
the representations learned using the proposed pruning methodology maintain the
same effectiveness and generality as those learned by the corresponding
non-compressed network on a set of different recognition tasks. Comment: 8 pages, 5 figures. Published as a conference paper at ICPR 201
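
To make the idea of a differentiable pruning function concrete, here is a minimal sketch of one possible gate with a learnable threshold. The particular function, a sigmoid gate applied to |w| - t with sharpness `beta`, is an assumption chosen for illustration and is not claimed to be a member of the family used in the paper. The names `soft_prune` and `soft_prune_grad_t` are hypothetical.

```python
import math

# Illustrative differentiable pruning gate. This particular function is an
# assumption for illustration, not necessarily one used in the paper.

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_prune(w, t, beta=50.0):
    """Smoothly push w towards zero when |w| falls below the threshold t."""
    return w * sigmoid(beta * (abs(w) - t))


def soft_prune_grad_t(w, t, beta=50.0):
    """Derivative of soft_prune with respect to the threshold t."""
    gate = sigmoid(beta * (abs(w) - t))
    return -beta * w * gate * (1.0 - gate)


kept = soft_prune(1.0, 0.1)          # far above the threshold: nearly unchanged
suppressed = soft_prune(0.01, 0.1)   # far below the threshold: nearly zero
grad = soft_prune_grad_t(0.12, 0.1)  # non-zero gradient near the threshold
```

Since the gate has a well-defined derivative with respect to t, a threshold of this kind can in principle be updated by backpropagation together with the weights, which is the property the abstract emphasizes.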
Data-free parameter pruning for Deep Neural Networks
Deep Neural nets (NNs) with millions of parameters are at the heart of many
state-of-the-art computer vision systems today. However, recent works have
shown that much smaller models can achieve similar levels of performance. In
this work, we address the problem of pruning parameters in a trained NN model.
Instead of removing individual weights one at a time as done in previous works,
we remove one neuron at a time. We show how similar neurons are redundant, and
propose a systematic way to remove them. Our experiments in pruning the densely
connected layers show that we can remove up to 85% of the total parameters in
an MNIST-trained network, and about 35% for AlexNet without significantly
affecting performance. Our method can be applied on top of most networks with a
fully connected layer to give a smaller network. Comment: BMVC 201
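
The observation that similar neurons are redundant can be sketched as follows. If two hidden neurons have identical incoming weights, they always produce the same activation, so one can be removed and its outgoing weight added to the other without changing the output. The pairing score used below, the squared outgoing weight times the squared distance between incoming weight vectors, is an assumed choice for illustration, as are the single-output ReLU network and the function names.

```python
# Sketch of data-free neuron merging in one hidden layer. Assumptions: a single
# output unit, ReLU hidden units, and the similarity score defined below.

def forward(incoming, outgoing, x):
    """Output of a one-hidden-layer ReLU network with a single output unit."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in incoming]
    return sum(a * h for a, h in zip(outgoing, hidden))


def merge_most_similar(incoming, outgoing):
    """Remove the neuron whose removal is scored cheapest, folding it into a twin."""
    best = None
    for i in range(len(incoming)):
        for j in range(len(incoming)):
            if i == j:
                continue
            dist2 = sum((u - v) ** 2 for u, v in zip(incoming[i], incoming[j]))
            score = outgoing[j] ** 2 * dist2  # cost of folding neuron j into i
            if best is None or score < best[0]:
                best = (score, i, j)
    _, keep, drop = best
    new_incoming = [row for k, row in enumerate(incoming) if k != drop]
    new_outgoing = [a + (outgoing[drop] if k == keep else 0.0)
                    for k, a in enumerate(outgoing) if k != drop]
    return keep, drop, new_incoming, new_outgoing


incoming = [[1.0, 2.0], [1.0, 2.0], [-3.0, 0.5]]  # neurons 0 and 1 are identical
outgoing = [0.4, 0.6, 1.0]
keep, drop, new_incoming, new_outgoing = merge_most_similar(incoming, outgoing)
x = [0.5, 1.0]
before = forward(incoming, outgoing, x)
after = forward(new_incoming, new_outgoing, x)
```

No training data is needed to choose the pair, since the score depends only on the weights. For exactly duplicated neurons the output is preserved; for merely similar neurons the merge is an approximation.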