Light Multi-segment Activation for Model Compression
Model compression has become necessary for applying neural networks (NN) to many real-world tasks that can accept slightly reduced model accuracy but impose strict limits on model complexity. Recently, Knowledge Distillation, which distills the knowledge from a well-trained and highly complex teacher model into a compact student model, has been widely used for model compression. However, under strict resource-cost requirements, it is quite challenging to achieve performance comparable to the teacher model, essentially because of the drastically reduced expressiveness of the compact student model. Inspired by the nature of expressiveness in neural networks, we propose to use a multi-segment activation, which can significantly improve expressiveness at very little cost, in the compact student model. Specifically, we propose a highly efficient multi-segment activation, called Light Multi-segment Activation (LMA), which can rapidly produce multiple linear regions with very few parameters by leveraging statistical information. Using LMA, the compact student model achieves much better performance, both effectively and efficiently, than a ReLU-equipped one at the same model scale. Furthermore, the proposed method is compatible with other model compression techniques, such as quantization, which means they can be used jointly for better compression performance. Experiments on state-of-the-art NN architectures over real-world tasks demonstrate the effectiveness and extensibility of LMA.
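The abstract does not give the exact form of LMA, so the following is only a hedged sketch of a generic multi-segment (piecewise-linear) activation in PyTorch: breakpoints are placed from running statistics of the pre-activations (standing in for the "statistical information" mentioned above) and only one scalar slope per segment is learned. The module name, segment count, and breakpoint placement are assumptions for illustration, not the authors' design.

```python
import torch
import torch.nn as nn


class MultiSegmentActivation(nn.Module):
    """Hypothetical multi-segment (piecewise-linear) activation.

    f(x) = sum_k a_k * relu(x - b_k) is continuous and piecewise linear,
    so K learned slopes a_k give multiple linear regions at the cost of
    K scalars. The breakpoints b_k are placed from a running mean/std of
    the pre-activations (an assumption, not the paper's exact rule).
    """

    def __init__(self, num_segments: int = 4, momentum: float = 0.1):
        super().__init__()
        self.num_segments = num_segments
        self.momentum = momentum
        # one learnable slope increment per segment
        self.slopes = nn.Parameter(torch.full((num_segments,), 1.0 / num_segments))
        # running statistics used to position the breakpoints
        self.register_buffer("running_mean", torch.zeros(1))
        self.register_buffer("running_std", torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            with torch.no_grad():
                self.running_mean.lerp_(x.mean().reshape(1), self.momentum)
                self.running_std.lerp_(x.std().reshape(1), self.momentum)
        # spread breakpoints over roughly +/- 2 standard deviations of the input
        offsets = torch.linspace(-2.0, 2.0, self.num_segments, device=x.device)
        breaks = self.running_mean + offsets * self.running_std        # (K,)
        shifted = torch.relu(x.unsqueeze(-1) - breaks)                 # (..., K)
        return (shifted * self.slopes).sum(dim=-1)
```

A drop-in usage would simply replace an existing ReLU, e.g. `act = MultiSegmentActivation(num_segments=4)` followed by `y = act(torch.randn(8, 16))`.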
Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon
How to develop slim and accurate deep neural networks has become crucial for
real-world applications, especially those employed in embedded systems.
Though previous work along this research line has shown some promising results,
most existing methods either fail to significantly compress a well-trained deep
network or require a heavy retraining process for the pruned deep network to
re-boost its prediction performance. In this paper, we propose a new layer-wise
pruning method for deep neural networks. In our proposed method, parameters of
each individual layer are pruned independently based on second order
derivatives of a layer-wise error function with respect to the corresponding
parameters. We prove that the final drop in prediction performance after pruning
is bounded by a linear combination of the reconstruction errors introduced at each
layer. Therefore, it is guaranteed that one only needs to perform a light
retraining process on the pruned network to resume its original prediction
performance. We conduct extensive experiments on benchmark datasets to
demonstrate the effectiveness of our pruning method compared with several
state-of-the-art baseline methods.
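For context, the classical Optimal Brain Surgeon criterion, which the layer-wise method applies per layer, scores each weight by the loss increase incurred when it is removed and the remaining weights are adjusted in closed form. Below is a minimal NumPy sketch of that per-layer scoring; the layer-wise error function, its Hessian, the damping term, and the function names are assumptions used for illustration rather than the paper's exact procedure.

```python
import numpy as np


def obs_prune_scores(weights: np.ndarray, hessian: np.ndarray):
    """Classical OBS saliency for each weight of one (flattened) layer.

    saliency_q = w_q**2 / (2 * [H^-1]_qq)        (loss increase if w_q is removed)
    delta_w    = -(w_q / [H^-1]_qq) * H^-1 e_q   (compensating update of the rest)

    `hessian` is the Hessian of a layer-wise reconstruction error with
    respect to the flattened weights; a small damping term keeps the
    inverse well conditioned.
    """
    w = weights.ravel()
    h_inv = np.linalg.inv(hessian + 1e-6 * np.eye(hessian.shape[0]))
    saliency = w ** 2 / (2.0 * np.diag(h_inv))
    return saliency, h_inv


def prune_one_weight(weights: np.ndarray, hessian: np.ndarray):
    """Remove the least-salient weight and update the others in closed form."""
    w = weights.ravel().copy()
    saliency, h_inv = obs_prune_scores(weights, hessian)
    q = int(np.argmin(saliency))
    w += -(w[q] / h_inv[q, q]) * h_inv[:, q]   # compensating update
    w[q] = 0.0                                 # enforce exact removal
    return w.reshape(weights.shape), q
```

Iterating this selection and update per layer, rather than over the full network Hessian, is what keeps the layer-wise variant tractable.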
Deep neural networks performance optimization in image recognition
In this paper, we consider the problem of the excessive runtime and memory-space complexity of contemporary deep convolutional neural networks in image recognition. A survey of recent compression methods and efficient neural network architectures is provided. The experimental study is focused on the visual emotion recognition problem. We compare the computational speed and memory consumption during the training and inference stages of such methods as weight matrix decomposition, binarization, and hashing on the visual emotion recognition problem. It is experimentally shown that the most efficient recognition is achieved with full network binarization and matrix decomposition.

The article was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2017 (grant β17-05-0007) and by the Russian Academic Excellence Project "5-100". A.V. Savchenko is supported by Russian Federation President grant no. ΠΠ-306.2017.9.
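As a hedged illustration of one of the surveyed techniques, the sketch below applies a truncated-SVD (low-rank) decomposition to a dense layer's weight matrix, replacing one large matrix multiplication by two smaller ones. The layer shape, rank, and function name are assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np


def low_rank_factorize(weight: np.ndarray, rank: int):
    """Factor W (out x in) into U @ V with U (out x r) and V (r x in)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    u_r = u[:, :rank] * s[:rank]   # absorb singular values into the left factor
    v_r = vt[:rank, :]
    return u_r, v_r


# Example: a hypothetical 1024x4096 fully connected layer compressed to rank 64.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 4096)).astype(np.float32)
U, V = low_rank_factorize(W, rank=64)

params_before = W.size              # 4,194,304 weights
params_after = U.size + V.size      # 327,680 weights (~12.8x fewer)
x = rng.standard_normal((4096,)).astype(np.float32)
y_full = W @ x
y_low = U @ (V @ x)                 # two small matmuls instead of one large one
print(params_before, params_after, np.linalg.norm(y_full - y_low))
```

The parameter and runtime savings come directly from the factor shapes; how much accuracy is retained depends on how close the trained weight matrix is to low rank, which is what the paper's experiments measure.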
Cooperative initialization based deep neural network training
Researchers have proposed various activation functions. These activation
functions help the deep network to learn non-linear behavior with a significant
effect on training dynamics and task performance. The performance of these
activations also depends on the initial state of the weight parameters, i.e., different initial states lead to differences in network performance.
In this paper, we propose a cooperative initialization for training deep networks with the ReLU activation function to improve network performance.
Our approach uses multiple activation functions in the initial few epochs for
the update of all sets of weight parameters while training the network. These
activation functions cooperate to overcome each other's drawbacks in updating the weight parameters, which in effect learns a better "feature representation" and boosts network performance later. Cooperative-initialization-based training also helps reduce overfitting and does not increase the number of parameters or the inference (test) time of the final model while improving performance. Experiments show that our approach outperforms various baselines
and, at the same time, performs well over various tasks such as classification
and detection. The Top-1 classification accuracy of the model trained using our
approach improves by 2.8% for VGG-16 and 2.1% for ResNet-56 on the CIFAR-100 dataset.
Comment: IEEE Winter Conference on Applications of Computer Vision (WACV), 2020
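The abstract describes letting several activation functions jointly drive the weight updates during the first few epochs and then training with ReLU alone. One hedged way to realize that idea is sketched below in PyTorch: during a warm-up phase the same weights are evaluated under several activations and the losses are averaged, so every activation contributes to the shared gradient. The model, activation set, and warm-up length are illustrative assumptions, not the authors' exact training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchableMLP(nn.Module):
    """Small MLP whose hidden activation can be chosen per forward pass."""

    def __init__(self, in_dim=784, hidden=256, classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, classes)

    def forward(self, x, act=F.relu):
        return self.fc2(act(self.fc1(x)))


def train_step(model, optimizer, x, y, epoch, warmup_epochs=5,
               acts=(F.relu, torch.tanh, F.elu)):
    """One optimization step with cooperative updates during warm-up.

    For the first `warmup_epochs` epochs the shared weights are evaluated
    under several activations and the losses are averaged, so all
    activations cooperate in the gradient; afterwards training uses ReLU
    only, leaving the final model's size and inference cost unchanged.
    """
    optimizer.zero_grad()
    if epoch < warmup_epochs:
        loss = sum(F.cross_entropy(model(x, act=a), y) for a in acts) / len(acts)
    else:
        loss = F.cross_entropy(model(x, act=F.relu), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the extra activations are used only during the warm-up epochs, the deployed network is an ordinary ReLU model, consistent with the abstract's claim of no added parameters or inference time.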