As deep learning models typically contain millions of trainable weights, there
is a growing demand for more efficient network structures with reduced storage
requirements and improved run-time efficiency. Pruning is one of the
most popular network compression techniques. In this paper, we propose a novel
unstructured pruning pipeline, Attention-based Simultaneous sparse structure
and Weight Learning (ASWL). Unlike traditional channel-wise or weight-wise
attention mechanisms, ASWL uses an efficient layer-wise attention algorithm to
compute a pruning ratio for each layer, and the weights of both the dense and
the sparse networks are tracked so that the pruned structure is learned
simultaneously from randomly initialized weights. Our
experiments on MNIST, CIFAR-10, and ImageNet show that ASWL achieves superior
pruning results in terms of accuracy, pruning ratio, and operating efficiency
compared with state-of-the-art network pruning methods.
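
To make the idea concrete, below is a minimal sketch of how a learnable layer-wise attention value could both gate a layer and set that layer's pruning ratio while dense and sparse weights are kept together. The module name, the sigmoid parameterization, the magnitude-based mask, and the `max_prune` cap are illustrative assumptions, not the authors' exact formulation.

```python
# Illustrative sketch (assumed reconstruction, not the authors' reference code):
# each layer holds one learnable attention scalar that (a) gates the layer output,
# so it is trained end-to-end, and (b) determines the layer's pruning ratio, so
# less-attended layers are pruned more aggressively. Dense weights are retained
# throughout, so pruned connections can reappear when the attention (and hence
# the mask) changes during training.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionPrunedLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, max_prune: float = 0.9):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.attn_logit = nn.Parameter(torch.zeros(1))  # one attention value per layer
        self.max_prune = max_prune                      # assumed upper bound on sparsity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attention = torch.sigmoid(self.attn_logit)           # layer importance in (0, 1)
        prune_ratio = self.max_prune * (1.0 - attention)     # low attention -> high sparsity
        k = int(prune_ratio.item() * self.weight.numel())    # number of weights to mask
        if k > 0:
            threshold = self.weight.detach().abs().flatten().kthvalue(k).values
            mask = (self.weight.abs() > threshold).float()
        else:
            mask = torch.ones_like(self.weight)
        # The forward pass uses the sparse (masked) weights, while the dense
        # weights remain the stored parameters being optimized.
        sparse_weight = self.weight * mask
        out = F.linear(x, sparse_weight, self.bias)
        return attention * out                               # attention gates the output
```

In this sketch the attention scalar receives gradients through the output gate, so the per-layer pruning ratio and the surviving weights are adapted jointly during ordinary training from random initialization.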