RadiX-Net: Structured Sparse Matrices for Deep Neural Networks
The sizes of deep neural networks (DNNs) are rapidly outgrowing the capacity
of hardware to store and train them. Research over the past few decades has
explored the prospect of sparsifying DNNs before, during, and after training by
pruning edges from the underlying topology. The resulting neural network is
known as a sparse neural network. More recent work has demonstrated the
remarkable result that certain sparse DNNs can train to the same precision as
dense DNNs at lower runtime and storage cost. An intriguing class of these
sparse DNNs is the X-Nets, which are initialized and trained upon a sparse
topology with neither reference to a parent dense DNN nor subsequent pruning.
We present an algorithm that deterministically generates RadiX-Nets: sparse DNN
topologies that, as a whole, are much more diverse than X-Net topologies, while
preserving X-Nets' desired characteristics. We further present a
functional-analytic conjecture based on the longstanding observation that
sparse neural network topologies can attain the same expressive power as dense
counterparts.

Comment: 7 pages, 8 figures, accepted at the IEEE IPDPS 2019 GrAPL workshop. arXiv
admin note: substantial text overlap with arXiv:1809.0524
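For intuition, here is a minimal Python sketch of a deterministic, radix-style sparse topology in the spirit described above: a single-radix butterfly mask in which every neuron keeps exactly `radix` edges per layer. The function name, parameters, and the butterfly pattern itself are illustrative assumptions; the paper's actual RadiX-Net construction uses mixed radices and Kronecker products.

import numpy as np

def butterfly_mask(radix: int, num_layers: int) -> list[np.ndarray]:
    """Sketch of a deterministic radix-style sparse topology.

    Builds boolean connectivity masks over N = radix ** num_layers
    neurons per layer: in layer l, neuron i connects to every neuron j
    that differs from i only in the l-th base-`radix` digit, so each
    neuron has exactly `radix` edges per layer. This only illustrates
    the flavor of such constructions, not the paper's algorithm.
    """
    n = radix ** num_layers
    masks = []
    for layer in range(num_layers):
        stride = radix ** layer
        mask = np.zeros((n, n), dtype=bool)
        for i in range(n):
            digit = (i // stride) % radix      # l-th base-radix digit of i
            base = i - digit * stride          # i with that digit zeroed
            for d in range(radix):
                mask[i, base + d * stride] = True
        masks.append(mask)
    return masks

# Each mask has n * radix nonzeros out of n * n possible edges, e.g.
# radix=2, num_layers=3 gives 8x8 masks with 2 edges per neuron.
masks = butterfly_mask(radix=2, num_layers=3)
assert all(m.sum() == (2 ** 3) * 2 for m in masks)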
DIANet: Dense-and-Implicit Attention Network
Attention networks have successfully boosted performance in various
vision problems. Previous works emphasize designing new attention
modules and plugging them individually into networks. Our paper proposes a
novel and simple framework that shares an attention module across different
network layers to encourage the integration of layer-wise information; this
parameter-sharing module is referred to as the Dense-and-Implicit-Attention (DIA)
unit. Many choices of modules can be used in the DIA unit. Since Long
Short-Term Memory (LSTM) can capture long-distance dependencies, we
focus on the case where the DIA unit is a modified LSTM (referred to as DIA-LSTM).
Experiments on benchmark datasets show that the DIA-LSTM unit is capable of
emphasizing layer-wise feature interrelation and leads to significant
improvement in image classification accuracy. We further show empirically that
DIA-LSTM has a strong regularization ability, stabilizing the training of
deep networks in experiments where skip connections or Batch
Normalization are removed throughout the residual network. The code is released at
https://github.com/gbup-group/DIANet
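For intuition, here is a minimal PyTorch sketch of the parameter-sharing idea described above: one LSTM cell, shared by all layers, reads a pooled channel summary at each layer and emits a sigmoid gate that recalibrates that layer's channels. The class name, method names, and hyperparameters are illustrative assumptions, not the released implementation at the repository above.

import torch
import torch.nn as nn

class DIAUnit(nn.Module):
    """Sketch of a Dense-and-Implicit Attention (DIA) style unit.

    A single LSTMCell is shared by all layers of a backbone: at each
    layer, globally pooled features are fed to the shared cell, and its
    output (after a sigmoid) rescales the layer's channels. The hidden
    state carried across layers integrates layer-wise information.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.cell = nn.LSTMCell(channels, channels)  # shared across layers
        self.state = None  # (h, c), reset once per forward pass of the net

    def reset(self):
        self.state = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        v = self.pool(x).view(b, c)            # per-layer channel summary
        h, cs = self.cell(v, self.state)       # recurrence over *layers*
        self.state = (h, cs)
        gate = torch.sigmoid(h).view(b, c, 1, 1)
        return x * gate                        # channel-wise recalibration

In use, one such unit would be instantiated once and called after every residual block, with reset() called at the start of each forward pass so the layer-wise recurrence starts fresh for each batch.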