Differentiable Sparsification for Deep Neural Networks
Deep neural networks have relieved human experts of much of the burden of
feature engineering. However, comparable effort is now required to determine
effective architectures. Moreover, as networks have grown very large,
considerable resources are also invested in reducing their size. Sparsifying
an over-complete model addresses these problems by removing redundant
components and connections. In this study, we propose a fully differentiable
sparsification method for deep neural networks that allows parameters to
become exactly zero during training via stochastic gradient descent. The
proposed method can thus learn both the sparsified structure and the weights
of a network in an end-to-end manner. It is directly applicable to various
modern deep neural networks and requires only minimal modification to
existing models. To the best of our knowledge, this is the first fully
[sub-]differentiable sparsification method that zeroes out parameters. It
provides a foundation for future structure learning and model compression
methods.
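
To make the core idea concrete, below is a minimal PyTorch sketch of one way a sub-differentiable gate can drive parameters to exact zero under plain SGD; it is an illustrative assumption, not the authors' exact formulation, and all names (`SparseLinear`, `s`, `sparsity_penalty`) are hypothetical. A ReLU-clamped gate scales each output channel, and an L1 penalty on the gates pushes them onto the clamp, where they stay at exactly zero.

```python
# Illustrative sketch only: a ReLU gate is sub-differentiable and can reach
# an exact zero, after which both the data gradient and the penalty gradient
# through the gate vanish, so the channel stays pruned.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseLinear(nn.Module):
    """Linear layer whose output channels are scaled by trainable gates."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.s = nn.Parameter(torch.ones(out_features))  # free gate params

    def forward(self, x):
        gate = F.relu(self.s)          # exact zeros are reachable
        return self.linear(x) * gate   # zero gate => channel removed

    def sparsity_penalty(self):
        return F.relu(self.s).sum()    # L1 on gates drives them to zero

# Toy usage: fit random data while penalizing the gates.
torch.manual_seed(0)
model = SparseLinear(16, 32)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(64, 16), torch.randn(64, 32)
for _ in range(200):
    opt.zero_grad()
    loss = F.mse_loss(model(x), y) + 0.05 * model.sparsity_penalty()
    loss.backward()
    opt.step()
print("channels zeroed:", int((F.relu(model.s) == 0).sum()))
```

Because the gate and its penalty are optimized jointly with the task loss by ordinary SGD, structure (which channels survive) and weights are learned end-to-end, which is the property the abstract highlights.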