Shakeout: A New Approach to Regularized Deep Neural Network Training
Recent years have witnessed the success of deep neural networks in a wide range of practical problems. Dropout has played an essential role in many successful deep neural networks by inducing regularization during model training. In this paper, we present a new regularized training approach: Shakeout. Instead of randomly discarding units as Dropout does at the training stage, Shakeout randomly chooses to enhance or reverse each unit's contribution to the next layer. This minor modification of Dropout has a notable statistical trait: the regularizer induced by Shakeout adaptively combines L0, L1, and L2 regularization terms. Our classification experiments with representative deep architectures on the image datasets MNIST, CIFAR-10, and ImageNet show that Shakeout deals with over-fitting effectively and outperforms Dropout. We empirically demonstrate that Shakeout leads to sparser weights under both unsupervised and supervised settings, and that it induces a grouping effect among the input units of a layer. Because the learned weights better reflect the importance of connections, Shakeout is superior to Dropout, which is valuable for deep model compression. Moreover, we demonstrate that Shakeout can effectively reduce the instability of the training process of the deep architecture.
Comment: Appears at T-PAMI 201
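To make the "enhance or reverse" operation concrete, here is a minimal NumPy sketch of a Shakeout-style weight perturbation. It follows one common reading of the method: with drop rate tau and hedging constant c, a kept input unit's outgoing weights are enhanced away from zero (with rescaling so the perturbed weights are unbiased in expectation), while a "dropped" unit's weights are reversed to -c * sign(W); setting c = 0 recovers inverted Dropout. The function name and the precise normalization are illustrative assumptions, not the authors' reference implementation.

import numpy as np

def shakeout_weights(W, tau=0.5, c=0.1, rng=None):
    """Shakeout-style random weight perturbation (illustrative sketch).

    W   : (n_in, n_out) weight matrix of a fully connected layer.
    tau : probability of "reversing" an input unit (akin to a drop rate).
    c   : hedging constant; c == 0 reduces to inverted Dropout.
    """
    if rng is None:
        rng = np.random.default_rng()
    s = np.sign(W)
    # One Bernoulli draw per *input unit*, shared across its outgoing weights.
    r = (rng.random((W.shape[0], 1)) > tau).astype(W.dtype)
    # Kept units (r = 1): contribution enhanced and rescaled to stay unbiased,
    # since E[W_tilde] = W. Dropped units (r = 0): contribution reversed to
    # -c * sign(W) instead of being zeroed out as in Dropout.
    return r * (W + c * tau * s) / (1.0 - tau) + (r - 1.0) * c * s

# Training-time forward pass of one layer under this sketch:
# y = x @ shakeout_weights(W, tau=0.5, c=0.1)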
Sparse Training Theory for Scalable and Efficient Agents
A fundamental task for artificial intelligence is learning. Deep Neural Networks have proven capable of handling all major learning paradigms, i.e., supervised, unsupervised, and reinforcement learning. Nevertheless, traditional deep learning approaches rely on cloud computing facilities and do not scale well to autonomous agents with low computational resources. Even in the cloud, they suffer from computational and memory limitations, and they cannot adequately model large physical worlds for agents, which would require networks with billions of neurons. In the last few years, these issues have been addressed by the emerging topic of sparse training, which trains sparse networks from scratch. This paper discusses the state of the art in sparse training together with its challenges and limitations, while introducing a couple of new theoretical research directions that have the potential to alleviate those limitations and push deep learning scalability well beyond its current boundaries. Finally, the impact of these theoretical advancements in complex multi-agent settings is discussed from a real-world perspective, using the smart grid as a case study.
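As a concrete illustration of the sparse-from-scratch idea, the sketch below shows one prune-and-regrow topology update in the spirit of Sparse Evolutionary Training (SET): after a period of ordinary gradient training, the smallest-magnitude active weights are removed and the same number of connections are regrown at random empty positions. The layer representation, the fraction zeta, and the function name are illustrative assumptions; the paper itself is a theory/position piece and does not prescribe this exact procedure.

import numpy as np

def prune_and_regrow(W, mask, zeta=0.3, rng=None):
    """One SET-style topology update for a sparse weight matrix (sketch).

    W    : dense array holding the weights; inactive entries are 0.
    mask : boolean array, True where a connection currently exists.
    zeta : fraction of existing connections to replace this step.
    """
    if rng is None:
        rng = np.random.default_rng()
    n_active = int(mask.sum())
    n_replace = int(zeta * n_active)

    # Prune: drop the n_replace active connections with smallest magnitude.
    active_idx = np.flatnonzero(mask)
    order = np.argsort(np.abs(W.ravel()[active_idx]))
    pruned = active_idx[order[:n_replace]]
    mask.ravel()[pruned] = False
    W.ravel()[pruned] = 0.0

    # Regrow: activate the same number of empty positions, chosen at random.
    empty_idx = np.flatnonzero(~mask)
    grown = rng.choice(empty_idx, size=n_replace, replace=False)
    mask.ravel()[grown] = True
    W.ravel()[grown] = rng.normal(0.0, 0.01, size=n_replace)  # small re-init

    return W, mask

Between topology updates, only the active weights are trained with gradient descent, so the parameter count stays fixed at the initial sparsity level throughout training rather than starting dense and pruning afterwards.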