190,346 research outputs found
Constraining Implicit Space with Minimum Description Length: An Unsupervised Attention Mechanism across Neural Network Layers
Inspired by the adaptation phenomenon of neuronal firing, we propose the
regularity normalization (RN) as an unsupervised attention mechanism (UAM)
which computes the statistical regularity in the implicit space of neural
networks under the Minimum Description Length (MDL) principle. Treating the
neural network optimization process as a partially observable model selection
problem, UAM constrains the implicit space by a normalization factor, the
universal code length. We compute this universal code incrementally across
neural network layers and demonstrated the flexibility to include data priors
such as top-down attention and other oracle information. Empirically, our
approach outperforms existing normalization methods in tackling limited,
imbalanced and non-stationary input distribution in image classification,
classic control, procedurally-generated reinforcement learning, generative
modeling, handwriting generation and question answering tasks with various
neural network architectures. Lastly, UAM tracks dependency and critical
learning stages across layers and recurrent time steps of deep networks
Optimization Theory for ReLU Neural Networks Trained with Normalization Layers
The success of deep neural networks is in part due to the use of
normalization layers. Normalization layers like Batch Normalization, Layer
Normalization and Weight Normalization are ubiquitous in practice, as they
improve generalization performance and speed up training significantly.
Nonetheless, the vast majority of current deep learning theory and non-convex
optimization literature focuses on the un-normalized setting, where the
functions under consideration do not exhibit the properties of commonly
normalized neural networks. In this paper, we bridge this gap by giving the
first global convergence result for two-layer neural networks with ReLU
activations trained with a normalization layer, namely Weight Normalization.
Our analysis shows how the introduction of normalization layers changes the
optimization landscape and can enable faster convergence as compared with
un-normalized neural networks.Comment: To be presented at ICML 202
- …