Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise
Batch Normalization (BN) (Ioffe and Szegedy 2015) normalizes the features of
an input image using the statistics of a batch of images, and hence introduces
noise into the gradient of the training loss. Previous works indicate that this
noise is important for the optimization and generalization of deep neural
networks, but too much noise will harm the performance of networks. In our
paper, we offer a new point of view: a self-attention mechanism can help to
regulate the noise by enhancing instance-specific information to obtain a
better regularization effect. Therefore, we propose an attention-based BN
called Instance Enhancement Batch Normalization (IEBN) that recalibrates the
information of each channel by a simple linear transformation. IEBN regulates
noise well and stabilizes network training, improving generalization even in
the presence of two kinds of noise attacks during training. Finally, IEBN
outperforms BN with only a slight parameter increase in image classification
tasks across different network structures and benchmark datasets.
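As a rough illustration of the idea in this abstract, the sketch below applies batch normalization to a single channel and then rescales each instance by a gate computed from that instance's own spatial mean through a linear map followed by a sigmoid. The exact form of the gate, the parameter names `a` and `b`, and their default values are assumptions made for illustration; the abstract only states that IEBN recalibrates each channel with a simple linear transformation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def iebn_channel(batch, gamma=1.0, beta=0.0, a=0.0, b=-1.0, eps=1e-5):
    """Sketch of an instance-enhanced BN for one channel.

    batch: list of instances, each a list of that channel's spatial values.
    gamma, beta: the usual BN scale and shift.
    a, b: hypothetical extra per-channel parameters of the linear gate.
    """
    # Batch statistics: computed over every instance and spatial position.
    values = [v for inst in batch for v in inst]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)

    out = []
    for inst in batch:
        # Instance-specific statistic (global average pooling of the channel).
        inst_mean = sum(inst) / len(inst)
        # Linear transformation of that statistic, squashed to (0, 1).
        gate = sigmoid(a * inst_mean + b)
        out.append([gamma * gate * (v - mean) / std + beta for v in inst])
    return out


batch = [[1.0, 2.0], [3.0, 4.0]]
uniform = iebn_channel(batch, a=0.0, b=0.0)   # same gate for every instance
adaptive = iebn_channel(batch, a=1.0, b=0.0)  # gate depends on the instance
```

With `a = 0` the gate is identical for all instances and the layer reduces to BN with a rescaled `gamma`; a non-zero `a` lets each instance modulate its own channel scale.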
Artificial intelligence-powered mobile edge computing-based anomaly detection in cellular networks
Escalating cell outages and congestion, treated as anomalies, cause substantial revenue loss for cellular operators and severely affect subscriber quality of experience. State-of-the-art literature applies a feed-forward deep neural network at the core network (CN) to detect these problems in a single cell; however, this solution is impractical because it would overload the CN, which monitors thousands of cells at a time. Inspired by mobile edge computing and the breakthroughs of deep convolutional neural networks (CNNs) in computer vision research, we split the network into several 100-cell regions, each monitored by an edge server, and propose a framework that pre-processes raw call detail records containing user activities to create an image-like volume, which is fed to a CNN model. The framework outputs a multilabeled vector identifying the anomalous cell(s). Our results suggest that our solution can detect anomalies with up to 96% accuracy, and that it is scalable and expandable to industrial Internet of Things environments.
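The pre-processing step described in this abstract can be pictured with the following sketch, which aggregates call detail records for one 100-cell region into an image-like volume. The record layout `(cell_id, activity, amount)`, the 10 x 10 grid arrangement, and the activity names are all hypothetical; the abstract does not specify them.

```python
def cdr_to_volume(records, side=10, activities=("sms", "call", "internet")):
    """Build a side x side x len(activities) volume of summed activity.

    records: iterable of (cell_id, activity, amount) tuples, where cell_id
    runs from 0 to side * side - 1 (an assumed numbering of the region).
    """
    index = {name: k for k, name in enumerate(activities)}
    volume = [[[0.0] * len(activities) for _ in range(side)]
              for _ in range(side)]
    for cell_id, activity, amount in records:
        # Map the flat cell id to a grid position (assumed row-major layout).
        row, col = divmod(cell_id, side)
        volume[row][col][index[activity]] += amount
    return volume


records = [
    (0, "sms", 3.0),
    (0, "sms", 2.0),
    (57, "call", 1.5),
    (99, "internet", 4.0),
]
volume = cdr_to_volume(records)
```

Each grid position plays the role of a pixel and each activity type the role of a color channel, which is what makes the volume a natural input for a CNN.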
Spherical Perspective on Learning with Batch Norm
Batch Normalization (BN) is a prominent deep learning technique. In spite of
its apparent simplicity, its implications over optimization are yet to be fully
understood. In this paper, we study the optimization of neural networks with BN
layers from a geometric perspective. We leverage the radial invariance of
groups of parameters, such as neurons for multi-layer perceptrons or filters
for convolutional neural networks, and translate several popular optimization
schemes on the unit hypersphere. This formulation and the associated
geometric interpretation shed new light on the training dynamics and the
relation between different optimization schemes. In particular, we use it to
derive the effective learning rate of Adam and stochastic gradient descent
(SGD) with momentum, and we show that in the presence of BN layers, performing
SGD alone is actually equivalent to a variant of Adam constrained to the unit
hypersphere. Our analysis also leads us to introduce new variants of Adam. We
empirically show, over a variety of datasets and architectures, that they
improve accuracy in classification tasks. The complete source code for our
experiments is available at: https://github.com/ymontmarin/adamsr
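The radial invariance mentioned in this abstract can be checked numerically: when a neuron's pre-activations are batch-normalized, rescaling its weight vector by any positive factor leaves the output unchanged, so only the direction of the weights on the unit hypersphere matters. The sketch below is a minimal check that omits the usual epsilon term (an assumption made so the invariance is exact); the input values and the scaling factor are arbitrary.

```python
import math


def bn_preactivations(w, inputs):
    """Batch-normalize the pre-activations w.x over a batch (no epsilon)."""
    z = [sum(wi * xi for wi, xi in zip(w, x)) for x in inputs]
    mean = sum(z) / len(z)
    var = sum((v - mean) ** 2 for v in z) / len(z)
    return [(v - mean) / math.sqrt(var) for v in z]


inputs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.5]]
w = [0.3, -0.7]
scaled_w = [5.0 * wi for wi in w]  # same direction, five times the norm

base = bn_preactivations(w, inputs)
scaled = bn_preactivations(scaled_w, inputs)

# Largest difference between the two outputs; zero up to rounding error.
max_diff = max(abs(p - q) for p, q in zip(base, scaled))
```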
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Recent works (e.g., Li and Arora, 2020) suggest that the use of popular
normalization schemes (including Batch Normalization) in today's deep learning
can move it far from a traditional optimization viewpoint, e.g., use of
exponentially increasing learning rates. The current paper highlights other
ways in which the behavior of normalized nets departs from traditional viewpoints,
and then initiates a formal framework for studying their mathematics via
suitable adaptation of the conventional framework, namely, modeling the SGD-induced
training trajectory via a suitable stochastic differential equation (SDE) with
a noise term that captures gradient noise. This yields: (a) A new 'intrinsic
learning rate' parameter that is the product of the normal learning rate and
weight decay factor. Analysis of the SDE shows how the effective speed of
learning varies and equilibrates over time under the control of intrinsic LR.
(b) A challenge -- via theory and experiments -- to the popular belief that good
generalization requires large learning rates at the start of training. (c) New
experiments, backed by mathematical intuition, suggesting the number of steps
to equilibrium (in function space) scales as the inverse of the intrinsic
learning rate, as opposed to the exponential time convergence bound implied by
SDE analysis. We name it the Fast Equilibrium Conjecture and suggest it holds
the key to why Batch Normalization is effective.
Comment: 25 pages, 12 figures. Accepted by the 34th Conference on Neural
Information Processing Systems (NeurIPS 2020).
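The intrinsic learning rate defined in this abstract is simple enough to compute directly. The sketch below multiplies the learning rate by the weight decay factor and reports the inverse as a relative time scale for reaching equilibrium, as the Fast Equilibrium Conjecture suggests. The function names are hypothetical, and the returned time scale is only a proportionality: the abstract gives no constant, so the values are meaningful relative to one another and not as absolute step counts.

```python
import math


def intrinsic_lr(lr, weight_decay):
    """Product of the learning rate and the weight decay factor."""
    return lr * weight_decay


def relative_equilibrium_time(lr, weight_decay):
    """Quantity proportional to the conjectured number of steps to equilibrium."""
    return 1.0 / intrinsic_lr(lr, weight_decay)


# Two different settings that share the same intrinsic learning rate.
setting_a = intrinsic_lr(lr=0.1, weight_decay=5e-4)
setting_b = intrinsic_lr(lr=0.05, weight_decay=1e-3)
same_intrinsic_lr = math.isclose(setting_a, setting_b)

# Multiplying the weight decay by ten divides the conjectured time scale by ten.
slow = relative_equilibrium_time(lr=0.1, weight_decay=1e-3)
fast = relative_equilibrium_time(lr=0.1, weight_decay=1e-2)
```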