
    Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise

    Batch Normalization (BN) (Ioffe and Szegedy 2015) normalizes the features of an input image via the statistics of a batch of images, and hence BN introduces noise into the gradient of the training loss. Previous works indicate that this noise is important for the optimization and generalization of deep neural networks, but too much noise harms network performance. In our paper, we offer a new point of view: a self-attention mechanism can help regulate the noise by enhancing instance-specific information, yielding a better regularization effect. We therefore propose an attention-based BN called Instance Enhancement Batch Normalization (IEBN) that recalibrates the information of each channel via a simple linear transformation. IEBN regulates noise well and stabilizes network training to improve generalization, even in the presence of two kinds of noise attacks during training. Finally, IEBN outperforms BN with only a small parameter increment in image classification tasks across different network structures and benchmark datasets.
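    As a rough illustration of the idea, the sketch below (in PyTorch, with an assumed module name `IEBNSketch` and an assumed per-channel gating form) applies standard BN and then rescales each channel with a sigmoid gate computed from a simple linear transform of that instance's pooled activations; this is one reading of the abstract, not the authors' exact IEBN formulation.

```python
import torch
import torch.nn as nn

class IEBNSketch(nn.Module):
    """Hypothetical sketch of an attention-based BN: standard BN followed by an
    instance-specific, per-channel sigmoid gate computed from a simple linear
    transform of the pooled input (the exact IEBN formulation may differ)."""

    def __init__(self, num_channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels)
        # Per-channel scale and bias of the linear transform (initial values
        # are assumptions, not taken from the paper).
        self.gate_weight = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.gate_bias = nn.Parameter(torch.full((1, num_channels, 1, 1), -1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.bn(x)
        # Instance-specific descriptor: global average pooling per channel.
        desc = x.mean(dim=(2, 3), keepdim=True)
        # Simple linear transformation followed by a sigmoid gate.
        gate = torch.sigmoid(desc * self.gate_weight + self.gate_bias)
        return out * gate

# Example: drop-in replacement for nn.BatchNorm2d(64).
layer = IEBNSketch(64)
y = layer(torch.randn(8, 64, 32, 32))
```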

    Artificial intelligence-powered mobile edge computing-based anomaly detection in cellular networks

    Escalating cell outages and congestion, treated here as anomalies, cost cellular operators substantial revenue and severely affect subscriber quality of experience. State-of-the-art literature applies feed-forward deep neural networks at the core network (CN) to detect the above problems in a single cell; however, this solution is impractical because it overloads the CN, which monitors thousands of cells at a time. Inspired by mobile edge computing and the breakthroughs of deep convolutional neural networks (CNNs) in computer vision research, we split the network into several 100-cell regions, each monitored by an edge server, and propose a framework that pre-processes raw call detail records containing user activities to create an image-like volume, which is fed to a CNN model. The framework outputs a multi-labeled vector identifying the anomalous cell(s). Our results suggest that our solution can detect anomalies with up to 96% accuracy, and is scalable and expandable for industrial Internet of Things environments.
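    A minimal sketch of this kind of pipeline is given below, assuming a 10x10-cell region encoded as an image-like tensor with a few CDR-derived channels and a small CNN; the name `CellAnomalyCNN` and the layer sizes are illustrative, not the authors' architecture. The output is one sigmoid score per cell, so it can be thresholded into a multi-labeled anomaly vector.

```python
import torch
import torch.nn as nn

class CellAnomalyCNN(nn.Module):
    """Illustrative sketch (not the paper's architecture): a small CNN mapping
    an image-like CDR volume for a 10x10 cell region to a multi-label vector,
    one anomaly probability per cell."""

    def __init__(self, in_channels: int = 3, num_cells: int = 100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_cells)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        # Multi-label output: independent anomaly probability per cell.
        return torch.sigmoid(self.classifier(h))

# Example: a batch of 4 regions, 3 CDR-derived channels, 10x10 cells each.
model = CellAnomalyCNN()
scores = model(torch.randn(4, 3, 10, 10))   # shape (4, 100)
anomalous = scores > 0.5                    # multi-labeled anomaly vector
```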

    Spherical Perspective on Learning with Batch Norm

    Batch Normalization (BN) is a prominent deep learning technique. In spite of its apparent simplicity, its implications for optimization are yet to be fully understood. In this paper, we study the optimization of neural networks with BN layers from a geometric perspective. We leverage the radial invariance of groups of parameters, such as neurons for multi-layer perceptrons or filters for convolutional neural networks, and translate several popular optimization schemes onto the L2 unit hypersphere. This formulation and the associated geometric interpretation shed new light on the training dynamics and the relation between different optimization schemes. In particular, we use it to derive the effective learning rate of Adam and of stochastic gradient descent (SGD) with momentum, and we show that, in the presence of BN layers, performing SGD alone is actually equivalent to a variant of Adam constrained to the unit hypersphere. Our analysis also leads us to introduce new variants of Adam. We empirically show, over a variety of datasets and architectures, that they improve accuracy in classification tasks. The complete source code for our experiments is available at: https://github.com/ymontmarin/adamsr
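    The radial invariance at the heart of this analysis can be checked numerically. The toy PyTorch snippet below (our own sketch, not the paper's code) rescales a convolution filter that feeds a BN layer, verifies that the loss is essentially unchanged, and shows that the gradient norm shrinks by roughly the same factor, which is what gives rise to an effective learning rate on the unit hypersphere.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 16, 16)

conv1 = nn.Conv2d(3, 4, 3, bias=False)
conv2 = copy.deepcopy(conv1)
with torch.no_grad():
    conv2.weight.mul_(10.0)          # rescale the filter radially
bn = nn.BatchNorm2d(4)

# The loss only sees the BN output, so it is (up to BN's eps) invariant to the
# radial rescaling of the filter.
loss1 = bn(conv1(x)).pow(2).mean()
loss2 = bn(conv2(x)).pow(2).mean()
print(torch.allclose(loss1, loss2, rtol=1e-3))          # True

loss1.backward()
loss2.backward()
# The gradient norm shrinks roughly by the rescaling factor (~10 here), which
# is why the effective step size depends on the filter norm.
print((conv1.weight.grad.norm() / conv2.weight.grad.norm()).item())
```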

    Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

    Recent works (e.g., Li and Arora, 2020) suggest that the use of popular normalization schemes (including Batch Normalization) in today's deep learning can move it far from a traditional optimization viewpoint, e.g., through the use of exponentially increasing learning rates. The current paper highlights other ways in which the behavior of normalized nets departs from traditional viewpoints, and then initiates a formal framework for studying their mathematics via a suitable adaptation of the conventional framework, namely modeling the SGD-induced training trajectory via a stochastic differential equation (SDE) with a noise term that captures gradient noise. This yields: (a) a new 'intrinsic learning rate' parameter that is the product of the normal learning rate and the weight decay factor; analysis of the SDE shows how the effective speed of learning varies and equilibrates over time under the control of the intrinsic LR. (b) A challenge, via theory and experiments, to the popular belief that good generalization requires large learning rates at the start of training. (c) New experiments, backed by mathematical intuition, suggesting that the number of steps to equilibrium (in function space) scales as the inverse of the intrinsic learning rate, as opposed to the exponential-time convergence bound implied by SDE analysis. We name this the Fast Equilibrium Conjecture and suggest it holds the key to why Batch Normalization is effective. Comment: 25 pages, 12 figures. Accepted by the 34th Conference on Neural Information Processing Systems (NeurIPS 2020).
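    As a back-of-the-envelope illustration (with assumed hyperparameter values, not ones taken from the paper), the intrinsic learning rate and the conjectured equilibrium timescale can be computed directly:

```python
# Assumed hyperparameter values, chosen only for illustration.
lr = 0.1
weight_decay = 5e-4

# The "intrinsic learning rate" is the product of learning rate and weight decay.
intrinsic_lr = lr * weight_decay

# Fast Equilibrium Conjecture (as stated in the abstract): the number of steps
# to reach equilibrium in function space scales as the inverse of this quantity.
steps_to_equilibrium = 1.0 / intrinsic_lr

print(f"intrinsic LR = {intrinsic_lr:.1e}")                           # 5.0e-05
print(f"equilibrium timescale ~ {steps_to_equilibrium:,.0f} steps")   # ~20,000
```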