153 research outputs found

    How Does Batch Normalization Help Optimization?

    Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called "internal covariate shift". In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training. (In NeurIPS'18.)
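
    For reference, the BatchNorm operation itself normalizes each feature with the mini-batch mean and variance and then applies a learned scale and shift. Below is a minimal NumPy sketch of the training-time forward pass; the epsilon value and array shapes are illustrative and not taken from the paper.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Batch-normalize a (batch, features) array.

    x:     activations, shape (N, D)
    gamma: learned per-feature scale, shape (D,)
    beta:  learned per-feature shift, shape (D,)
    """
    mu = x.mean(axis=0)                      # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance per feature
    return gamma * x_hat + beta              # learned scale and shift

# Example: a batch of 4 samples with 3 features
out = batchnorm_forward(np.random.randn(4, 3), gamma=np.ones(3), beta=np.zeros(3))
```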

    ModelDiff: A Framework for Comparing Learning Algorithms

    We study the problem of (learning) algorithm comparison, where the goal is to find differences between models trained with two different learning algorithms. We begin by formalizing this goal as one of finding distinguishing feature transformations, i.e., input transformations that change the predictions of models trained with one learning algorithm but not the other. We then present ModelDiff, a method that leverages the datamodels framework (Ilyas et al., 2022) to compare learning algorithms based on how they use their training data. We demonstrate ModelDiff through three case studies, comparing models trained with/without data augmentation, with/without pre-training, and with different SGD hyperparameters. Our code is available at https://github.com/MadryLab/modeldiff
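
    As a rough illustration of the "distinguishing feature transformation" idea, one can score how much more often a candidate transformation changes the predictions of models trained with algorithm A than of models trained with algorithm B. This is only a hedged sketch of the criterion, not the ModelDiff procedure itself (which works through the datamodels framework); the `predict` interface and names below are hypothetical.

```python
import numpy as np

def prediction_flip_rate(models, inputs, transform):
    """Fraction of (model, input) pairs whose predicted class changes when
    `transform` is applied to the input. Models are assumed (hypothetically)
    to expose predict(x) -> vector of class scores."""
    flips = [
        np.argmax(m.predict(x)) != np.argmax(m.predict(transform(x)))
        for m in models
        for x in inputs
    ]
    return float(np.mean(flips))

def distinguishing_score(models_a, models_b, inputs, transform):
    """A transformation 'distinguishes' algorithm A from B if it flips the
    predictions of A-trained models much more often than B-trained ones."""
    return (prediction_flip_rate(models_a, inputs, transform)
            - prediction_flip_rate(models_b, inputs, transform))
```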

    Underwater Acoustic Imaging: One-bit Digitisation

    In underwater acoustic imaging (UAI), the combination of a two-dimensional (2-D) array and replicate correlation can produce 3-D images, typically of objects at a range of 2 m. A system already developed achieves the high data acquisition rate needed through one-bit sampling (sensing only the sign of the received signal). Noise added before the one-bit sampling avoids the production of 'ghosts' in the image. By simulation and mathematical analysis, the effects of one-bit sampling and added noise are studied for a chirp signal, with a restriction so far to 1-D images (image amplitude versus range). Conditions are given for the avoidance of ghosts and the minimisation of 'image noise', that is, noise in the image due to the one-bit sampling and the added noise. A model of image noise is proposed, which is corroborated by the tests carried out to date. A general formula for the root-mean-square image noise is obtained. It has previously been suggested that filtering the signal after sampling would improve the image. However, it is shown that filtering is unnecessary and indeed makes the image worse. It is shown that a strong target can suppress evidence of a weak target because, when the strength of the return signal is raised, the amplitude of the added noise must essentially be raised as well to avoid ghosts. A general formula, giving the ratio of target strengths such that the weak target has a 50% probability of detection, is obtained.
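
    The role of dithering before one-bit sampling can be illustrated with a small 1-D simulation, assuming a linear chirp and a replicate-correlation (matched-filter) receiver; the sample rate, chirp band, echo delay, and noise level below are illustrative choices, not the paper's parameters.

```python
import numpy as np
from scipy.signal import chirp, correlate

fs = 100_000                                    # sample rate in Hz (illustrative)
t = np.arange(0, 1e-3, 1 / fs)                  # 1 ms transmit pulse
replica = chirp(t, f0=20_000, f1=40_000, t1=t[-1])  # linear chirp replica

# Received record: a delayed echo of the chirp in an otherwise empty trace
rx = np.zeros(4 * len(t))
delay = 150                                     # true echo delay in samples
rx[delay:delay + len(t)] += replica

# Add noise BEFORE the one-bit sampler (dither), then keep only the sign
noise_rms = 0.5                                 # illustrative dither level
one_bit = np.sign(rx + noise_rms * np.random.randn(len(rx)))

# Replicate correlation (matched filter) recovers the range peak
image = correlate(one_bit, replica, mode='valid')
print("peak at lag", int(np.argmax(np.abs(image))), "- true delay:", delay)
```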

    FFCV: Accelerating Training by Removing Data Bottlenecks

    We present FFCV, a library for easy and fast machine learning model training. FFCV speeds up model training by eliminating (often subtle) data bottlenecks from the training process. In particular, we combine techniques such as an efficient file storage format, caching, data pre-loading, asynchronous data transfer, and just-in-time compilation to (a) make data loading and transfer significantly more efficient, ensuring that GPUs can reach full utilization; and (b) offload as much data processing as possible to the CPU asynchronously, freeing GPU cycles for training. Using FFCV, we train ResNet-18 and ResNet-50 on the ImageNet dataset with a competitive tradeoff between accuracy and training time. For example, we are able to train an ImageNet ResNet-50 model to 75% accuracy in only 20 minutes on a single machine. We demonstrate FFCV's performance, ease of use, extensibility, and ability to adapt to resource constraints through several case studies. Detailed installation instructions, documentation, and a Slack support channel are available at https://ffcv.io/
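
    As a generic illustration of one of these ideas (hiding data-loading latency behind computation through asynchronous pre-loading), here is a minimal sketch in plain Python; it is not FFCV's actual API, and `my_dataset_iter` and `train_step` in the usage comment are placeholders.

```python
import queue
import threading

def prefetching_loader(batches, buffer_size=4):
    """Wrap an iterable of batches so that loading/decoding runs in a
    background thread while the consumer (e.g. the GPU training step)
    works on a previously fetched batch."""
    buf = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for batch in batches:          # loading/decoding happens here
            buf.put(batch)
        buf.put(sentinel)              # signal end of data

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buf.get()
        if batch is sentinel:
            return
        yield batch

# Usage (placeholders): for batch in prefetching_loader(my_dataset_iter): train_step(batch)
```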