How Does Batch Normalization Help Optimization?
Batch Normalization (BatchNorm) is a widely adopted technique that enables
faster and more stable training of deep neural networks (DNNs). Despite its
pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly
understood. The popular belief is that this effectiveness stems from
controlling the change of the layers' input distributions during training to
reduce the so-called "internal covariate shift". In this work, we demonstrate
that such distributional stability of layer inputs has little to do with the
success of BatchNorm. Instead, we uncover a more fundamental impact of
BatchNorm on the training process: it makes the optimization landscape
significantly smoother. This smoothness induces a more predictive and stable
behavior of the gradients, allowing for faster training.
Comment: In NeurIPS'18
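For reference, a minimal NumPy sketch of the standard BatchNorm transform whose effect the paper analyzes (the textbook formulation, not code from the paper):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature to zero mean / unit variance over the
    # mini-batch, then apply the learned affine parameters gamma, beta.
    mu = x.mean(axis=0)                  # per-feature batch mean
    var = x.var(axis=0)                  # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Example: a batch of 64 activations with 128 features
x = 3.0 * np.random.randn(64, 128) + 1.5
y = batch_norm(x, gamma=np.ones(128), beta=np.zeros(128))
print(y.mean(axis=0)[:3], y.std(axis=0)[:3])  # ~0 and ~1 per feature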
ModelDiff: A Framework for Comparing Learning Algorithms
We study the problem of (learning) algorithm comparison, where the goal is to
find differences between models trained with two different learning algorithms.
We begin by formalizing this goal as one of finding distinguishing feature
transformations, i.e., input transformations that change the predictions of
models trained with one learning algorithm but not the other. We then present
ModelDiff, a method that leverages the datamodels framework (Ilyas et al.,
2022) to compare learning algorithms based on how they use their training data.
We demonstrate ModelDiff through three case studies, comparing models trained
with/without data augmentation, with/without pre-training, and with different
SGD hyperparameters. Our code is available at
https://github.com/MadryLab/modeldiff
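To make the notion concrete, here is a toy proxy for a distinguishing feature transformation: the fraction of inputs whose predicted class flips under a transformation for one model but not the other. All names below are illustrative, not the actual datamodels-based method, which lives in the repository above:

import numpy as np

def distinguishing_score(model_a, model_b, transform, inputs):
    # Fraction of inputs whose prediction changes under `transform`
    # for model A but stays fixed for model B (toy proxy only).
    flips_a = model_a(inputs).argmax(1) != model_a(transform(inputs)).argmax(1)
    stable_b = model_b(inputs).argmax(1) == model_b(transform(inputs)).argmax(1)
    return float(np.mean(flips_a & stable_b))

# Toy usage: two random linear "models" and an additive shift transform
rng = np.random.default_rng(0)
W_a, W_b = rng.normal(size=(5, 10)), rng.normal(size=(5, 10))
model_a = lambda x: x @ W_a.T
model_b = lambda x: x @ W_b.T
x = rng.normal(size=(256, 10))
print(distinguishing_score(model_a, model_b, lambda v: v + 0.5, x))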
Underwater Acoustic Imaging: One-bit Digitisation
In underwater acoustic imaging (UAI), the combination of a two-dimensional (2-D) array and replica correlation can produce 3-D images, typically of objects at a range of 2 m. A system already developed achieves the high data acquisition rate needed through one-bit sampling (sensing only the sign of the received signal). Noise added before the one-bit sampling avoids the production of 'ghosts' in the image. By simulation and mathematical analysis, the effects of one-bit sampling and added noise are studied for a chirp signal, with a restriction so far to 1-D images (image amplitude versus range). Conditions are given for the avoidance of ghosts and the minimisation of 'image noise' - noise in the image due to one-bit sampling and added noise. A model of image noise is proposed, which is corroborated by the tests carried out to date. A general formula for the root-mean-square image noise is obtained. It has previously been suggested that filtering the signal after sampling would improve the image. However, it is shown that filtering is unnecessary and indeed makes the image worse. It is shown that a strong target can suppress evidence of a weak target because, when the strength of the return signal is raised, the amplitude of the added noise must essentially be raised to avoid 'ghosts'. A general formula, giving the ratio of target strengths such that the weak target has a 50% probability of detection, is obtained.
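The two ingredients, dithered one-bit sampling and replica correlation, can be sketched in a few lines of NumPy; the sample rate, chirp parameters, and echo delay below are assumed for illustration and are not the system values from the paper:

import numpy as np

rng = np.random.default_rng(0)
fs, T = 100e3, 10e-3                    # sample rate and chirp duration (assumed)
t = np.arange(0, T, 1/fs)
f0, f1 = 10e3, 30e3                     # chirp start/end frequencies (assumed)
chirp = np.sin(2*np.pi*(f0*t + 0.5*(f1 - f0)/T*t**2))  # linear-FM replica

rx = np.zeros(4*len(t))                 # received signal: one echo,
rx[200:200+len(t)] += chirp             # delayed by 200 samples

def one_bit(x, noise_rms):
    # One-bit digitisation: add dither noise, then keep only the sign.
    return np.sign(x + noise_rms*rng.normal(size=len(x)))

for sigma in (0.0, 1.0):                # without / with noise before the limiter
    img = np.abs(np.correlate(one_bit(rx, sigma), chirp, mode='valid'))
    print(f'noise_rms={sigma}: 1-D image peak at sample {img.argmax()}')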
FFCV: Accelerating Training by Removing Data Bottlenecks
We present FFCV, a library for easy and fast machine learning model training.
FFCV speeds up model training by eliminating (often subtle) data bottlenecks
from the training process. In particular, we combine techniques such as an
efficient file storage format, caching, data pre-loading, asynchronous data
transfer, and just-in-time compilation to (a) make data loading and transfer
significantly more efficient, ensuring that GPUs can reach full utilization;
and (b) offload as much data processing as possible to the CPU asynchronously,
freeing GPU cycles for training. Using FFCV, we train ResNet-18 and ResNet-50
on the ImageNet dataset with a competitive tradeoff between accuracy and training
time. For example, we are able to train an ImageNet ResNet-50 model to 75% accuracy
in only 20 minutes on a single machine. We demonstrate FFCV's performance,
ease-of-use, extensibility, and ability to adapt to resource constraints
through several case studies. Detailed installation instructions,
documentation, and a Slack support channel are available at https://ffcv.io/
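As a sketch of typical FFCV usage based on its documented API: convert a dataset once into the .beton format, then stream it through compiled per-field pipelines. The paths, sizes, and pipeline choices below are illustrative rather than the paper's exact ImageNet recipe, and my_dataset stands in for any map-style (image, label) dataset:

import torch
from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField
from ffcv.fields.decoders import IntDecoder, RandomResizedCropRGBImageDecoder
from ffcv.loader import Loader, OrderOption
from ffcv.transforms import ToTensor, ToDevice, ToTorchImage, Squeeze

# 1) One-time conversion into FFCV's efficient .beton storage format.
writer = DatasetWriter('train.beton', {
    'image': RGBImageField(max_resolution=256),
    'label': IntField(),
})
writer.from_indexed_dataset(my_dataset)  # placeholder map-style dataset

# 2) Training-time loader: per-field pipelines with asynchronous decoding
#    and host-to-GPU transfer.
device = torch.device('cuda:0')
loader = Loader('train.beton', batch_size=512, num_workers=8,
                order=OrderOption.RANDOM,
                pipelines={
                    'image': [RandomResizedCropRGBImageDecoder((224, 224)),
                              ToTensor(), ToDevice(device), ToTorchImage()],
                    'label': [IntDecoder(), ToTensor(), Squeeze(),
                              ToDevice(device)],
                })

for images, labels in loader:
    pass  # training step goes here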