8 research outputs found
An Empirical Evaluation on Robustness and Uncertainty of Regularization Methods
Despite apparent human-level performances of deep neural networks (DNN), they
behave fundamentally differently from humans. They easily change predictions
when small corruptions such as blur and noise are applied to the input (lack of
robustness), and they often produce confident predictions on
out-of-distribution samples (improper uncertainty measure). While a number of
studies have aimed to address those issues, the proposed solutions are typically
expensive and complicated (e.g. Bayesian inference and adversarial training).
Meanwhile, many simple and cheap regularization methods have been developed to
enhance the generalization of classifiers. Such regularization methods have
largely been overlooked as baselines for addressing the robustness and
uncertainty issues, as they were not specifically designed for those purposes. In this
paper, we provide extensive empirical evaluations on the robustness and
uncertainty estimates of image classifiers (CIFAR-100 and ImageNet) trained
with state-of-the-art regularization methods. Furthermore, experimental results
show that certain regularization methods can serve as strong baseline methods
for robustness and uncertainty estimation of DNNs.
Comment: Accepted at ICML 2019 Workshop on Uncertainty and Robustness in Deep Learning. 7 pages, 1 figure
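The evaluations described here typically score both corruption robustness (accuracy under shifted inputs) and calibration. As a point of reference, a minimal sketch of expected calibration error (ECE), a standard uncertainty metric, assuming the usual equal-width confidence binning (illustrative, not code from the paper):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: the binned gap between a classifier's confidence and its accuracy."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Weight each bin's confidence/accuracy gap by its share of samples.
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece
```

A well-calibrated model has ECE near zero; overconfident predictions on corrupted or out-of-distribution inputs inflate it.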
Improved Robustness to Open Set Inputs via Tempered Mixup
Supervised classification methods often assume that evaluation data is drawn
from the same distribution as training data and that all classes are present
for training. However, real-world classifiers must handle inputs that are far
from the training distribution including samples from unknown classes. Open set
robustness refers to the ability to properly label samples from previously
unseen categories as novel and avoid high-confidence, incorrect predictions.
Existing approaches have focused on either novel inference methods, unique
training architectures, or supplementing the training data with additional
background samples. Here, we propose a simple regularization technique easily
applied to existing convolutional neural network architectures that improves
open set robustness without a background dataset. Our method achieves
state-of-the-art results on open set classification baselines and easily scales
to large-scale open set classification problems.
Comment: Proceedings of the ECCV 2020 Workshop on Adversarial Robustness in the Real World
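The abstract leaves the technique's details to the paper; one plausible reading, sketched below, combines standard mixup with labels tempered toward the uniform distribution, so that heavily mixed (off-manifold) inputs receive low-confidence targets. The `temper` weight and the smoothing schedule are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def tempered_mixup_batch(x, y_onehot, alpha=0.2, temper=0.5, rng=np.random):
    """Mix random pairs of inputs; temper the mixed labels toward uniform
    in proportion to how ambiguous the mixed input is."""
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))
    x_mix = lam * x + (1.0 - lam) * x[idx]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[idx]

    # Smoothing is strongest when lam is near 0.5 (most ambiguous input).
    n_classes = y_onehot.shape[1]
    smooth = temper * (1.0 - abs(2.0 * lam - 1.0))
    y_mix = (1.0 - smooth) * y_mix + smooth / n_classes
    return x_mix, y_mix
```

Training against such tempered targets penalizes high-confidence predictions on interpolated inputs, which is the behavior open set robustness demands on samples from unseen categories.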
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
Modern deep neural networks can achieve high accuracy when the training and test distributions are identical, but this
assumption is frequently violated in practice. When the train and test
distributions are mismatched, accuracy can plummet. Currently there are few
techniques that improve robustness to unforeseen data shifts encountered during
deployment. In this work, we propose a technique to improve the robustness and
uncertainty estimates of image classifiers. We propose AugMix, a data
processing technique that is simple to implement, adds limited computational
overhead, and helps models withstand unforeseen corruptions. AugMix
significantly improves robustness and uncertainty measures on challenging image
classification benchmarks, closing the gap between previous methods and the
best possible performance in some cases by more than half.
Comment: Code available at https://github.com/google-research/augmix
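Per the abstract, AugMix is a data processing step; a minimal sketch of its mixing scheme, assuming the caller supplies a list of image-to-image augmentation operations (the chain/weighting structure follows the paper's described recipe, but this is an illustration, not the reference implementation in the linked repository):

```python
import numpy as np

def augmix(image, operations, width=3, depth=3, alpha=1.0, rng=np.random):
    """Blend several short augmentation chains, then mix the result
    back with the original image via a skip connection."""
    chain_weights = rng.dirichlet([alpha] * width)
    skip_weight = rng.beta(alpha, alpha)

    mixed = np.zeros_like(image, dtype=np.float32)
    for w in chain_weights:
        aug = image.astype(np.float32)
        for _ in range(rng.randint(1, depth + 1)):   # chain of 1..depth ops
            op = operations[rng.randint(len(operations))]
            aug = op(aug)
        mixed += w * aug
    return skip_weight * image + (1.0 - skip_weight) * mixed
```

The full method in the paper additionally trains with a consistency loss across augmented views of the same image; the linked repository has the authoritative version.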
Self-Knowledge Distillation: A Simple Way for Better Generalization
The generalization capability of deep neural networks has been substantially
improved by applying a wide spectrum of regularization methods, e.g.,
restricting function space, injecting randomness during training, augmenting
data, etc. In this work, we propose a simple yet effective regularization
method named self-knowledge distillation (Self-KD), which progressively
distills a model's own knowledge to soften hard targets (i.e., one-hot vectors)
during training. Hence, it can be interpreted within the framework of knowledge distillation, where the student becomes its own teacher. The proposed method is
applicable to any supervised learning tasks with hard targets and can be easily
combined with existing regularization methods to further enhance the
generalization performance. Furthermore, we show that Self-KD not only achieves better accuracy, but also provides high-quality confidence estimates.
Extensive experimental results on three different tasks, image classification,
object detection, and machine translation, demonstrate that our method
consistently improves the performance of the state-of-the-art baselines, and
especially, it achieves state-of-the-art BLEU scores of 30.0 and 36.2 on the IWSLT15 English-to-German and German-to-English tasks, respectively.
Comment: Under review
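A minimal sketch of the general idea, assuming the soft target is an interpolation between the one-hot label and the model's own earlier predictions (the exact target schedule is defined in the paper; `alpha` here is a hypothetical default):

```python
import torch
import torch.nn.functional as F

def self_kd_loss(logits, prev_logits, targets, alpha=0.3):
    """Cross-entropy against targets softened by the model's own
    past predictions (prev_logits: a detached earlier snapshot)."""
    n_classes = logits.size(1)
    hard = F.one_hot(targets, n_classes).float()
    soft = F.softmax(prev_logits.detach(), dim=1)
    mixed = (1.0 - alpha) * hard + alpha * soft
    return -(mixed * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```

Because the teacher is just a past state of the student, no extra network is needed, which is what makes the method cheap to combine with other regularizers.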
The Dilemma Between Data Transformations and Adversarial Robustness for Time Series Application Systems
Adversarial examples, or nearly indistinguishable inputs created by an
attacker, significantly reduce machine learning accuracy. Theoretical evidence
has shown that the high intrinsic dimensionality of datasets facilitates an
adversary's ability to develop effective adversarial examples in classification
models. Relatedly, the presentation of data to a learning model impacts its
performance. For example, we have seen this through dimensionality reduction
techniques used to aid with the generalization of features in machine learning
applications. Thus, data transformation techniques go hand-in-hand with
state-of-the-art learning models in decision-making applications such as
intelligent medical or military systems. With this work, we explore how data
transformation techniques such as feature selection, dimensionality reduction,
or trend extraction techniques may impact an adversary's ability to create
effective adversarial samples on a recurrent neural network. Specifically, we analyze this impact from the perspective of the data manifold and the presentation of
its intrinsic features. Our evaluation empirically shows that feature selection
and trend extraction techniques may increase the RNN's vulnerability. A data
transformation technique reduces the vulnerability to adversarial examples only
if it approximates the dataset's intrinsic dimension, minimizes codimension,
and maintains higher manifold coverage.
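The criterion at the end of this abstract turns on estimating a dataset's intrinsic dimension. As one common (and crude) estimator, a sketch that counts the principal components needed to explain most of the variance; the paper's own analysis may use a different estimator:

```python
import numpy as np

def pca_intrinsic_dimension(X, var_threshold=0.95):
    """Estimate intrinsic dimension as the number of principal
    components explaining `var_threshold` of the total variance."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values
    var_ratio = (s ** 2) / (s ** 2).sum()
    return int(np.searchsorted(np.cumsum(var_ratio), var_threshold) + 1)
```

Under the abstract's criterion, a transformation that projects to roughly this many dimensions (small codimension, high manifold coverage) should hurt adversarial robustness least.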
Frustratingly Easy Uncertainty Estimation for Distribution Shift
Distribution shift is an important concern in deep image classification, arising either from corruption of the source images or from a complete change of domain, in which case the remedy is domain adaptation. While the primary goal is to improve
accuracy under distribution shift, an important secondary goal is uncertainty
estimation: evaluating the probability that the prediction of a model is
correct. While improving accuracy is hard, uncertainty estimation turns out to
be frustratingly easy. Prior works have built uncertainty estimation into the model and training paradigm in various ways. Instead, we show that we can
estimate uncertainty by simply exposing the original model to corrupted images,
and performing simple statistical calibration on the image outputs. Our
frustratingly easy methods demonstrate superior performance on a wide range of
distribution shifts as well as on unsupervised domain adaptation tasks,
measured through extensive experimentation.
Comment: 17 pages, 4 Tables, 9 Figures
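The abstract's "simple statistical calibration" is plausibly in the family of temperature scaling; a minimal sketch that fits a single temperature on the logits the model produced for synthetically corrupted images (the function name, bounds, and the use of SciPy are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_temperature(logits, labels):
    """Fit one scalar temperature T by minimizing NLL on held-out
    (here: corrupted-image) logits; divide test logits by T before softmax."""
    def nll(t):
        z = logits / t
        z = z - z.max(axis=1, keepdims=True)     # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    return minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x
```

The appeal, matching the abstract's claim, is that nothing about the model or its training changes; only a post-hoc rescaling of the outputs is fitted.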
AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights
Normalization techniques are a boon for modern deep learning. They let
weights converge more quickly, often with better generalization performance. It
has been argued that the normalization-induced scale invariance among the
weights provides an advantageous ground for gradient descent (GD) optimizers:
the effective step sizes are automatically reduced over time, stabilizing the
overall training procedure. It is often overlooked, however, that the
additional introduction of momentum in GD optimizers results in a far more
rapid reduction in effective step sizes for scale-invariant weights, a
phenomenon that has not yet been studied and may have caused unwanted side
effects in the current practice. This is a crucial issue because arguably the
vast majority of modern deep neural networks are trained with (1) momentum-based GD (e.g. SGD or Adam) and contain (2) scale-invariant parameters. In this paper, we verify that this widely adopted combination of the two ingredients leads to the premature decay of effective step sizes and sub-optimal model performances. We
propose simple and effective remedies, SGDP and AdamP: remove the radial component, i.e. the norm-increasing direction, at each optimizer step. Because of
the scale invariance, this modification only alters the effective step sizes
without changing the effective update directions, thus enjoying the original
convergence properties of GD optimizers. Given the ubiquity of momentum GD and
scale invariance in machine learning, we have evaluated our methods against the
baselines on 13 benchmarks. They range from vision tasks like classification
(e.g. ImageNet), retrieval (e.g. CUB and SOP), and detection (e.g. COCO) to
language modelling (e.g. WikiText) and audio classification (e.g. DCASE) tasks.
We verify that our solution brings about uniform gains in those benchmarks.
Source code is available at https://github.com/clovaai/AdamP.
Comment: Accepted at ICLR 2021. First two authors contributed equally
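The core operation the abstract describes, removing the radial (norm-increasing) component of the update, amounts to a projection; a minimal sketch of that single step (the released SGDP/AdamP optimizers at the linked repository additionally detect which parameters are scale-invariant, which this sketch omits):

```python
import torch

def project_out_radial(update, weight, eps=1e-8):
    """Remove the component of `update` along `weight`. For scale-invariant
    weights this leaves the effective update direction unchanged while
    preventing artificial norm growth."""
    w = weight.reshape(-1)
    u = update.reshape(-1)
    radial = (torch.dot(u, w) / (torch.dot(w, w) + eps)) * w
    return (u - radial).reshape(update.shape)
```

Keeping the weight norm from growing keeps the effective step size from shrinking prematurely, which is the slowdown the title refers to.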
Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift
Covariate shift has been shown to sharply degrade both predictive accuracy
and the calibration of uncertainty estimates for deep learning models. This is
worrying, because covariate shift is prevalent in a wide range of real world
deployment settings. However, in this paper, we note that one can frequently access small unlabeled batches of the shifted data just before prediction time. This observation enables a simple but
surprisingly effective method which we call prediction-time batch
normalization, which significantly improves model accuracy and calibration
under covariate shift. Using this one-line code change, we achieve state-of-the-art results on recent covariate shift benchmarks and an mCE of 60.28% on
the challenging ImageNet-C dataset; to our knowledge, this is the best result
for any model that does not incorporate additional data augmentation or
modification of the training pipeline. We show that prediction-time batch
normalization provides complementary benefits to existing state-of-the-art
approaches for improving robustness (e.g. deep ensembles), and combining the two further improves performance. Our findings are supported by detailed
measurements of the effect of this strategy on model behavior across rigorous
ablations on various dataset modalities. However, the method has mixed results when used alongside pre-training and does not seem to perform as well under more natural types of dataset shift; it is therefore worthy of additional study. We include links to the data in our figures to improve reproducibility, including a Python notebook that can be run to easily modify our analysis at
https://colab.research.google.com/drive/11N0wDZnMQQuLrRwRoumDCrhSaIhkqjof
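The "one line code change" is not reproduced in the abstract; in PyTorch terms, prediction-time batch normalization amounts to letting BatchNorm layers normalize with the test batch's own statistics. A hedged sketch (the momentum trick here is one way to avoid touching the stored running averages, not necessarily the paper's exact implementation):

```python
import torch

@torch.no_grad()
def predict_with_batch_stats(model, x):
    """Run inference with BatchNorm layers using the current batch's
    statistics instead of the training-time running averages."""
    model.eval()
    saved = []
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            saved.append((m, m.momentum))
            m.momentum = 0.0   # train mode, but running stats stay frozen
            m.train()
    out = model(x)
    for m, momentum in saved:  # restore ordinary eval behavior
        m.momentum = momentum
        m.eval()
    return out
```

This only changes how normalization statistics are computed at prediction time; weights, training, and data pipeline are untouched, which is why it composes with methods like deep ensembles.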