181 research outputs found
Analysis of Confident-Classifiers for Out-of-distribution Detection
Discriminatively trained neural classifiers can be trusted only when the
input data come from the training distribution (in-distribution). Detecting
out-of-distribution (OOD) samples is therefore essential for avoiding
classification errors. In the context of OOD detection for image
classification, one recent approach trains a classifier, called a
"confident-classifier", by minimizing the standard cross-entropy loss on
in-distribution samples together with the KL divergence between the
predictive distribution on OOD samples drawn from the low-density regions
around the in-distribution and the uniform distribution (i.e., maximizing
the entropy of the outputs). Samples can then be flagged as OOD when the
classifier outputs low confidence or high entropy.
In this paper, we analyze this setting both theoretically and experimentally.
We conclude that the resulting confident-classifier still yields arbitrarily
high confidence for OOD samples far away from the in-distribution. We instead
suggest training a classifier by adding an explicit "reject" class for OOD
samples.
Comment: SafeML 2019 ICLR workshop paper
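As a rough illustration of the objective described in the abstract above, here is a minimal PyTorch sketch; `model` (assumed to return logits), `x_ood`, and the weighting term `beta` are placeholders rather than the paper's notation.

```python
import math
import torch.nn.functional as F

def confident_classifier_loss(model, x_in, y_in, x_ood, beta=1.0):
    # Standard cross-entropy on in-distribution samples.
    ce = F.cross_entropy(model(x_in), y_in)
    # KL(p_ood || uniform) = sum_c p_c log p_c + log C; minimizing it
    # pushes OOD predictions toward the uniform distribution
    # (equivalently, maximizes the entropy of the outputs).
    log_p = F.log_softmax(model(x_ood), dim=1)
    kl_to_uniform = (log_p.exp() * log_p).sum(dim=1).mean() + math.log(log_p.size(1))
    return ce + beta * kl_to_uniform
```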
Classification-Based Anomaly Detection for General Data
Anomaly detection, finding patterns that substantially deviate from those
seen previously, is one of the fundamental problems of artificial intelligence.
Recently, classification-based methods were shown to achieve superior results
on this task. In this work, we present a unifying view and propose an open-set
method, GOAD, to relax current generalization assumptions. Furthermore, we
extend the applicability of transformation-based methods to non-image data
using random affine transformations. Our method is shown to obtain
state-of-the-art accuracy and is applicable to a broad range of data types. The strong
performance of our method is extensively validated on multiple datasets from
different domains.
Comment: ICLR'20
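The random-affine-transformation idea for non-image data can be sketched as below in NumPy; the number of transforms and the matrix shapes are illustrative assumptions, and the downstream transform-prediction classifier is only described in the comment.

```python
import numpy as np

rng = np.random.default_rng(0)
num_transforms, in_dim, out_dim = 8, 16, 16

# Sample M fixed random affine maps T_m(x) = W_m @ x + b_m.
W = rng.normal(size=(num_transforms, out_dim, in_dim))
b = rng.normal(size=(num_transforms, out_dim))

def transform_all(x):
    """Apply every affine map to a batch x of shape (n, in_dim).
    Returns (n, M, out_dim). A classifier is then trained to predict
    which transform m produced each view; test samples whose views are
    classified with low confidence are flagged as anomalous."""
    return np.einsum('moi,ni->nmo', W, x) + b

x = rng.normal(size=(4, in_dim))
views = transform_all(x)  # shape (4, 8, 16)
```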
Hyperparameter-Free Out-of-Distribution Detection Using Softmax of Scaled Cosine Similarity
The ability to detect out-of-distribution (OOD) samples is vital to secure
the reliability of deep neural networks in real-world applications. Considering
the nature of OOD samples, detection methods should not have hyperparameters
that need to be tuned depending on incoming OOD samples. However, most of the
recently proposed methods do not meet this requirement, leading to compromised
performance in real-world applications. In this paper, we propose a simple,
hyperparameter-free method based on softmax of scaled cosine similarity. It
resembles the approach employed by modern metric learning methods, but it
differs in details; the differences are essential to achieve high detection
performance. We show through experiments that our method outperforms the
existing methods on the evaluation test recently proposed by Shafaei et al.,
which takes the above issue of hyperparameter dependency into account. We also
show that it achieves at least comparable performance to other methods on the
conventional test, where their hyperparameters are chosen using explicit OOD
samples. Furthermore, it is computationally more efficient than most of the
previous methods, since it needs only a single forward pass.
Comment: Extended the supplementary material
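A hedged sketch of a scaled-cosine classification head in PyTorch: the learnable scale and the use of the maximum softmax probability as the detection score follow the abstract, but the exact details may differ from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledCosineHead(nn.Module):
    """Drop-in replacement for a linear classification head: logits are
    cosine similarities between L2-normalized features and class weights,
    multiplied by a learned scale (no per-OOD-set hyperparameter tuning)."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = nn.Parameter(torch.tensor(10.0))

    def forward(self, features):
        cos = F.linear(F.normalize(features, dim=1),
                       F.normalize(self.weight, dim=1))
        # softmax(logits).max(dim=1) then serves as the detection score.
        return self.scale * cos
```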
WAIC, but Why? Generative Ensembles for Robust Anomaly Detection
Machine learning models encounter Out-of-Distribution (OoD) errors when the
data seen at test time are generated from a different stochastic generator than
the one used to generate the training data. One proposal to scale OoD detection
to high-dimensional data is to learn a tractable likelihood approximation of
the training distribution, and use it to reject unlikely inputs. However,
likelihood models on natural data are themselves susceptible to OoD errors, and
even assign large likelihoods to samples from other datasets. To mitigate this
problem, we propose Generative Ensembles, which robustify density-based OoD
detection by way of estimating epistemic uncertainty of the likelihood model.
We present a puzzling observation in need of an explanation: although
likelihood measures cannot account for the typical set of a distribution, and
therefore should not be suitable on their own for OoD detection, WAIC performs
surprisingly well in practice.
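The ensemble score can be sketched as follows, assuming each ensemble member exposes a per-sample log-likelihood; WAIC penalizes the mean log-likelihood by its across-ensemble variance, an estimate of the likelihood model's epistemic uncertainty.

```python
import numpy as np

def waic_score(log_likelihoods):
    """log_likelihoods: array of shape (num_models, num_samples) holding
    log p_k(x) for each ensemble member k. Returns a per-sample score;
    low scores flag inputs as OoD."""
    return log_likelihoods.mean(axis=0) - log_likelihoods.var(axis=0)
```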
Calibrated Top-1 Uncertainty Estimates for Classification by Score-Based Models
While the accuracy of modern deep learning models has significantly improved
in recent years, the ability of these models to generate uncertainty estimates
has not progressed to the same degree. Uncertainty methods are designed to
provide an estimate of class probabilities when predicting class assignment.
While there are a number of proposed methods for estimating uncertainty, they
all suffer from a lack of calibration: predicted probabilities can be off from
empirical ones by a few percent or more. By restricting the scope of our
predictions to only the probability of Top-1 error, we can decrease the
calibration error of existing methods to less than one percent. As a result,
the scores of the methods also improve significantly over benchmarks.
Comment: 12 pages, 5 figures, 6 tables (major revision, new benchmark allows us to show model calibration is better)
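One standard way to measure the calibration error discussed above is the expected calibration error restricted to the Top-1 confidence; the sketch below is a generic diagnostic, not the authors' evaluation code.

```python
import numpy as np

def top1_ece(confidences, correct, num_bins=15):
    """confidences: max predicted probability per sample;
    correct: 0/1 indicator that the Top-1 prediction was right.
    Returns the expected calibration error: a bin-weighted average of
    |mean confidence - empirical Top-1 accuracy| over confidence bins."""
    bins = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece
```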
Soft Labeling Affects Out-of-Distribution Detection of Deep Neural Networks
Soft labeling has become a common output regularization for generalization and
model compression of deep neural networks. However, the effect of soft labeling
on out-of-distribution (OOD) detection, an important topic in machine learning
safety, has not been explored. In this study, we show that soft labeling can
determine OOD detection performance: how the outputs of incorrect classes are
regularized by soft labeling can degrade or improve OOD detection. Based on the
empirical results, we suggest a direction for future work on OOD-robust DNNs:
proper output regularization by soft labeling may construct OOD-robust DNNs
without additional training on OOD samples or modification of the models, while
also improving classification accuracy.
Comment: ICML'20 Workshop on Uncertainty and Robustness in Deep Learning
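For concreteness, here is a minimal sketch of soft labeling with uniform mass on the incorrect classes; how that epsilon mass is distributed is exactly the design choice the abstract says can help or hurt OOD detection, and uniform smoothing is only one possible setting.

```python
import torch
import torch.nn.functional as F

def soft_label_loss(logits, targets, epsilon=0.1):
    """Cross-entropy against soft targets: (1 - epsilon) on the true class,
    epsilon split uniformly over the remaining classes."""
    num_classes = logits.size(1)
    one_hot = F.one_hot(targets, num_classes).float()
    soft = one_hot * (1 - epsilon) + (1 - one_hot) * epsilon / (num_classes - 1)
    return -(soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```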
Why is the Mahalanobis Distance Effective for Anomaly Detection?
The Mahalanobis distance-based confidence score, a recently proposed anomaly
detection method for pre-trained neural classifiers, achieves state-of-the-art
performance on both out-of-distribution (OoD) and adversarial examples
detection. This work analyzes why the method exhibits such strong performance
in practical settings despite imposing an implausible assumption, namely that
the class-conditional distributions of pre-trained features have tied
covariance. Although the Mahalanobis distance-based method is claimed to be
motivated by classification prediction confidence, we find that its superior
performance stems from information that is not useful for classification. This
suggests that the usual explanation for why the Mahalanobis confidence score
works so well is mistaken, and that the score uses different information from
ODIN, another popular OoD detection method based on prediction confidence.
This perspective motivates us to combine the two methods; the combined
detector exhibits improved performance and robustness. These findings provide
insight into the behavior of neural classifiers in response to anomalous
inputs.
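A sketch of the score's core computation, assuming NumPy and pre-extracted features; the per-layer ensembling and input preprocessing used in practice are omitted.

```python
import numpy as np

def fit_class_gaussians(features, labels):
    """Per-class means with a single tied covariance (the assumption the
    paper questions), estimated from pre-trained features."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)
    return means, np.linalg.pinv(cov)  # pseudo-inverse for numerical safety

def mahalanobis_confidence(x, means, cov_inv):
    """Negative squared Mahalanobis distance to the closest class mean;
    higher values mean more in-distribution."""
    diffs = means - x                                   # shape (C, d)
    d2 = np.einsum('cd,de,ce->c', diffs, cov_inv, diffs)
    return -d2.min()
```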
Using Pre-Training Can Improve Model Robustness and Uncertainty
He et al. (2018) have called into question the utility of pre-training by
showing that training from scratch can often yield similar performance to
pre-training. We show that although pre-training may not improve performance on
traditional classification metrics, it improves model robustness and
uncertainty estimates. Through extensive experiments on adversarial examples,
label corruption, class imbalance, out-of-distribution detection, and
confidence calibration, we demonstrate large gains from pre-training and
complementary effects with task-specific methods. We introduce adversarial
pre-training and show approximately a 10% absolute improvement over the
previous state-of-the-art in adversarial robustness. In some cases, using
pre-training without task-specific methods also surpasses the state-of-the-art,
highlighting the need for pre-training when evaluating future methods on
robustness and uncertainty tasks.
Comment: ICML 2019. PyTorch code here: https://github.com/hendrycks/pre-training ; Figure 3 updated
A Less Biased Evaluation of Out-of-distribution Sample Detectors
In the real world, a learning system could receive an input that is unlike
anything it has seen during training. Unfortunately, out-of-distribution
samples can lead to unpredictable behaviour. We need to know whether any given
input belongs to the population distribution of the training/evaluation data to
prevent unpredictable behaviour in deployed systems. A recent surge of interest
in this problem has led to the development of sophisticated techniques in the
deep learning literature. However, due to the absence of a standard problem
definition or an exhaustive evaluation, it is not evident if we can rely on
these methods. What makes this problem different from a typical supervised
learning setting is that the distribution of outliers used in training may not
be the same as the distribution of outliers encountered in the application.
Classical approaches that learn inliers vs. outliers with only two datasets can
yield optimistic results. We introduce OD-test, a three-dataset evaluation
scheme as a more reliable strategy to assess progress on this problem. We
present an exhaustive evaluation of a broad set of methods from related areas
on image classification tasks. Contrary to existing results, we show that for
realistic applications with high-dimensional images, previous techniques have
low accuracy and are not reliable in practice.
Comment: to appear in BMVC 2019; v2 is more compact, with more results
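The key point of the three-dataset scheme, that the outliers used to pick the detector's threshold differ from the outliers seen at evaluation time, can be sketched as follows; the threshold sweep and the balanced-accuracy criterion are illustrative assumptions, not the benchmark's exact specification.

```python
import numpy as np

def od_test_accuracy(scores_in_valid, scores_out_valid,
                     scores_in_test, scores_out_test):
    """Scores measure 'in-distribution-ness' (higher = more inlier).
    The threshold is chosen on one outlier dataset (validation), then
    the detector is evaluated against a *different* outlier dataset."""
    def balanced_acc(t, s_in, s_out):
        return ((s_in >= t).mean() + (s_out < t).mean()) / 2

    candidates = np.concatenate([scores_in_valid, scores_out_valid])
    best_t = max(candidates,
                 key=lambda t: balanced_acc(t, scores_in_valid, scores_out_valid))
    return balanced_acc(best_t, scores_in_test, scores_out_test)
```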
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
In this paper we establish rigorous benchmarks for image classifier
robustness. Our first benchmark, ImageNet-C, standardizes and expands the
corruption robustness topic, while showing which classifiers are preferable in
safety-critical applications. Then we propose a new dataset called ImageNet-P
which enables researchers to benchmark a classifier's robustness to common
perturbations. Unlike recent robustness research, this benchmark evaluates
performance on common corruptions and perturbations not worst-case adversarial
perturbations. We find that there are negligible changes in relative corruption
robustness from AlexNet classifiers to ResNet classifiers. Afterward we
discover ways to enhance corruption and perturbation robustness. We even find
that a bypassed adversarial defense provides substantial common perturbation
robustness. Together our benchmarks may aid future work toward networks that
robustly generalize.
Comment: ICLR 2019 camera-ready; datasets available at https://github.com/hendrycks/robustness ; this article supersedes arXiv:1807.01697
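The corruption-robustness summary statistic associated with ImageNet-C can be sketched as below: per-corruption errors are aggregated over severities and normalized by a baseline classifier's errors (AlexNet in the paper) before averaging over corruption types. The array shapes are assumptions for illustration.

```python
import numpy as np

def mean_corruption_error(model_err, baseline_err):
    """model_err, baseline_err: arrays of shape
    (num_corruptions, num_severities) holding top-1 error rates.
    Returns the mean corruption error (mCE), where < 1.0 means more
    corruption-robust than the baseline."""
    ce_per_corruption = model_err.sum(axis=1) / baseline_err.sum(axis=1)
    return ce_per_corruption.mean()
```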