Fixing the problems of deep neural networks will require better training data and learning algorithms
Bowers and colleagues argue that DNNs are poor models of biological vision
because they often learn to rival human accuracy by relying on strategies that
differ markedly from those of humans. We show that this problem is worsening as
DNNs are becoming larger-scale and increasingly more accurate, and prescribe
methods for building DNNs that can reliably model biological vision.
Comment: Published as a commentary in Behavioral and Brain Sciences.
Pre-training also Transfers Non-Robustness
Pre-training has enabled state-of-the-art results on many tasks. In spite of
its recognized contribution to generalization, we observed in this study that
pre-training also transfers adversarial non-robustness from the pre-trained
model into the fine-tuned model on downstream tasks. Using image classification
as an example, we first conducted experiments on various datasets and network
backbones to uncover the adversarial non-robustness of fine-tuned models.
Further analysis examined the knowledge learned by fine-tuned and standard
models, and revealed that the non-robustness stems from non-robust features
transferred from the pre-trained model. Finally, we analyzed the
feature-learning preferences of the pre-trained model, explored the factors
influencing robustness, and introduced a simple robust pre-training solution.
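Below is a minimal sketch, not the paper's code, of how such transferred non-robustness could be probed: take a pre-trained backbone, swap in a head for a downstream task (the fine-tuning loop itself is omitted), and measure accuracy under a simple one-step FGSM attack. The model choice, data loader, and epsilon are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Pre-trained backbone with a fresh head for a hypothetical 10-class downstream task.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, 10)
model.eval()

def fgsm_accuracy(model, loader, epsilon=4 / 255, device="cpu"):
    """Accuracy on FGSM-perturbed inputs; a large drop relative to clean
    accuracy indicates the non-robustness discussed above."""
    correct, total = 0, 0
    for x, y in loader:  # `loader` yields (image batch in [0, 1], label batch)
        x, y = x.to(device), y.to(device)
        x.requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad = torch.autograd.grad(loss, x)[0]
        x_adv = (x + epsilon * grad.sign()).clamp(0, 1)  # one-step sign attack
        correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total
```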
You Only Need a Good Embeddings Extractor to Fix Spurious Correlations
Spurious correlations in training data often lead to robustness issues since
models learn to use them as shortcuts. For example, when predicting whether an
object is a cow, a model might learn to rely on its green background, so it
would do poorly on a cow on a sandy background. A standard dataset for
benchmarking state-of-the-art methods that mitigate this problem is Waterbirds.
The best method, Group Distributionally Robust Optimization (GroupDRO),
currently achieves 89\% worst-group accuracy, whereas standard training from
scratch on raw images reaches only 72\%. GroupDRO requires training a model
end-to-end with subgroup labels. In this paper, we show that we can
achieve up to 90\% accuracy without using any sub-group information in the
training set by simply using embeddings from a large pre-trained vision model
and training a linear classifier on top of them. With experiments on a wide
range of pre-trained models and pre-training datasets, we show that both the
capacity of the pre-trained model and the size of the pre-training dataset
matter. Our experiments reveal that high-capacity vision transformers perform
better than high-capacity convolutional neural networks, and that larger
pre-training datasets lead to better worst-group accuracy on the spurious
correlation dataset.
Comment: Accepted at the ECCV 2022 workshop on Responsible Computer Vision (RCV).
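As a rough illustration of this recipe (a sketch under assumed placeholders, not the authors' implementation), one can freeze a pre-trained vision transformer, extract embeddings, and fit a linear probe without any group labels; `train_loader` and `test_loader` stand in for Waterbirds-style DataLoaders defined elsewhere.

```python
import torch
from torchvision import models
from sklearn.linear_model import LogisticRegression

# Frozen pre-trained ViT used purely as an embeddings extractor.
backbone = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
backbone.heads = torch.nn.Identity()  # drop the classification head, keep embeddings
backbone.eval()

@torch.no_grad()
def embed(loader, device="cpu"):
    feats, labels = [], []
    for x, y in loader:
        feats.append(backbone(x.to(device)).cpu())
        labels.append(y)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

# `train_loader` / `test_loader`: placeholder DataLoaders over (image, label) pairs.
X_train, y_train = embed(train_loader)
X_test, y_test = embed(test_loader)

# Linear probe on frozen embeddings; no subgroup labels are used anywhere.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```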
On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Out-of-distribution (OOD) testing is increasingly popular for evaluating a
machine learning system's ability to generalize beyond the biases of a training
set. OOD benchmarks are designed to present a different joint distribution of
data and labels between training and test time. VQA-CP has become the standard
OOD benchmark for visual question answering, but we discovered three troubling
practices in its current use. First, most published methods rely on explicit
knowledge of the construction of the OOD splits. They often rely on
``inverting'' the distribution of labels, e.g. answering mostly 'yes' when the
common training answer is 'no'. Second, the OOD test set is used for model
selection. Third, a model's in-domain performance is assessed after retraining
it on in-domain splits (VQA v2) that exhibit a more balanced distribution of
labels. These three practices defeat the objective of evaluating
generalization and call into question the value of methods specifically
designed for this dataset. We show that embarrassingly simple methods,
including one that generates answers at random, surpass the state of the art on
some question types. We provide short- and long-term solutions to avoid these
pitfalls and realize the benefits of OOD evaluation.
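For concreteness, the following sketch shows the kind of "embarrassingly simple" baseline described above: answer each test question by sampling uniformly from the answers observed for its question type during training. The data format here is a hypothetical simplification, not the actual VQA-CP schema.

```python
import random
from collections import defaultdict

def build_answer_pools(train_examples):
    """train_examples: iterable of (question_type, answer) pairs."""
    pools = defaultdict(set)
    for qtype, answer in train_examples:
        pools[qtype].add(answer)
    return {qtype: sorted(answers) for qtype, answers in pools.items()}

def random_answer(question_type, pools, rng=random):
    """Pick a uniformly random answer among those seen for this question type."""
    return rng.choice(pools[question_type])

# For a yes/no question type the pool is just {"no", "yes"}, so uniform guessing
# already scores ~50% on a split whose label distribution was deliberately inverted.
pools = build_answer_pools([("is the", "yes"), ("is the", "no"), ("how many", "2")])
print(random_answer("is the", pools))
```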
Can Language Models perform Abductive Commonsense Reasoning?
Abductive reasoning is the task of inferring the most plausible hypothesis
given a set of observations. In the literature, the community has approached
this challenge by classifying or generating a likely hypothesis that does not
contradict the past and future observations. Some of the most well-known
benchmarks that tackle this problem are aNLI and aNLG (pronounced alpha-NLI and
alpha-NLG). In this report, I review some of the methodologies that have been
attempted to solve this challenge, re-implement the baseline models, and
analyze some of the weaknesses of current approaches. The code and the
re-implemented results are available at this link.
Comment: 6 pages.
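One simple baseline in this spirit (an illustrative sketch, not necessarily one of the re-implemented models) scores each candidate hypothesis by slotting it between the two observations and asking an off-the-shelf language model which completion is more likely; the use of GPT-2 here is an assumption.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def lm_loss(text):
    """Mean token-level negative log-likelihood under the language model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    return model(ids, labels=ids).loss.item()

def more_plausible(obs1, obs2, hyp1, hyp2):
    """Return the hypothesis that makes obs1 -> hypothesis -> obs2 more likely."""
    scores = [lm_loss(f"{obs1} {h} {obs2}") for h in (hyp1, hyp2)]
    return hyp1 if scores[0] < scores[1] else hyp2

print(more_plausible(
    "The street was dry in the morning.",
    "By noon the street was soaked.",
    "It rained heavily before noon.",
    "The sun kept shining all day.",
))
```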
One Explanation Does Not Fit XIL
Current machine learning models produce outstanding results in many areas
but, at the same time, suffer from shortcut learning and spurious correlations.
To address such flaws, the explanatory interactive machine learning (XIL)
framework has been proposed to revise a model by employing user feedback on a
model's explanation. This work sheds light on the explanations used within this
framework. In particular, we investigate simultaneous model revision through
multiple explanation methods. In doing so, we identify that \textit{one
explanation does not fit XIL} and propose considering multiple explanations
when revising models via XIL.
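As a concrete illustration of explanation-based model revision (a hedged sketch of one common XIL-style instantiation, a "right for the right reasons" penalty, not necessarily the loss used in this work), user feedback can be encoded as a mask over input regions the model should not rely on, with an extra term penalizing input-gradient explanation mass in those regions.

```python
import torch
import torch.nn.functional as F

def xil_revision_loss(model, x, y, irrelevant_mask, lam=1.0):
    """Cross-entropy plus a penalty on input-gradient explanations inside
    user-marked irrelevant regions (irrelevant_mask is 1 where the model
    should not look, 0 elsewhere)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)
    # Input-gradient explanation of the log-probabilities w.r.t. the input.
    grads = torch.autograd.grad(
        F.log_softmax(logits, dim=1).sum(), x, create_graph=True
    )[0]
    explanation_penalty = (irrelevant_mask * grads ** 2).sum()
    return task_loss + lam * explanation_penalty
```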
- …