Explicit Tradeoffs between Adversarial and Natural Distributional Robustness
Several existing works study either adversarial or natural distributional
robustness of deep neural networks separately. In practice, however, models
need to enjoy both types of robustness to ensure reliability. In this work, we
bridge this gap and show that in fact, explicit tradeoffs exist between
adversarial and natural distributional robustness. We first consider a simple
linear regression setting on Gaussian data with disjoint sets of core and
spurious features. In this setting, through theoretical and empirical analysis,
we show that (i) adversarial training with ℓ1 and ℓ2 norms increases the
model's reliance on spurious features; (ii) for ℓ∞ adversarial training,
spurious reliance only occurs when the scale of the spurious features is
larger than that of the core features; (iii) adversarial
training can have an unintended consequence in reducing distributional
robustness, specifically when spurious correlations are changed in the new test
domain. Next, we present extensive empirical evidence, using a test suite of
twenty adversarially trained models evaluated on five benchmark datasets
(ObjectNet, RIVAL10, Salient ImageNet-1M, ImageNet-9, Waterbirds), that
adversarially trained classifiers rely on backgrounds more than their
standardly trained counterparts, validating our theoretical results. We also
show that spurious correlations in training data (when preserved in the test
domain) can improve adversarial robustness, revealing that previous claims that
adversarial vulnerability is rooted in spurious correlations are incomplete.
Comment: Accepted to NeurIPS 2022.
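To make the toy setting concrete, here is a hypothetical sketch (not the paper's code) of the core/spurious linear regression setup: Gaussian data with one small-scale core coordinate and one larger-scale spurious coordinate, a linear regressor trained with and without an ℓ2 adversarial perturbation, and the resulting weights compared. All constants (feature scales, the budget eps, the learning rate) are assumptions chosen only for illustration.

```python
# Hypothetical toy sketch of the core/spurious linear-regression setting; all
# constants (scales, eps, learning rate) are assumed, not taken from the paper.
import torch

torch.manual_seed(0)
n, eps = 5000, 0.5
y = torch.randint(0, 2, (n, 1)).float() * 2 - 1   # labels in {-1, +1}
core = 1.0 * y + 0.5 * torch.randn(n, 1)          # core feature (small scale)
spur = 3.0 * y + 0.5 * torch.randn(n, 1)          # spurious feature (larger scale)
X = torch.cat([core, spur], dim=1)

def train(adversarial: bool) -> torch.Tensor:
    w = torch.zeros(2, 1, requires_grad=True)
    opt = torch.optim.SGD([w], lr=0.05)
    for _ in range(500):
        Xb = X
        if adversarial:
            # worst-case l2 perturbation of radius eps for a linear model with
            # squared loss: shift each input by eps * sign(residual) * w / ||w||
            with torch.no_grad():
                r = X @ w - y
                Xb = X + eps * torch.sign(r) * (w.T / (w.norm() + 1e-8))
        loss = ((Xb @ w - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach().squeeze()

print("standard    [w_core, w_spur]:", train(adversarial=False))
print("adversarial [w_core, w_spur]:", train(adversarial=True))
```

With scales like these, the adversarially trained weights tend to shift toward the larger-scale spurious coordinate, though the exact outcome depends on the chosen scales and budget.
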
Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases
We present a simple but effective method to measure and mitigate model biases
caused by reliance on spurious cues. Instead of requiring costly changes to
one's data or model training, our method better utilizes the data one already
has by sorting them. Specifically, we rank images within their classes based on
spuriosity (the degree to which common spurious cues are present), proxied via
deep neural features of an interpretable network. With spuriosity rankings, it
is easy to identify minority subpopulations (i.e. low spuriosity images) and
assess model bias as the gap in accuracy between high and low spuriosity
images. One can even efficiently remove a model's bias at little cost to
accuracy by finetuning its classification head on low spuriosity images,
resulting in fairer treatment of samples regardless of spuriosity. We
demonstrate our method on ImageNet, annotating 5000 class-feature
dependencies (630 of which we find to be spurious) and generating a dataset
of soft segmentations for these features along the way. Having computed
spuriosity rankings via the identified spurious neural features, we assess
biases for diverse models and find that class-wise biases are highly
correlated across models. Our results suggest that model bias due to spurious
feature reliance is influenced far more by what the model is trained on than
how it is trained.
Comment: Accepted to NeurIPS '23 (Spotlight). Camera ready version.
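A minimal sketch of the ranking-and-gap measurement, assuming per-image activations of an already-identified spurious neural feature are available for one class; the arrays below are random placeholders rather than ImageNet data, and the helper names are illustrative only.

```python
# Hypothetical sketch of the spuriosity-ranking idea (not the authors' released
# code): images in a class are ranked by the activation of a feature identified
# as spurious, and bias is the accuracy gap between high- and low-spuriosity images.
import numpy as np

def spuriosity_rank(spurious_acts: np.ndarray) -> np.ndarray:
    """Indices of a class's images, sorted from low to high spuriosity."""
    return np.argsort(spurious_acts)

def bias_gap(correct: np.ndarray, spurious_acts: np.ndarray, k: int = 25) -> float:
    """Accuracy on the k most-spurious images minus accuracy on the k least-spurious."""
    order = spuriosity_rank(spurious_acts)
    low, high = order[:k], order[-k:]
    return correct[high].mean() - correct[low].mean()

# toy usage with random placeholders for one class of 200 validation images
rng = np.random.default_rng(0)
acts = rng.normal(size=200)                                           # spurious-feature activation per image
correct = (rng.random(200) < 0.6 + 0.3 * (acts > 0)).astype(float)    # accuracy correlates with spuriosity
print(f"class-wise bias gap: {bias_gap(correct, acts):.2f}")
```
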
A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation
In recent years, concept-based approaches have emerged as some of the most
promising explainability methods to help us interpret the decisions of
Artificial Neural Networks (ANNs). These methods seek to discover intelligible
visual 'concepts' buried within the complex patterns of ANN activations in two
key steps: (1) concept extraction followed by (2) importance estimation. While
these two steps are shared across methods, they all differ in their specific
implementations. Here, we introduce a unifying theoretical framework that
comprehensively defines and clarifies these two steps. This framework offers
several advantages as it allows us: (i) to propose new evaluation metrics for
comparing different concept extraction approaches; (ii) to leverage modern
attribution methods and evaluation metrics to extend and systematically
evaluate state-of-the-art concept-based approaches and importance estimation
techniques; (iii) to derive theoretical guarantees regarding the optimality of
such methods. We further leverage our framework to try to tackle a crucial
question in explainability: how to efficiently identify clusters of data points
that are classified based on a similar shared strategy. To illustrate these
findings and to highlight the main strategies of a model, we introduce a visual
representation called the strategic cluster graph. Finally, we present
https://serre-lab.github.io/Lens, a dedicated website that offers a complete
compilation of these visualizations for all classes of the ImageNet dataset.
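To illustrate the two-step recipe the abstract describes, the sketch below uses one common instantiation of each step: concept extraction by non-negative matrix factorization of a layer's activations, and importance estimation by a simple linear sensitivity of the class score to each concept's coefficient. This is illustrative rather than the authors' framework, and the activation and score arrays are assumed placeholders.

```python
# Minimal two-step concept pipeline under assumed instantiations:
# (1) extraction via NMF, (2) importance via a linear fit of class scores.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
activations = rng.random((500, 512))           # non-negative activations for 500 image patches
class_scores = activations @ rng.random(512)   # stand-in for each patch's class logit

# (1) concept extraction: factor activations into patch coefficients U and a
#     concept dictionary W, so each concept is a direction in activation space
nmf = NMF(n_components=10, init="nndsvda", random_state=0, max_iter=500)
U = nmf.fit_transform(activations)   # (patches, concepts)
W = nmf.components_                  # (concepts, features)

# (2) importance estimation: one simple choice among many attribution methods,
#     a least-squares sensitivity of the class score to each concept coefficient
importance, _, _, _ = np.linalg.lstsq(U, class_scores, rcond=None)
print("concepts ranked by importance:", np.argsort(-np.abs(importance)))
```
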
Text-To-Concept (and Back) via Cross-Model Alignment
We observe that the mapping between an image's representation in one model to
its representation in another can be learned surprisingly well with just a
linear layer, even across diverse models. Building on this observation, we
propose text-to-concept, where features from a fixed pretrained
model are aligned linearly to the CLIP space, so that text embeddings from
CLIP's text encoder become directly comparable to the aligned features. With
text-to-concept, we convert fixed off-the-shelf vision encoders to surprisingly
strong zero-shot classifiers for free, with accuracy at times even surpassing
that of CLIP, despite these encoders being much smaller models trained on a
small fraction of the data compared to CLIP. We show other immediate use-cases of
text-to-concept, like building concept bottleneck models with no concept
supervision, diagnosing distribution shifts in terms of human concepts, and
retrieving images satisfying a set of text-based constraints. Lastly, we
demonstrate the feasibility of concept-to-text, where vectors in a
model's feature space are decoded by first aligning to the CLIP space before
being
fed to a GPT-based generative model. Our work suggests existing deep models,
with presumably diverse architectures and training, represent input samples
relatively similarly, and a two-way communication across model representation
spaces and to humans (through language) is viable.
Comment: Accepted to ICML 2023 and CVPR4XAI workshop 2023.
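As a rough illustration of the alignment step, the sketch below fits a single linear map from a vision encoder's feature space to CLIP's image-embedding space and uses it for zero-shot prediction against CLIP text embeddings. The tensors are random placeholders, and the least-squares fit is an assumption standing in for however the aligner is actually trained; dimensions are likewise assumed.

```python
# Hypothetical sketch of cross-model alignment: a linear map W from a fixed
# encoder's features to CLIP space, then zero-shot classification by cosine
# similarity with CLIP text embeddings. All tensors are random placeholders.
import torch

torch.manual_seed(0)
n, d_model, d_clip, n_classes = 5000, 2048, 512, 10
feats_model = torch.randn(n, d_model)   # features from the fixed vision encoder
feats_clip = torch.randn(n, d_clip)     # CLIP image embeddings of the same images
text_emb = torch.nn.functional.normalize(torch.randn(n_classes, d_clip), dim=-1)  # CLIP text embeddings of class prompts

# fit the linear aligner by least squares so that feats_model @ W ~ feats_clip
W = torch.linalg.lstsq(feats_model, feats_clip).solution   # (d_model, d_clip)

def zero_shot_predict(x_model: torch.Tensor) -> torch.Tensor:
    """Classify by cosine similarity between aligned features and text embeddings."""
    aligned = torch.nn.functional.normalize(x_model @ W, dim=-1)
    return (aligned @ text_emb.T).argmax(dim=-1)

print(zero_shot_predict(feats_model[:5]))
```
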