Shortcut Removal for Improved OOD-Generalization
Machine learning is a data-driven discipline, and learning success is largely
dependent on the quality of the underlying data sets. However, it is becoming
increasingly clear that even high performance on held-out test data does not
necessarily mean that a model generalizes or learns anything meaningful at all.
One reason for this is the presence of machine learning shortcuts, i.e., hints
in the data that are predictive but accidental and semantically unconnected to
the problem. We present a new approach to detect such shortcuts and a technique
to automatically remove them from datasets. Using an adversarially trained
lens, small and highly predictive clues in images can be detected and removed. We show that this approach 1) does not degrade model performance in the absence of these shortcuts, and 2) reliably identifies and neutralizes shortcuts in different image datasets. In our experiments, we recover up to 93.8% of model performance in the presence of different
shortcuts. Finally, we apply our model to a real-world dataset from the medical
domain consisting of chest x-rays and identify and remove several types of
shortcuts that are known to hinder real-world applicability. Thus, we hope that our proposed approach fosters the real-world applicability of machine learning.
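A rough idea of how such an adversarially trained lens could look is sketched below. The architecture, loss weights, and training scheme are illustrative assumptions rather than the paper's exact implementation: a small image-to-image "lens" applies a bounded edit that destroys the signal a shortcut classifier exploits while preserving task performance and proximity to the input.

```python
# Minimal sketch of an adversarially trained "lens" (hypothetical names;
# architecture and loss weights are assumptions, not the paper's setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Lens(nn.Module):
    """Small image-to-image network that applies a bounded residual edit."""
    def __init__(self, channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, x):
        # tanh bounds the edit so the semantic content stays intact
        return torch.clamp(x + 0.1 * torch.tanh(self.body(x)), 0.0, 1.0)

def lens_step(lens, shortcut_clf, task_clf, x, y, opt):
    """One lens update in the adversarial game. In full training, the
    shortcut classifier is retrained on lensed images in alternation."""
    opt.zero_grad()
    x_lensed = lens(x)
    adv = -F.cross_entropy(shortcut_clf(x_lensed), y)  # erase shortcut signal
    task = F.cross_entropy(task_clf(x_lensed), y)      # keep task performance
    rec = F.mse_loss(x_lensed, x)                      # stay close to input
    loss = adv + task + 10.0 * rec
    loss.backward()
    opt.step()
    return loss.item()
```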
Physical Adversarial Examples for Multi-Camera Systems
Neural networks form the foundation of many intelligent systems, which, however, are known to be easily fooled by adversarial examples. Recent advances have made these attacks possible even in air-gapped scenarios, where the autonomous system observes its surroundings through, e.g., a camera. We extend these ideas in
our research and evaluate the robustness of multi-camera setups against such
physical adversarial examples. This scenario becomes ever more important with
the rise in popularity of autonomous vehicles, which fuse the information of
several cameras for their driving decisions. While we find that multi-camera setups provide some robustness against past attack methods, this advantage diminishes when the attack is optimized over multiple perspectives at once. We propose
a novel attack method that we call Transcender-MC, where we incorporate online
3D renderings and perspective projections in the training process. Moreover, we
demonstrate that certain data augmentation techniques can further facilitate the generation of successful adversarial examples. Transcender-MC is 11% more effective at attacking multi-camera setups than state-of-the-art methods. Our findings offer valuable insights into the resilience of object detection in multi-camera setups and motivate the need for adequate defense mechanisms against such attacks.
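A highly simplified sketch of the underlying idea, optimizing one patch jointly over several camera views in the spirit of expectation over transformation, is given below. The differentiable affine warp is a crude stand-in for the paper's online 3D renderings and perspective projections, and the suppression loss on detector scores is an assumption.

```python
# Toy sketch of multi-view adversarial patch optimization (all names and
# the rendering model are illustrative stand-ins, not Transcender-MC).
import torch
import torch.nn.functional as F

def render_views(patch, n_views):
    """Crude differentiable stand-in for perspective projection of a 3D
    rendering: random rotation+scale warps of the patch plus a mask."""
    angle = (torch.rand(n_views) - 0.5) * 0.8
    scale = 0.7 + 0.6 * torch.rand(n_views)
    cos, sin = torch.cos(angle) * scale, torch.sin(angle) * scale
    theta = torch.zeros(n_views, 2, 3)
    theta[:, 0, 0], theta[:, 0, 1] = cos, -sin
    theta[:, 1, 0], theta[:, 1, 1] = sin, cos
    grid = F.affine_grid(theta, [n_views, 3, 224, 224], align_corners=False)
    warp = lambda t: F.grid_sample(t.expand(n_views, -1, -1, -1), grid,
                                   align_corners=False)
    return warp(patch), warp(torch.ones_like(patch))

def attack_step(patch, cam_scenes, detector, opt):
    """Average the loss over all cameras and sampled views so the patch
    transfers across perspectives instead of overfitting to one camera."""
    opt.zero_grad()
    total = 0.0
    for scene in cam_scenes:                  # one image batch per camera
        views, mask = render_views(patch, scene.shape[0])
        x = scene * (1 - mask) + views * mask         # paste patch into scene
        x = torch.clamp(x * (0.8 + 0.4 * torch.rand(1)), 0, 1)  # augmentation
        total = total + detector(x).mean()    # suppress objectness scores
    (total / len(cam_scenes)).backward()
    opt.step()
    with torch.no_grad():
        patch.clamp_(0, 1)                    # keep within printable range
    return float(total)

# usage: patch = torch.rand(1, 3, 224, 224, requires_grad=True)
#        opt = torch.optim.Adam([patch], lr=0.01)
```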
Shortcut Detection with Variational Autoencoders
For real-world applications of machine learning (ML), it is essential that
models make predictions based on well-generalizing features rather than
spurious correlations in the data. The identification of such spurious
correlations, also known as shortcuts, is a challenging problem and has so far
been scarcely addressed. In this work, we present a novel approach to detect
shortcuts in image and audio datasets by leveraging variational autoencoders
(VAEs). The disentanglement of features in the latent space of VAEs allows us
to discover feature-target correlations in datasets and semi-automatically
evaluate them for ML shortcuts. We demonstrate the applicability of our method
on several real-world datasets and identify shortcuts that have not been
discovered before.
Comment: Accepted at the ICML 2023 Workshop on Spurious Correlations, Invariance and Stability
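As a rough illustration of the detection idea, the sketch below encodes a labeled dataset with a trained (disentangling) VAE and ranks latent dimensions by their absolute correlation with the target. The encoder interface and the use of Pearson correlation are assumptions; highly ranked dimensions would then be inspected, e.g., via latent traversals, as shortcut candidates.

```python
# Minimal sketch of a latent-correlation scan with a trained VAE encoder.
# The encoder interface (returns mu, logvar) is an assumption.
import torch

@torch.no_grad()
def rank_latents(encoder, loader):
    """Return (dimension, |Pearson r|) pairs, sorted by correlation."""
    zs, ys = [], []
    for x, y in loader:
        mu, _logvar = encoder(x)
        zs.append(mu)
        ys.append(y.float())
    z = torch.cat(zs)                        # [N, latent_dim]
    y = torch.cat(ys)                        # [N]
    z = (z - z.mean(0)) / (z.std(0) + 1e-8)  # standardize per dimension
    y = (y - y.mean()) / (y.std() + 1e-8)
    corr = (z * y[:, None]).mean(0).abs()    # |Pearson r| per latent dim
    order = corr.argsort(descending=True)
    return [(int(i), float(corr[i])) for i in order]

# Dimensions with unexpectedly high correlation are shortcut candidates;
# decoding traversals along them reveals which image or audio feature
# they encode, which is then evaluated semi-automatically.
```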
Protecting Publicly Available Data With Machine Learning Shortcuts
Machine-learning (ML) shortcuts or spurious correlations are artifacts in
datasets that lead to very good training and test performance but severely
limit the model's generalization capability. Such shortcuts are insidious
because they go unnoticed due to good in-domain test performance. In this
paper, we explore the influence of different shortcuts and show that even
simple shortcuts are difficult to detect with explainable AI methods. We then
exploit this fact and design an approach to defend online databases against
crawlers: providers such as dating platforms, clothing manufacturers, or used
car dealers have to deal with a professionalized crawling industry that grabs
and resells data points on a large scale. We show that a deterrent can be
created by deliberately adding ML shortcuts. Such augmented datasets are then
unusable for ML use cases, which deters crawlers and the unauthorized use of
data from the internet. Using real-world data from three use cases, we show
that the proposed approach renders such collected data unusable, while the shortcut remains difficult for humans to notice. Thus, our proposed approach can serve as proactive protection against illegitimate data
crawling.
Comment: Published at BMVC 2023
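To make the core mechanism concrete, here is a toy sketch of planting a class-correlated shortcut before data is published. The fixed per-class noise patterns and the amplitude are illustrative assumptions, not the paper's exact scheme.

```python
# Toy sketch of deliberately planting an ML shortcut: each class gets a
# faint fixed pattern that is nearly invisible to humans but trivially
# predictive for a model. Pattern shape and amplitude are assumptions.
import torch

def make_patterns(num_classes, shape=(3, 224, 224), seed=0):
    g = torch.Generator().manual_seed(seed)
    return torch.randn(num_classes, *shape, generator=g).sign()

def plant_shortcut(images, labels, patterns, eps=2.0 / 255):
    """Add the label's fixed pattern at low amplitude to each image."""
    return torch.clamp(images + eps * patterns[labels], 0.0, 1.0)

# A model trained on crawled copies of such data tends to learn the
# planted pattern instead of genuine features, so it scores well
# in-domain but fails on clean data, deterring unauthorized ML use.
```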