Machine learning is a data-driven discipline, and learning success depends
largely on the quality of the underlying datasets. However, it is becoming
increasingly clear that even high performance on held-out test data does not
necessarily mean that a model generalizes or learns anything meaningful at all.
One reason for this is the presence of machine learning shortcuts, i.e., hints
in the data that are predictive but accidental and semantically unconnected to
the problem. We present a new approach to detect such shortcuts and a technique
to automatically remove them from datasets. An adversarially trained lens
detects and removes small, highly predictive clues in images. We show that
this approach 1) does not degrade model performance in the absence of such
shortcuts, and 2) reliably identifies and neutralizes shortcuts across
different image datasets. In our experiments, we recover up to 93.8% of model
performance in the presence of different
shortcuts. Finally, we apply our model to a real-world dataset from the medical
domain consisting of chest X-rays and identify and remove several types of
shortcuts that are known to hinder real-world applicability. Thus, we hope that
our proposed approach fosters the real-world applicability of machine learning.
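
To make the core idea concrete, below is a minimal sketch of such an
adversarial training game, assuming a PyTorch-style setup. The residual lens
architecture, the probe classifier, the L1 faithfulness term, and the weight
`lam` are all illustrative assumptions, not the implementation described in
the paper: the lens learns to erase whatever small cues the probe can still
exploit, while changing the image as little as possible.

```python
# Hypothetical sketch of an "adversarial lens": an image-to-image network
# trained to erase small, highly predictive cues while staying close to the
# input. All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class Lens(nn.Module):
    """Shallow convolutional network that lightly rewrites the input image."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x):
        # Residual edit: output stays close to the input by construction.
        return x + self.net(x)

lens = Lens()
probe = nn.Sequential(  # adversary: tries to predict labels from lens output
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
opt_lens = torch.optim.Adam(lens.parameters(), lr=1e-4)
opt_probe = torch.optim.Adam(probe.parameters(), lr=1e-4)
ce = nn.CrossEntropyLoss()

def train_step(x, y, lam=1.0):
    # 1) The probe learns to exploit whatever cues survive the lens.
    opt_probe.zero_grad()
    ce(probe(lens(x).detach()), y).backward()
    opt_probe.step()
    # 2) The lens erases those cues (maximizes the probe's loss) while an
    #    L1 penalty keeps its output faithful to the original image.
    opt_lens.zero_grad()
    x_hat = lens(x)
    loss = lam * (x_hat - x).abs().mean() - ce(probe(x_hat), y)
    loss.backward()
    opt_lens.step()
    return loss.item()
```

Under this reading, regions where the trained lens makes large edits,
i.e. where |lens(x) − x| is large, would mark candidate shortcut cues.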