Adversarial attacks are an important security concern for computer vision
(CV), as they enable malicious attackers to reliably manipulate CV models.
Existing attacks aim to elicit an output desired by the attacker while keeping
the model's behavior on clean data intact. With CV models becoming increasingly valuable
assets in applied practice, a new attack vector is emerging: disrupting the
models as a form of economic sabotage. This paper opens up the exploration of
damaging adversarial attacks (DAAs) that seek to damage the target model and
maximize the total cost incurred by the damage. As a pioneering DAA, this paper
proposes Trainwreck, a train-time attack that poisons the training data of
image classifiers to degrade their performance. Trainwreck conflates the data
of similar classes using stealthy (ϵ ≤ 8/255) class-pair universal
perturbations computed with a surrogate model. Trainwreck is a black-box,
transferable attack: it requires no knowledge of the target model's
architecture, and a single poisoned dataset degrades the performance of any
model trained on it.
The experimental evaluation on CIFAR-10 and CIFAR-100 demonstrates that
Trainwreck is indeed an effective attack across various model architectures,
including EfficientNetV2, ResNeXt-101, and a fine-tuned ViT-L-16.
The strength of the attack can be controlled by the poison rate parameter.
Finally, data redundancy with file hashing and/or pixel difference is
identified as a reliable defense against Trainwreck and similar DAAs.
The code is available at https://github.com/JanZahalka/trainwreck.