2 research outputs found
SoK: Pitfalls in Evaluating Black-Box Attacks
Numerous works study black-box attacks on image classifiers. However, these
works make different assumptions on the adversary's knowledge and current
literature lacks a cohesive organization centered around the threat model. To
systematize knowledge in this area, we propose a taxonomy over the threat space
spanning the axes of feedback granularity, the access of interactive queries,
and the quality and quantity of the auxiliary data available to the attacker.
Our new taxonomy provides three key insights. 1) Despite extensive literature,
numerous under-explored threat spaces exist, which cannot be trivially solved
by adapting techniques from well-explored settings. We demonstrate this by
establishing a new state-of-the-art in the less-studied setting of access to
top-k confidence scores by adapting techniques from well-explored settings of
accessing the complete confidence vector, but show how it still falls short of
the more restrictive setting that only obtains the prediction label,
highlighting the need for more research. 2) Identification the threat model of
different attacks uncovers stronger baselines that challenge prior
state-of-the-art claims. We demonstrate this by enhancing an initially weaker
baseline (under interactive query access) via surrogate models, effectively
overturning claims in the respective paper. 3) Our taxonomy reveals
interactions between attacker knowledge that connect well to related areas,
such as model inversion and extraction attacks. We discuss how advances in
other areas can enable potentially stronger black-box attacks. Finally, we
emphasize the need for a more realistic assessment of attack success by
factoring in local attack runtime. This approach reveals the potential for
certain attacks to achieve notably higher success rates and the need to
evaluate attacks in diverse and harder settings, highlighting the need for
better selection criteria