5 research outputs found
Black-Box Dissector: Towards Erasing-based Hard-Label Model Stealing Attack
Previous studies have verified that the functionality of black-box models can
be stolen with full probability outputs. However, under the more practical
hard-label setting, we observe that existing methods suffer from catastrophic
performance degradation. We argue this is due to the lack of rich information
in the probability prediction and the overfitting caused by hard labels. To
this end, we propose a novel hard-label model stealing method termed
\emph{black-box dissector}, which consists of two erasing-based modules. One is
a CAM-driven erasing strategy that is designed to increase the information
capacity hidden in hard labels from the victim model. The other is a
random-erasing-based self-knowledge distillation module that utilizes soft
labels from the substitute model to mitigate overfitting. Extensive experiments
on four widely-used datasets consistently demonstrate that our method
outperforms state-of-the-art methods, with an improvement of at most .
We also validate the effectiveness and practical potential of our method on
real-world APIs and defense methods. Furthermore, our method promotes other
downstream tasks, \emph{i.e.}, transfer adversarial attacks