Model inversion attacks (MIAs) aim to recover private data from a target
model's training set, posing a threat to the privacy of deep learning models.
MIAs have primarily focused on the white-box scenario, where the attacker has
full access to the structure and parameters of the target model.
However, practical applications are typically black-box: it is difficult for
adversaries to obtain model-related parameters, and many models output only
predicted labels. Existing black-box MIAs have focused primarily on designing
the optimization strategy, while the generative model is simply migrated from
the GANs used in white-box MIAs. To the best of our knowledge, our work is the
first study of feasible attack models in the label-only black-box scenario.
In this paper, we develop a novel MIA method that uses a conditional diffusion
model to recover precise samples of the target without any extra optimization,
as long as the target model outputs labels. Two primary techniques are
introduced to execute the attack. First, an auxiliary dataset relevant to the
target model's task is selected, and the labels predicted by the target model
on this dataset serve as conditions to guide the training process.
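As a minimal sketch of this training step (the conditional noise predictor `eps_model`, the black-box query wrapper `target_model`, and the linear noise schedule are illustrative assumptions, not the paper's exact setup), a DDPM-style loop conditioned on the target model's predicted labels might look like:

```python
import torch
import torch.nn.functional as F

T = 1000                                        # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, 0)  # cumulative product of alphas

def train_step(eps_model, target_model, x0, optimizer):
    """One DDPM training step on an auxiliary batch x0, conditioned on
    the labels the black-box target model predicts for that batch."""
    with torch.no_grad():
        y = target_model(x0)        # label-only query: predicted class indices

    t = torch.randint(0, T, (x0.size(0),), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod.to(x0.device)[t].view(-1, 1, 1, 1)
    # Forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

    # The conditional network learns to predict the injected noise
    # given the noisy image, the timestep, and the predicted label.
    loss = F.mse_loss(eps_model(x_t, t, y), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```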
Second, target labels and standard normally distributed random noise are input
into the trained conditional diffusion model, which generates target samples
under a pre-defined guidance strength. We then filter the outputs to retain
the most robust and representative samples.
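One plausible reading of the pre-defined guidance strength is classifier-free guidance; the sketch below samples under that assumption, reusing the schedule from the training sketch above (`y_null`, the unconditional label token, and the signature of `eps_model` are likewise hypothetical):

```python
import math
import torch

@torch.no_grad()
def sample(eps_model, y, y_null, w, shape):
    """Ancestral DDPM sampling toward target labels y with guidance
    strength w, reusing T / betas / alphas_cumprod from the training
    sketch; y_null is a hypothetical 'unconditional' label token."""
    x = torch.randn(shape, device=y.device)     # start from pure Gaussian noise
    for i in reversed(range(T)):
        t = torch.full((shape[0],), i, device=y.device, dtype=torch.long)
        # Classifier-free-guidance mix of conditional and unconditional
        # noise predictions; larger w pulls samples toward the label.
        eps = (1 + w) * eps_model(x, t, y) - w * eps_model(x, t, y_null)
        beta = betas[i].item()
        a_bar = alphas_cumprod[i].item()
        mean = (x - beta / math.sqrt(1.0 - a_bar) * eps) / math.sqrt(1.0 - beta)
        noise = torch.randn_like(x) if i > 0 else torch.zeros_like(x)
        x = mean + math.sqrt(beta) * noise
    return x
```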
Furthermore, we are the first to propose using Learned Perceptual Image Patch
Similarity (LPIPS) as one of the evaluation metrics for MIAs, together with
systematic quantitative and qualitative evaluation in terms of attack
accuracy, realism, and similarity. Experimental results show that our method
can generate data similar and accurate to the target without optimization, and
that it outperforms the generators of previous approaches in the label-only
scenario.
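For the LPIPS metric itself, the reference `lpips` package can score perceptual similarity between recovered and private images; the sketch below uses placeholder batches, and inputs are assumed to be RGB tensors scaled to [-1, 1]:

```python
import torch
import lpips  # pip install lpips

# Lower LPIPS means the two images are perceptually more similar.
loss_fn = lpips.LPIPS(net='alex')               # AlexNet backbone (default)

# Placeholder batches; real usage compares recovered samples against
# the private training images, as (N, 3, H, W) RGB tensors in [-1, 1].
recovered = torch.rand(4, 3, 64, 64) * 2 - 1
private = torch.rand(4, 3, 64, 64) * 2 - 1

with torch.no_grad():
    dist = loss_fn(recovered, private)          # shape (N, 1, 1, 1)
print(dist.flatten())                           # per-pair perceptual distance
```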