Split learning enables collaborative deep learning model training while
preserving data privacy and model security by avoiding direct sharing of raw
data and model details (i.e., sever and clients only hold partial sub-networks
and exchange intermediate computations). However, existing research has mainly
focused on examining its reliability for privacy protection, with little
investigation into model security. Specifically, by exploring full models,
attackers can launch adversarial attacks, and split learning can mitigate this
severe threat by only disclosing part of models to untrusted servers.This paper
aims to evaluate the robustness of split learning against adversarial attacks,
particularly in the most challenging setting where untrusted servers only have
access to the intermediate layers of the model.Existing adversarial attacks
mostly focus on the centralized setting instead of the collaborative setting,
thus, to better evaluate the robustness of split learning, we develop a
tailored attack called SPADV, which comprises two stages: 1) shadow model
training that addresses the issue of lacking part of the model and 2) local
adversarial attack that produces adversarial examples to evaluate.The first
stage only requires a few unlabeled non-IID data, and, in the second stage,
SPADV perturbs the intermediate output of natural samples to craft the
adversarial ones. The overall cost of the proposed attack process is relatively
low, yet the empirical attack effectiveness is significantly high,
demonstrating the surprising vulnerability of split learning to adversarial
attacks.Comment: accepted by ECAI 2023, camera-ready versio