Masked image modeling (MIM) has revolutionized self-supervised learning (SSL) for
image pre-training. In contrast to previously dominant self-supervised methods,
i.e., contrastive learning, MIM attains state-of-the-art performance by masking
and reconstructing random patches of the input image. However, the associated
security and privacy risks of this novel generative method are unexplored. In
this paper, we perform the first security risk quantification of MIM through
the lens of backdoor attacks. Unlike previous work, we are the first to
systematically perform threat modeling on SSL in every phase of the model supply
chain, i.e., the pre-training, release, and downstream phases. Our evaluation shows that
models built with MIM are vulnerable to existing backdoor attacks in the release
and downstream phases and are compromised by our proposed method in the
pre-training phase. For instance, on CIFAR10, the attack success rate can reach
99.62%, 96.48%, and 98.89% in the downstream phase, release phase, and
pre-training phase, respectively. We also take the first step to investigate
the success factors of backdoor attacks in the pre-training phase and find that
the number of triggers and the trigger pattern play key roles in the success of
backdoor attacks, while trigger location has only a minor effect. Finally, our empirical
study of defense mechanisms at three detection levels across the model supply
chain phases indicates that different defenses are suitable for backdoor
attacks in different phases. However, backdoor attacks in the release phase
cannot be detected by any of the three detection-level methods, calling for more
effective defenses in future research.
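
For readers unfamiliar with the metric, the attack success rate quoted above can be sketched as follows. This is a minimal illustration under a common convention, not necessarily the exact definition used in our evaluation: it assumes the rate is the fraction of trigger-stamped test inputs, excluding those whose true label already equals the attacker's target, that the model assigns to the target class. The names `attack_success_rate`, `predict`, and `target_label` are hypothetical.

```python
from typing import Any, Callable, Sequence


def attack_success_rate(
    predict: Callable[[Any], int],
    triggered_inputs: Sequence[Any],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered, non-target-class inputs classified as target_label.

    Assumption: inputs whose true label already equals the target are skipped,
    since a "correct" prediction on them says nothing about the backdoor.
    """
    hits = 0
    total = 0
    for x, y in zip(triggered_inputs, true_labels):
        if y == target_label:
            continue  # skip samples that already belong to the target class
        total += 1
        if predict(x) == target_label:
            hits += 1
    if total == 0:
        raise ValueError("no non-target-class samples to evaluate")
    return hits / total
```

Here `predict` stands in for any downstream classifier, and `triggered_inputs` for test images that already carry the trigger; a value near 1.0 means the backdoor fires on almost every triggered input.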