Masked Diffusion Models Are Fast and Privacy-Aware Learners
Diffusion models have emerged as the \emph{de facto} technique for image
generation, yet they entail significant computational overhead, hindering the
technique's broader application in the research community. We propose a
prior-based denoising training framework, the first to incorporate the
pre-train and fine-tune paradigm into the diffusion model training process,
which substantially improves training efficiency and shows potential for
facilitating various downstream tasks. Our approach centers on masking a high
facilitating various downstream tasks. Our approach centers on masking a high
proportion (e.g., up to 90\%) of the input image and employing masked denoising
score matching to denoise the visible areas, thereby guiding the diffusion
model to learn more salient features from training data as prior knowledge. By
utilizing masked learning in a pre-training stage, we efficiently train a
ViT-based diffusion model on CelebA-HQ in pixel space, achieving a 4$\times$
training speedup and higher generated-image quality than the denoising
diffusion probabilistic model (DDPM). Moreover, our
masked pre-training technique can be universally applied to various diffusion
models that generate images directly in pixel space, yielding pre-trained
models that generalize better. For instance, a diffusion model pre-trained on
VGGFace2 attains a 46\% quality improvement after fine-tuning on merely 10\%
of a dataset from a different distribution. Finally, our method shows
potential as a training paradigm for enhancing the privacy protection
capabilities of diffusion models. Our code is available at
\url{https://github.com/jiachenlei/maskdm}
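The masking step described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation (the linked repository holds that): images are split into non-overlapping square patches, a random 90\% of patches are dropped per image, only the visible patches are noised with a DDPM-style schedule, and the loss is the epsilon-prediction form of denoising score matching computed on those visible patches alone. The helper names (`patchify`, `sample_visible_indices`, `masked_dsm_step`), the patch size, and the `denoiser(x_t, t, idx)` interface are hypothetical.

```python
import numpy as np

# Hypothetical sketch of masked denoising score matching; NOT the MaskDM code.


def patchify(images, patch_size):
    """Split (B, H, W, C) images into (B, N, patch_size * patch_size * C) patches."""
    b, h, w, c = images.shape
    p = patch_size
    x = images.reshape(b, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (B, rows, cols, p, p, C)
    return x.reshape(b, (h // p) * (w // p), p * p * c)


def sample_visible_indices(batch, num_patches, mask_ratio, rng):
    """Pick, per image, the indices of patches that stay visible after masking."""
    num_visible = num_patches - int(round(num_patches * mask_ratio))
    # Independent random permutation per image; keep the first num_visible entries.
    perm = np.argsort(rng.random((batch, num_patches)), axis=1)
    return np.sort(perm[:, :num_visible], axis=1)


def masked_dsm_step(images, alpha_bar, denoiser, rng, patch_size=4, mask_ratio=0.9):
    """Return the epsilon-prediction loss computed on visible patches only."""
    patches = patchify(images, patch_size)
    b, n, _ = patches.shape
    idx = sample_visible_indices(b, n, mask_ratio, rng)
    x0 = np.take_along_axis(patches, idx[..., None], axis=1)  # clean visible patches
    t = rng.integers(0, len(alpha_bar), size=b)  # one timestep per image
    a = alpha_bar[t][:, None, None]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps  # DDPM forward noising
    eps_pred = denoiser(x_t, t, idx)  # assumed model interface
    return float(np.mean((eps_pred - eps) ** 2))
```

With 32×32 images and 4×4 patches there are 64 patches per image, so a 90\% ratio leaves 6 visible patches for the denoiser to process. That reduced token count is the plausible source of the training speedup, although the exact patching and loss weighting used in the paper may differ from this sketch.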