Despite the success of diffusion models, their training and inference remain notoriously expensive due to the long chain of the reverse process. In parallel, the Lottery Ticket Hypothesis (LTH) claims that there exist winning tickets (i.e., a properly pruned sub-network together with its original weight initialization) that, when trained in isolation, can achieve performance competitive with the original dense neural network. In this work, we apply LTH to diffusion models for the first time. We empirically find sub-networks at 90%-99% sparsity that do not compromise the performance of denoising diffusion probabilistic models on standard benchmarks (CIFAR-10, CIFAR-100, MNIST).
Moreover, existing LTH works identify sub-networks with a uniform sparsity across layers. We observe that the similarity between two winning tickets of a model varies from block to block: the upstream layers of two winning tickets tend to be more similar than the downstream layers.
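As an illustration of how such similarity can be quantified (an assumption for exposition, not necessarily the paper's exact metric), one can compare the binary pruning masks of two tickets layer by layer with an intersection-over-union score; the helper names below (`mask_iou`, `per_layer_similarity`) are hypothetical.

```python
import torch

def mask_iou(mask_a: torch.Tensor, mask_b: torch.Tensor) -> float:
    """Intersection-over-union of two binary pruning masks (True = weight kept)."""
    a, b = mask_a.bool(), mask_b.bool()
    intersection = (a & b).sum().item()
    union = (a | b).sum().item()
    return intersection / union if union > 0 else 1.0

def per_layer_similarity(ticket_a: dict, ticket_b: dict) -> dict:
    """Compare two winning tickets layer by layer; both are {layer_name: mask}."""
    return {name: mask_iou(ticket_a[name], ticket_b[name]) for name in ticket_a}
```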
Therefore, we propose to find winning tickets whose sparsity varies across the layers of the model. Experimental results demonstrate that our method finds sparser sub-models that require less memory for storage and fewer FLOPs. Code is available at
https://github.com/osier0524/Lottery-Ticket-to-DDPM
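To make the layer-wise idea concrete, here is a minimal sketch of magnitude pruning with a per-layer sparsity target, assuming a PyTorch model and a hypothetical `sparsity_per_layer` mapping; it is an illustrative sketch, not the implementation in the repository above.

```python
import torch
import torch.nn as nn

def magnitude_prune_layerwise(model: nn.Module, sparsity_per_layer: dict) -> dict:
    """Prune each weight tensor by magnitude at its own target sparsity.

    `sparsity_per_layer` maps parameter names to the fraction of weights to
    remove (e.g. {'down1.conv.weight': 0.95}); returns the binary masks that
    define the resulting ticket.
    """
    masks = {}
    for name, param in model.named_parameters():
        sparsity = sparsity_per_layer.get(name)
        if sparsity is None or param.dim() < 2:  # skip biases / norm parameters
            continue
        k = int(sparsity * param.numel())
        if k == 0:
            masks[name] = torch.ones_like(param, dtype=torch.bool)
            continue
        # the k-th smallest magnitude serves as this layer's pruning threshold
        threshold = param.abs().flatten().kthvalue(k).values
        mask = param.abs() > threshold
        param.data.mul_(mask.to(param.dtype))  # zero out pruned weights
        masks[name] = mask
    return masks
```

Following the usual LTH procedure, the surviving weights would then be rewound to their original initialization and the sparse sub-network retrained in isolation.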