Diffusion-based generative models have achieved remarkable success in image
generation. Their guidance formulation allows an external model to
plug-and-play control the generation process for various tasks without
fine-tuning the diffusion model. However, directly using publicly available
off-the-shelf models for guidance fails due to their poor performance on noisy
inputs. To address this, the existing practice is to fine-tune the guidance
models on labeled data corrupted with noise. In this paper, we argue that this
practice has two limitations: (1) handling inputs with widely varying noise
levels is too difficult for a single model; (2) collecting labeled datasets
hinders scaling to diverse tasks. To tackle these limitations, we
propose a novel strategy that leverages multiple experts where each expert is
specialized in a particular noise range and guides the reverse process at its
corresponding timesteps. However, as it is infeasible to manage multiple
networks and utilize labeled data, we present a practical guidance framework
termed Practical Plug-And-Play (PPAP), which leverages parameter-efficient
fine-tuning and data-free knowledge transfer. We conduct extensive ImageNet
class-conditional generation experiments to show that our method can
successfully guide diffusion with a small number of trainable parameters and no
labeled data. Finally, we show that image classifiers, depth estimators, and
semantic segmentation models can guide publicly available GLIDE through our
framework in a plug-and-play manner.
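The multi-expert idea can be sketched as follows: timesteps are partitioned into contiguous noise ranges, and at each reverse step the expert covering the current timestep supplies the guidance signal. This is a minimal toy sketch; the equal-width binning, the `select_expert` helper, and the simplified update rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def select_expert(t, num_timesteps, experts):
    """Pick the expert whose noise range covers timestep t.

    Timesteps are split into equal contiguous bins (an assumed
    partitioning); higher t corresponds to noisier inputs.
    """
    bin_size = num_timesteps // len(experts)
    idx = min(t // bin_size, len(experts) - 1)
    return experts[idx]

def guided_reverse_step(x_t, t, num_timesteps, experts, scale=1.0):
    """One illustrative guided denoising step: a toy denoising update
    plus a guidance term from the timestep-appropriate expert.
    (Not real DDPM math; the expert stands in for a gradient signal.)
    """
    expert = select_expert(t, num_timesteps, experts)
    guidance = expert(x_t)
    return 0.99 * x_t + scale * guidance

# Toy "experts": each nudges samples toward a different target mean,
# standing in for classifiers fine-tuned on different noise ranges.
experts = [lambda x, m=m: (m - x) * 0.1 for m in (0.0, 0.5, 1.0)]

x = np.ones(4)
T = 30
for t in reversed(range(T)):
    x = guided_reverse_step(x, t, T, experts)
```

The key design point is that each expert only ever sees inputs from its own noise range, so no single model has to cope with the full spectrum of corruption levels.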