Class-conditional generative models are crucial tools for generating data from
user-specified class labels. Existing approaches to class-conditional
generation require nontrivial modifications of backbone generative
architectures to incorporate the conditioning information fed into the model. This paper
introduces a plug-and-play module named `multimodal controller' that generates
multimodal data without introducing additional learnable parameters. In the
absence of the controllers, our models reduce to unconditional generative
models. We evaluate the efficacy of multimodal controllers on the CIFAR10, COIL100, and
Omniglot benchmark datasets. We demonstrate that multimodal-controlled
generative models (including VAE, PixelCNN, Glow, and GAN) generate
class-conditional images of significantly better quality than their
conditional counterparts. Moreover, we show that multimodal-controlled
models can also create novel modalities of images.
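
To make the parameter-free claim concrete, below is a minimal PyTorch sketch of one plausible reading of the controller: a fixed, randomly sampled binary mask per class that gates a layer's hidden channels. The module name `MultimodalController`, the `keep_rate` parameter, and the masking scheme are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class MultimodalController(nn.Module):
    """Sketch of a plug-and-play controller: one fixed binary channel
    mask per class, stored as a non-learnable buffer, so the module
    adds no learnable parameters to the backbone (assumption)."""

    def __init__(self, num_channels, num_classes, keep_rate=0.5):
        super().__init__()
        # Fixed (non-learnable) binary codes, one row per class.
        codes = (torch.rand(num_classes, num_channels) < keep_rate).float()
        self.register_buffer("codes", codes)

    def forward(self, x, labels):
        # x: (N, C, H, W) activations; labels: (N,) class indices.
        mask = self.codes[labels]                  # (N, C)
        return x * mask.view(x.size(0), -1, 1, 1)  # gate channels per sample

# Usage: insert after a hidden layer of any backbone generator;
# removing the controller recovers the unconditional model.
mc = MultimodalController(num_channels=64, num_classes=10)
x = torch.randn(8, 64, 16, 16)
labels = torch.randint(0, 10, (8,))
y = mc(x, labels)  # class-conditioned activations, no extra learnable params
```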