MEGAN: Mixture of Experts of Generative Adversarial Networks for Multimodal Image Generation
Recently, generative adversarial networks (GANs) have shown promising
performance in generating realistic images. However, they often struggle in
learning complex underlying modalities in a given dataset, resulting in
poor-quality generated images. To mitigate this problem, we present a novel
approach called mixture of experts GAN (MEGAN), an ensemble approach of
multiple generator networks. Each generator network in MEGAN specializes in
generating images with a particular subset of modalities, e.g., an image class.
Instead of incorporating a separate step of handcrafted clustering of multiple
modalities, our proposed model is trained through an end-to-end learning of
multiple generators via a gating network, which is responsible for choosing the
appropriate generator network for a given condition. We adopt the categorical
reparameterization trick so that the categorical decision of selecting a
generator can be made while maintaining the flow of gradients. We demonstrate that
individual generators learn different and salient subparts of the data and
achieve a multiscale structural similarity (MS-SSIM) score of 0.2470 for CelebA
and a competitive unsupervised inception score of 8.33 in CIFAR-10.
Comment: 27th International Joint Conference on Artificial Intelligence (IJCAI 2018)
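A minimal sketch of the core idea, assuming a PyTorch-style implementation (this is not the authors' released code): a gating network produces logits over K generator networks, and the straight-through Gumbel-Softmax (the categorical reparameterization trick) turns those logits into a one-hot expert choice that still lets gradients flow. The number of experts, layer sizes, and image dimensions below are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfGenerators(nn.Module):
    def __init__(self, z_dim=100, img_dim=3 * 32 * 32, num_experts=4):
        super().__init__()
        # K small generator networks, each expected to specialize in a
        # subset of modalities (e.g., an image class).
        self.generators = nn.ModuleList([
            nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU(),
                          nn.Linear(512, img_dim), nn.Tanh())
            for _ in range(num_experts)
        ])
        # Gating network: maps the latent code to logits over the experts.
        self.gate = nn.Linear(z_dim, num_experts)

    def forward(self, z, tau=1.0):
        logits = self.gate(z)                                # (B, K) expert scores
        # Straight-through Gumbel-Softmax: one-hot selection in the forward
        # pass, differentiable relaxation in the backward pass.
        sel = F.gumbel_softmax(logits, tau=tau, hard=True)   # (B, K)
        outs = torch.stack([g(z) for g in self.generators], dim=1)  # (B, K, D)
        return (sel.unsqueeze(-1) * outs).sum(dim=1)         # (B, D) from the chosen expert

z = torch.randn(8, 100)
fake = MixtureOfGenerators()(z)   # (8, 3072): one generator selected per sample

In this sketch the discrete choice of generator is made per latent code, while the soft backward pass lets the gating network and all generators receive gradients from the usual adversarial loss.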
Visual Prompting: Modifying Pixel Space to Adapt Pre-trained Models
Prompting has recently become a popular paradigm for adapting language models
to downstream tasks. Rather than fine-tuning model parameters or adding
task-specific heads, this approach steers a model to perform a new task simply
by adding a text prompt to the model's inputs. In this paper, we explore the
question: can we create prompts with pixels instead? In other words, can
pre-trained vision models be adapted to a new task solely by adding pixels to
their inputs? We introduce visual prompting, which learns a task-specific image
perturbation such that a frozen pre-trained model prompted with this
perturbation performs a new task. We discover that changing only a few pixels
is enough to adapt models to new tasks and datasets, and performs on par with
linear probing, the current de facto approach to lightweight adaptation. The
surprising effectiveness of visual prompting provides a new perspective on how
to adapt pre-trained models in vision, and opens up the possibility of adapting
models solely through their inputs, which, unlike model parameters or outputs,
are typically under an end-user's control. Code is available at
http://hjbahng.github.io/visual_prompting
Comment: 17 pages, 10 figures
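A minimal sketch of the idea, assuming PyTorch and a torchvision ResNet-18 as a stand-in backbone (this is not the released code at the URL above): a single learnable pixel perturbation is added to every input image, and only that perturbation is optimized while the pre-trained model stays frozen. The backbone, input resolution, and label space are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.eval()
for p in model.parameters():
    p.requires_grad_(False)            # the pre-trained model stays frozen

prompt = nn.Parameter(torch.zeros(1, 3, 224, 224))   # learnable "visual prompt"
optimizer = torch.optim.Adam([prompt], lr=0.1)

def prompted_forward(images):
    # Adaptation happens purely in pixel space: the prompt is added to the
    # inputs and the frozen network is run unchanged.
    return model(images + prompt)

# One hypothetical training step on a batch from the downstream task.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 1000, (4,))
optimizer.zero_grad()
loss = F.cross_entropy(prompted_forward(images), labels)
loss.backward()                        # gradients flow only into the prompt
optimizer.step()

Because the prompt lives entirely in the input, this kind of adaptation can be applied by an end-user without touching model parameters or outputs, which is the point the abstract emphasizes.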