1 research outputs found
Unsupervised multi-modal Styled Content Generation
The emergence of deep generative models has recently enabled the automatic
generation of massive amounts of graphical content, both in 2D and in 3D.
Generative Adversarial Networks (GANs) and style control mechanisms, such as
Adaptive Instance Normalization (AdaIN), have proved particularly effective in
this context, culminating in the state-of-the-art StyleGAN architecture. While
such models are able to learn diverse distributions, provided a sufficiently
large training set, they are not well-suited for scenarios where the
distribution of the training data exhibits a multi-modal behavior. In such
cases, reshaping a uniform or normal distribution over the latent space into a
complex multi-modal distribution in the data domain is challenging, and the
generator might fail to sample the target distribution well. Furthermore,
existing unsupervised generative models are not able to control the mode of the
generated samples independently of the other visual attributes, despite the
fact that they are typically disentangled in the training data.
In this paper, we introduce UMMGAN, a novel architecture designed to better
model multi-modal distributions, in an unsupervised fashion. Building upon the
StyleGAN architecture, our network learns multiple modes, in a completely
unsupervised manner, and combines them using a set of learned weights. We
demonstrate that this approach is capable of effectively approximating a
complex distribution as a superposition of multiple simple ones. We further
show that UMMGAN effectively disentangles between modes and style, thereby
providing an independent degree of control over the generated content