Mixture models in variational inference (VI) are an active area of research.
Recent works have established their connection to multiple importance sampling
(MIS) through the MISELBO and advanced the use of ensemble approximations for
large-scale problems. However, as we show here, learning the ensemble components independently can lead to suboptimal diversity. Hence, we study the effect of using MISELBO as the objective function for learning mixtures instead, and we propose the first-ever mixture of variational approximations for a
normalizing flow-based hierarchical variational autoencoder (VAE) with
VampPrior and a PixelCNN decoder network. Two major insights led to the
construction of this novel composite model. First, mixture models have the potential to be off-the-shelf tools for practitioners to obtain more flexible
posterior approximations in VAEs. Therefore, we make them more accessible by
demonstrating how to apply them to four popular architectures. Second, when MISELBO is the objective function, the mixture components cooperate to cover the target distribution while maximizing their diversity. We
explain this cooperative behavior by drawing a novel connection between VI and
adaptive importance sampling. Finally, we demonstrate the superiority of the
Mixture VAEs' learned feature representations on both image and single-cell
transcriptome data, and obtain state-of-the-art results among VAE architectures
in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
Code is available at \url{https://github.com/Lagergren-Lab/MixtureVAEs}.
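As a point of reference for the objective discussed above, the following is a minimal sketch of the MISELBO for a mixture of $S$ variational components, written in the standard multiple-importance-sampling form; the symbols ($\mathcal{L}_{\mathrm{MIS}}$, $q_s$, $p_\theta$) are illustrative and may differ from the notation used in the paper:
\[
\mathcal{L}_{\mathrm{MIS}}(x) \;=\; \frac{1}{S} \sum_{s=1}^{S} \mathbb{E}_{q_s(z \mid x)}\!\left[ \log \frac{p_\theta(x, z)}{\tfrac{1}{S} \sum_{j=1}^{S} q_j(z \mid x)} \right].
\]
Because the denominator is the full mixture density, the components are coupled: a component gains little from placing mass where the other components already do, which gives the cooperative, diversity-promoting behavior described above.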