A new variational autoencoder (VAE) model is proposed that learns a succinct
common representation of two correlated data variables for conditional and
joint generation tasks. The proposed Wyner VAE model is based on two
information-theoretic problems---distributed simulation and channel
synthesis---in which Wyner's common information arises as the fundamental limit
of the succinctness of the common representation. The Wyner VAE decomposes a
pair of correlated data variables into their common representation (e.g., a
shared concept) and local representations that capture the remaining randomness
(e.g., texture and style) in the respective data variables. It does so by using
the mutual information between the data variables and the common representation
as a regularization term. The utility of the proposed approach is demonstrated
through experiments for joint and conditional generation with and without style
control using synthetic data and real images. Experimental results show that
learning a succinct common representation achieves better generative
performance and that the proposed model outperforms existing VAE variants and
the variational information bottleneck method.

Comment: 24 pages, 18 figures