Channel-Recurrent Autoencoding for Image Modeling
Despite recent successes in synthesizing faces and bedrooms, existing
generative models struggle to capture more complex image types, potentially due
to the oversimplification of their latent space constructions. To tackle this
issue, building on Variational Autoencoders (VAEs), we integrate recurrent
connections across channels into both the inference and generation steps,
allowing high-level features to be captured in a global-to-local,
coarse-to-fine manner. Combined with an adversarial loss, our
channel-recurrent VAE-GAN
(crVAE-GAN) outperforms VAE-GAN in generating a diverse spectrum of
high-resolution images while maintaining the same level of computational
efficiency.
Our model produces interpretable and expressive latent representations to
benefit downstream tasks such as image completion. Moreover, we propose two
novel regularizations, a KL-objective weighting scheme over time steps and
mutual-information maximization between transformed latent variables and the
outputs, to enhance training.
Comment: Code: https://github.com/WendyShang/crVAE. Supplementary Materials:
http://www-personal.umich.edu/~shangw/wacv18_supplementary_material.pd
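The channel-recurrent idea described above can be illustrated with a minimal sketch: split a convolutional feature map into blocks along the channel dimension and run a recurrent update across the blocks, so later (finer) blocks are conditioned on earlier (coarser) ones. This is a simplified illustration, not the paper's implementation; the function name, the plain tanh RNN, and all shapes are assumptions.

```python
import numpy as np

def channel_recurrent_encode(feat, T, W_h, W_x, b):
    """Sketch of channel-recurrent encoding (hypothetical names/shapes):
    split a C x H x W feature map into T channel blocks and run a simple
    tanh RNN across the blocks, so each block's summary is conditioned
    on the preceding (coarser) blocks."""
    C, H, W = feat.shape
    assert C % T == 0, "channel count must split evenly into T blocks"
    blocks = feat.reshape(T, C // T, H, W)      # T chunks along channels
    h = np.zeros(b.shape)                       # initial hidden state
    states = []
    for t in range(T):
        x_t = blocks[t].reshape(-1)             # flatten block t
        h = np.tanh(W_h @ h + W_x @ x_t + b)    # recurrent update
        states.append(h)
    return np.stack(states)                     # T hidden states, one per block
```

In the actual model these per-block states would parameterize the latent distribution; an LSTM would typically replace the plain RNN used here for brevity.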
C4Synth: Cross-Caption Cycle-Consistent Text-to-Image Synthesis
Generating an image from its description is a challenging task worth solving
because of its numerous practical applications ranging from image editing to
virtual reality. Existing methods use a single caption to generate a
plausible image; a single caption by itself can be limited and may fail to
capture the variety of concepts and behaviors present in the image. We
propose two deep generative models that generate an image by making
use of multiple captions describing it. This is achieved by ensuring
'Cross-Caption Cycle Consistency' between the multiple captions and the
generated image(s). We report quantitative and qualitative results on the
standard Caltech-UCSD Birds (CUB) and Oxford-102 Flowers datasets to validate
the efficacy of the proposed approach.
Comment: To appear in the proceedings of the IEEE Winter Conference on
Applications of Computer Vision, WACV-201