Informed MCMC with Bayesian Neural Networks for Facial Image Analysis
Computer vision tasks are difficult because of the large variability in the
data that is induced by changes in light, background, partial occlusion as well
as the varying pose, texture, and shape of objects. Generative approaches to
computer vision allow us to overcome this difficulty by explicitly modeling the
physical image formation process. Using generative object models, the analysis
of an observed image is performed via Bayesian inference of the posterior
distribution. This conceptually simple approach tends to fail in practice
because of several difficulties in sampling the posterior: its high
dimensionality, its multi-modality, and the expensive simulation of the
rendering process. The main
difficulty of sampling approaches in a computer vision context is choosing the
proposal distribution accurately so that maxima of the posterior are explored
early and the algorithm quickly converges to a valid image interpretation. In
this work, we propose to use a Bayesian Neural Network for estimating an image
dependent proposal distribution. Compared to a standard Gaussian random walk
proposal, this accelerates the sampler in finding regions of the posterior with
high value. In this way, we can significantly reduce the number of samples
needed to perform facial image analysis.Comment: Accepted to the Bayesian Deep Learning Workshop at NeurIPS 201
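As a rough illustration of the mechanics, here is a minimal Metropolis-Hastings loop in Python/NumPy with an image-dependent independence proposal. The proposal parameters `mu` and `sigma` stand in for what the Bayesian Neural Network would predict from the observed image; this interface is an assumption for illustration, not the authors' exact sampler.

```python
import numpy as np
from scipy.stats import norm

def informed_mh(log_posterior, mu, sigma, theta0, n_steps, seed=0):
    """Metropolis-Hastings with an image-dependent independence proposal.

    mu, sigma : per-parameter mean/std of the proposal; in the paper's
    setting these would come from a network applied to the observed
    image (hypothetical interface, for illustration only).
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)

    def log_q(x):
        # Log density of the Gaussian proposal at x.
        return norm.logpdf(x, loc=mu, scale=sigma).sum()

    logp, logq = log_posterior(theta), log_q(theta)
    samples = []
    for _ in range(n_steps):
        proposal = mu + sigma * rng.standard_normal(theta.shape)
        logp_new, logq_new = log_posterior(proposal), log_q(proposal)
        # Hastings ratio: target ratio corrected by the proposal ratio.
        log_accept = (logp_new - logp) + (logq - logq_new)
        if np.log(rng.random()) < log_accept:
            theta, logp, logq = proposal, logp_new, logq_new
        samples.append(theta.copy())
    return np.array(samples)
```

With a well-informed proposal, high-density regions are proposed immediately rather than discovered by random-walk diffusion, which is where the reduction in sample count comes from.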
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
Graphic layout designs play an essential role in visual communication. Yet
handcrafting layout designs is skill-demanding, time-consuming, and
non-scalable to batch production. Generative models have emerged to make
design automation scalable, but it remains non-trivial to produce designs that
comply with designers' multimodal desires, i.e., designs constrained by
background images and driven by foreground content. We propose LayoutDETR,
which inherits the high
quality and realism from generative modeling, while reformulating content-aware
requirements as a detection problem: we learn to detect in a background image
the reasonable locations, scales, and spatial relations for multimodal
foreground elements in a layout. Our solution sets a new state-of-the-art
performance for layout generation on public benchmarks and on our newly-curated
ad banner dataset. We integrate our solution into a graphical system that
facilitates user studies, and show that users prefer our designs over baselines
by significant margins. Our code, models, dataset, graphical system, and demos
are available at https://github.com/salesforce/LayoutDETR
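To make the detection reformulation concrete, the sketch below (PyTorch, with illustrative names and sizes that are assumptions, not the released code) treats each foreground element as a learned query that cross-attends to background-image features and regresses a normalized box. The actual model additionally conditions on multimodal foreground content such as text.

```python
import torch
import torch.nn as nn

class LayoutDecoderSketch(nn.Module):
    """DETR-style sketch: one learned query per foreground element
    cross-attends to background-image features and regresses a
    normalized layout box. Illustrative only."""

    def __init__(self, d_model=256, n_elements=8, n_heads=8, n_layers=4):
        super().__init__()
        self.queries = nn.Embedding(n_elements, d_model)  # element "slots"
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.box_head = nn.Linear(d_model, 4)  # (cx, cy, w, h) in [0, 1]

    def forward(self, bg_features):
        # bg_features: (batch, num_patches, d_model) encoding of the
        # background image, e.g. from a CNN/ViT backbone.
        B = bg_features.size(0)
        q = self.queries.weight.unsqueeze(0).expand(B, -1, -1)
        h = self.decoder(q, bg_features)       # cross-attend to background
        return self.box_head(h).sigmoid()      # one box per element
```

Framing the task this way lets the model ask "where in this background would an element reasonably go?", which is exactly the detection question, rather than generating a layout unconditionally.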
Learning Generative Models with Visual Attention
Attention has long been proposed by psychologists as important for
effectively dealing with the enormous sensory stimulus available in the
neocortex. Inspired by the visual attention models in computational
neuroscience and the need of object-centric data for generative models, we
describe a generative learning framework that uses attentional mechanisms.
Attentional mechanisms can propagate signals from a region of interest in a
scene
to an aligned canonical representation, where generative modeling takes place.
By ignoring background clutter, generative models can concentrate their
resources on the object of interest. Our model is a proper graphical model in
which the 2D similarity transformation is part of the top-down process. A
ConvNet is employed to provide good initializations during posterior inference,
which is based on Hamiltonian Monte Carlo. Upon learning images of faces, our
model can robustly attend to face regions of novel test subjects. More
importantly, our model can learn generative models of new faces from a novel
dataset of large images where the face locations are not known.
Comment: In the proceedings of Neural Information Processing Systems, 2014
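The attention step, warping a region of interest into an aligned canonical frame via a 2D similarity transform, can be sketched with modern PyTorch grid-sampling ops. This is an illustration in the spirit of the paper; the function and its interface are assumptions.

```python
import torch
import torch.nn.functional as F

def attend_canonical(image, scale, angle, tx, ty, out_size=64):
    """Extract an aligned canonical window via a 2D similarity transform.

    image: (B, C, H, W); scale/angle/tx/ty: (B,) transform parameters,
    which in the paper would be initialized by a ConvNet and refined
    by Hamiltonian Monte Carlo.
    """
    cos, sin = torch.cos(angle), torch.sin(angle)
    # 2x3 affine matrix mapping canonical coordinates to image coordinates.
    theta = torch.stack([
        torch.stack([scale * cos, -scale * sin, tx], dim=-1),
        torch.stack([scale * sin,  scale * cos, ty], dim=-1),
    ], dim=1)                                        # (B, 2, 3)
    grid = F.affine_grid(theta, (image.size(0), image.size(1),
                                 out_size, out_size), align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)
```

Because the generative model only ever sees the canonical window, background clutter is excluded by construction, which is what lets it spend its capacity on the object itself.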
Channel-Recurrent Autoencoding for Image Modeling
Despite recent successes in synthesizing faces and bedrooms, existing
generative models struggle to capture more complex image types, potentially due
to the oversimplification of their latent space constructions. To tackle this
issue, building on Variational Autoencoders (VAEs), we integrate recurrent
connections across channels to both inference and generation steps, allowing
the high-level features to be captured in a global-to-local, coarse-to-fine
manner. Combined with an adversarial loss, our channel-recurrent VAE-GAN
(crVAE-GAN) outperforms VAE-GAN in generating a diverse spectrum of
high-resolution images while maintaining the same level of computational
efficiency.
Our model produces interpretable and expressive latent representations to
benefit downstream tasks such as image completion. Moreover, we propose two
novel regularizations, namely the KL objective weighting scheme over time steps
and mutual information maximization between transformed latent variables and
the outputs, to enhance the training.
Comment: Code: https://github.com/WendyShang/crVAE. Supplementary Materials:
http://www-personal.umich.edu/~shangw/wacv18_supplementary_material.pdf
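A minimal sketch of the channel-recurrent construction (PyTorch; the layer sizes and the pooled-feature simplification are assumptions, with the released code linked above as the reference): the encoder features are split into channel groups that are fed to an LSTM as a sequence, so each latent block is conditioned on the coarser blocks before it.

```python
import torch
import torch.nn as nn

class ChannelRecurrentLatent(nn.Module):
    """Sketch of the channel-recurrent idea: split encoder features into
    channel groups and run an LSTM across the groups, so later latent
    blocks are conditioned on earlier (coarser) ones."""

    def __init__(self, channels=256, groups=8, z_per_group=32):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.step = channels // groups
        self.rnn = nn.LSTM(input_size=self.step, hidden_size=128,
                           batch_first=True)
        self.to_mu = nn.Linear(128, z_per_group)
        self.to_logvar = nn.Linear(128, z_per_group)

    def forward(self, feats):
        # feats: (B, channels) pooled encoder features (a simplification;
        # the paper operates on convolutional feature maps).
        B = feats.size(0)
        seq = feats.view(B, self.groups, self.step)  # channel groups as time
        h, _ = self.rnn(seq)                         # (B, groups, 128)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return z.flatten(1), mu, logvar
```

The recurrence across channel groups is what gives the latent space its global-to-local ordering: early groups settle coarse structure, later groups fill in detail conditioned on it.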