Stabilizing Training of Generative Adversarial Networks through Regularization
Deep generative models based on Generative Adversarial Networks (GANs) have
demonstrated impressive sample quality, but they require a careful choice of
architecture, parameter initialization, and hyper-parameters. This fragility
is in part due to a dimensional mismatch or
non-overlapping support between the model distribution and the data
distribution, causing their density ratio and the associated f-divergence to be
undefined. We overcome this fundamental limitation and propose a new
regularization approach with low computational cost that yields a stable GAN
training procedure. We demonstrate the effectiveness of this regularizer across
several architectures trained on common benchmark image generation tasks. Our
regularization turns GAN models into reliable building blocks for deep
learning.
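The abstract does not spell out the regularizer's exact form; a common instantiation of this idea penalizes the norm of the discriminator's gradient with respect to its input, which keeps the implied density ratio well-behaved when the two supports do not overlap. Below is a minimal PyTorch sketch of such a gradient penalty; the function name and the weight gamma are illustrative assumptions, not the paper's formulation.

    import torch

    def gradient_penalty(discriminator, x, gamma=2.0):
        # Hypothetical sketch: penalize the squared L2 norm of the
        # discriminator's input gradient; gamma is an assumed weight.
        x = x.detach().requires_grad_(True)
        d_out = discriminator(x)
        grad, = torch.autograd.grad(d_out.sum(), x, create_graph=True)
        return (gamma / 2.0) * grad.pow(2).flatten(1).sum(dim=1).mean()

In use, the penalty is simply added to the usual discriminator loss, e.g. d_loss = bce_real + bce_fake + gradient_penalty(D, x_real); the cost is one extra backward pass through the discriminator, consistent with the abstract's claim of low computational cost.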
Bayesian Compression for Deep Learning
Compression and computational efficiency in deep learning have become
problems of great significance. In this work, we argue that the most principled
and effective way to attack this problem is by adopting a Bayesian point of
view, where through sparsity inducing priors we prune large parts of the
network. We introduce two novelties in this paper: 1) we use hierarchical
priors to prune nodes instead of individual weights, and 2) we use the
posterior uncertainties to determine the optimal fixed-point precision to
encode the weights. Both factors significantly contribute to achieving the
state of the art in terms of compression rates, while still staying competitive
with methods designed to optimize for speed or energy efficiency.

Comment: Published as a conference paper at NIPS 2017
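As a hedged illustration of the second novelty: if a weight's posterior standard deviation is sigma, quantization error smaller than sigma is indistinguishable from posterior noise, so more uncertain weights can be stored at lower precision. The NumPy sketch below is a hypothetical heuristic capturing both ideas (node-level pruning, uncertainty-driven bit widths); the SNR threshold and bit-width formula are assumptions, not the paper's derivation.

    import numpy as np

    def prune_and_quantize(mu, sigma, snr_threshold=1.0):
        # mu, sigma: posterior means / std devs, shape (nodes, weights_per_node).
        # 1) Prune whole nodes (rows) with low mean signal-to-noise ratio,
        #    mimicking node-level sparsity from hierarchical priors.
        snr = np.abs(mu) / sigma
        keep = snr.mean(axis=1) > snr_threshold
        mu, sigma = mu[keep], sigma[keep]
        # 2) Pick a fixed-point bit width that resolves the weight range
        #    down to the smallest posterior std dev; finer resolution
        #    would only encode noise. Assumed heuristic.
        bits = int(np.ceil(np.log2(np.ptp(mu) / sigma.min())))
        return mu, bits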
Function-space regularized R\'enyi divergences
We propose a new family of regularized R\'enyi divergences parametrized not
only by the order $\alpha$ but also by a variational function space. These new
objects are defined by taking the infimal convolution of the standard R\'enyi
divergence with the integral probability metric (IPM) associated with the
chosen function space. We derive a novel dual variational representation that
can be used to construct numerically tractable divergence estimators. This
representation avoids risk-sensitive terms and therefore exhibits lower
variance, making it well-behaved when $\alpha > 1$; this addresses a notable
weakness of prior approaches. We prove several properties of these new
divergences, showing that they interpolate between the classical R\'enyi
divergences and IPMs. We also study the $\alpha \to \infty$ limit, which leads to
a regularized worst-case regret and a new variational representation in the
classical case. Moreover, we show that the proposed regularized R\'enyi
divergences inherit features from IPMs such as the ability to compare
distributions that are not absolutely continuous, e.g., empirical measures and
distributions with low-dimensional support. We present numerical results on
both synthetic and real datasets, showing the utility of these new divergences
in both estimation and GAN training applications; in particular, we demonstrate
significantly reduced variance and improved training performance.

Comment: 24 pages, 4 figures
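In symbols, writing $R_\alpha$ for the classical R\'enyi divergence and $W^\Gamma$ for the IPM induced by the function space $\Gamma$, the infimal-convolution construction described above takes the following form (the placement of the arguments is our reading of the abstract and should be checked against the paper):

\[
  R_\alpha^{\Gamma}(P \,\|\, Q) = \inf_{\eta} \big\{ R_\alpha(\eta \,\|\, Q) + W^{\Gamma}(P, \eta) \big\},
  \qquad
  W^{\Gamma}(P, \eta) = \sup_{g \in \Gamma} \mathbb{E}_P[g] - \mathbb{E}_\eta[g].
\]

Because the IPM term remains finite for, e.g., empirical measures, the infimum stays finite even when $P$ and $Q$ are not absolutely continuous, which is exactly the interpolation between R\'enyi divergences and IPMs noted above.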
Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration
Semi-implicit variational inference (SIVI) has been introduced to expand the
analytical variational families by defining expressive semi-implicit
distributions in a hierarchical manner. However, the single-layer architecture
commonly used in current SIVI methods can be insufficient when the target
posterior has complicated structures. In this paper, we propose hierarchical
semi-implicit variational inference, called HSIVI, which generalizes SIVI to
allow more expressive multi-layer construction of semi-implicit distributions.
By introducing auxiliary distributions that interpolate between a simple base
distribution and the target distribution, the conditional layers can be trained
by progressively matching these auxiliary distributions one layer after
another. Moreover, given pre-trained score networks, HSIVI can be used to
accelerate the sampling process of diffusion models with the score matching
objective. We show that HSIVI significantly enhances the expressiveness of SIVI
on several Bayesian inference problems with complicated target distributions.
When used for diffusion model acceleration, we show that HSIVI can produce
high-quality samples comparable to or better than existing fast
diffusion-model-based samplers with a small number of function evaluations on
various datasets.

Comment: 25 pages, 13 figures, NeurIPS 2023
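A minimal sketch of the multi-layer semi-implicit construction: each conditional layer pushes the previous sample forward together with fresh noise, so the marginal after layer $l$ can be matched to the $l$-th auxiliary distribution. The MLP parametrization, layer widths, and noise dimension below are assumptions; the progressive matching objective itself is omitted.

    import torch
    import torch.nn as nn

    class HierarchicalSemiImplicit(nn.Module):
        # Stack of implicit conditional layers z_l = f_l(z_{l-1}, eps_l)
        # with eps_l ~ N(0, I); architecture details are illustrative.
        def __init__(self, dim, n_layers, noise_dim=16, hidden=128):
            super().__init__()
            self.noise_dim = noise_dim
            self.layers = nn.ModuleList(
                nn.Sequential(nn.Linear(dim + noise_dim, hidden), nn.ReLU(),
                              nn.Linear(hidden, dim))
                for _ in range(n_layers))

        def forward(self, z):
            # z: samples from a simple base distribution, shape (batch, dim).
            for layer in self.layers:
                eps = torch.randn(z.size(0), self.noise_dim, device=z.device)
                z = layer(torch.cat([z, eps], dim=-1))
            return z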