A Stochastic Tensor Method for Non-convex Optimization
We present a stochastic optimization method that uses a fourth-order
regularized model to find local minima of smooth and potentially non-convex
objective functions with a finite-sum structure. This algorithm uses
sub-sampled derivatives instead of exact quantities. The proposed approach is
shown to find an $(\epsilon_1, \epsilon_2, \epsilon_3)$-third-order critical
point in at most $\mathcal{O}\left(\max\left(\epsilon_1^{-4/3}, \epsilon_2^{-2},
\epsilon_3^{-4}\right)\right)$ iterations, thereby matching the rate of
deterministic approaches. In order to prove this result, we derive a novel
tensor concentration inequality for sums of tensors of any order that makes
explicit use of the finite-sum structure of the objective function.
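The following is a minimal sketch, not the authors' code, of how sub-sampled derivative estimates are typically formed for a finite-sum objective f(x) = (1/n) Σ_i f_i(x); the quadratic components, the sampling routine, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):
    # Gradient of the illustrative component f_i(x) = 0.5 * (a_i^T x - b_i)^2.
    return A[i] * (A[i] @ x - b[i])

def subsampled_gradient(x, batch_size):
    # Average component gradients over a random subsample without replacement;
    # this gives an unbiased estimate of the full finite-sum gradient.
    idx = rng.choice(n, size=batch_size, replace=False)
    return np.mean([grad_i(x, i) for i in idx], axis=0)

x = np.zeros(d)
g = subsampled_gradient(x, batch_size=64)
```

The same sub-sampling idea extends to Hessian and third-order derivative estimates, which is where the tensor concentration inequality mentioned above comes into play.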
Stabilizing Training of Generative Adversarial Networks through Regularization
Deep generative models based on Generative Adversarial Networks (GANs) have
demonstrated impressive sample quality, but in order to work they require a
careful choice of architecture, parameter initialization, and hyper-parameters.
This fragility is in part due to a dimensional mismatch or
non-overlapping support between the model distribution and the data
distribution, causing their density ratio and the associated f-divergence to be
undefined. We overcome this fundamental limitation and propose a new
regularization approach with low computational cost that yields a stable GAN
training procedure. We demonstrate the effectiveness of this regularizer across
several architectures trained on common benchmark image generation tasks. Our
regularization turns GAN models into reliable building blocks for deep
learning.
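As a hedged illustration of the general idea, and not a reproduction of the paper's exact regularizer or weighting, the sketch below adds a gradient-norm penalty on the discriminator to a standard GAN discriminator loss; `D`, `real`, `fake`, and `lambda_reg` are assumed placeholders.

```python
import torch

def discriminator_loss_with_penalty(D, real, fake, lambda_reg=10.0):
    # Illustrative regularized discriminator loss (assumed setup, not the paper's code).
    real = real.clone().requires_grad_(True)
    d_real = D(real)
    d_fake = D(fake)
    # Standard non-saturating GAN discriminator loss.
    loss = torch.nn.functional.softplus(-d_real).mean() + \
           torch.nn.functional.softplus(d_fake).mean()
    # Penalize the squared norm of grad_x D(x) on real samples, smoothing the
    # discriminator so training remains stable when supports barely overlap.
    grad = torch.autograd.grad(d_real.sum(), real, create_graph=True)[0]
    penalty = grad.pow(2).flatten(1).sum(dim=1).mean()
    return loss + lambda_reg * penalty
```

The penalty term keeps the discriminator from becoming arbitrarily steep near the data, which is one way to make the underlying divergence well behaved even with non-overlapping supports.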
Variance Reduced Stochastic Gradient Descent with Neighbors
Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its
slow convergence can be a computational bottleneck. Variance reduction
techniques such as SAG, SVRG and SAGA have been proposed to overcome this
weakness, achieving linear convergence. However, these methods are either based
on computations of full gradients at pivot points, or on keeping per data point
corrections in memory. Therefore speed-ups relative to SGD may need a minimal
number of epochs in order to materialize. This paper investigates algorithms
that can exploit neighborhood structure in the training data to share and
re-use information about past stochastic gradients across data points, which
offers advantages in the transient optimization phase. As a side-product we
provide a unified convergence analysis for a family of variance reduction
algorithms, which we call memorization algorithms. We provide experimental
results supporting our theory.
Comment: Appears in Advances in Neural Information Processing Systems 28 (NIPS 2015). 13 pages.
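For context, the sketch below shows a standard SAGA-style "memorization" update on a toy least-squares problem: per-data-point gradients are kept in memory and used as control variates. This is the baseline scheme the unified analysis covers, not the paper's neighborhood-sharing variant, and all names and the problem setup are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lr = 200, 10, 0.01
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):
    # Gradient of the toy component f_i(x) = 0.5 * (a_i^T x - b_i)^2.
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
memory = np.array([grad_i(x, i) for i in range(n)])  # stored per-point gradients
mem_avg = memory.mean(axis=0)

for _ in range(5000):
    i = rng.integers(n)
    g = grad_i(x, i)
    # Variance-reduced direction: fresh gradient minus stored one, plus running average.
    x -= lr * (g - memory[i] + mem_avg)
    # Refresh the memory entry and its average incrementally.
    mem_avg += (g - memory[i]) / n
    memory[i] = g
```

Sharing such memory entries across neighboring data points, as the abstract describes, aims to make these corrections useful earlier, during the transient phase before every point has been visited.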