A Stochastic Tensor Method for Non-convex Optimization
We present a stochastic optimization method that uses a fourth-order
regularized model to find local minima of smooth and potentially non-convex
objective functions with a finite-sum structure. This algorithm uses
sub-sampled derivatives instead of exact quantities. The proposed approach is
shown to find an $(\epsilon_1, \epsilon_2, \epsilon_3)$-third-order critical
point in at most $\mathcal{O}\left(\max\left(\epsilon_1^{-4/3}, \epsilon_2^{-2},
\epsilon_3^{-4}\right)\right)$ iterations, thereby matching the rate of
deterministic approaches. In order to prove this result, we derive a novel
tensor concentration inequality for sums of tensors of any order that makes
explicit use of the finite-sum structure of the objective function.
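The following is a minimal sketch, not the authors' code, of how sub-sampled derivative estimates are typically formed for a finite-sum objective f(x) = (1/n) Σ_i f_i(x); the quadratic components, the sampling routine, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):
    # Gradient of the illustrative component f_i(x) = 0.5 * (a_i^T x - b_i)^2.
    return A[i] * (A[i] @ x - b[i])

def subsampled_gradient(x, batch_size):
    # Average component gradients over a random subsample without replacement;
    # this gives an unbiased estimate of the full finite-sum gradient.
    idx = rng.choice(n, size=batch_size, replace=False)
    return np.mean([grad_i(x, i) for i in idx], axis=0)

x = np.zeros(d)
g = subsampled_gradient(x, batch_size=64)
```

The same sub-sampling idea extends to Hessian and third-order derivative estimates, which is where the tensor concentration inequality mentioned above comes into play.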
Stabilizing Training of Generative Adversarial Networks through Regularization
Deep generative models based on Generative Adversarial Networks (GANs) have
demonstrated impressive sample quality, but in order to work they require a
careful choice of architecture, parameter initialization, and hyper-parameters.
This fragility is in part due to a dimensional mismatch or
non-overlapping support between the model distribution and the data
distribution, causing their density ratio and the associated f-divergence to be
undefined. We overcome this fundamental limitation and propose a new
regularization approach with low computational cost that yields a stable GAN
training procedure. We demonstrate the effectiveness of this regularizer across
several architectures trained on common benchmark image generation tasks. Our
regularization turns GAN models into reliable building blocks for deep
learning.
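As a hedged illustration of the general idea, and not a reproduction of the paper's exact regularizer or weighting, the sketch below adds a gradient-norm penalty on the discriminator to a standard GAN discriminator loss; `D`, `real`, `fake`, and `lambda_reg` are assumed placeholders.

```python
import torch

def discriminator_loss_with_penalty(D, real, fake, lambda_reg=10.0):
    # Illustrative regularized discriminator loss (assumed setup, not the paper's code).
    real = real.clone().requires_grad_(True)
    d_real = D(real)
    d_fake = D(fake)
    # Standard non-saturating GAN discriminator loss.
    loss = torch.nn.functional.softplus(-d_real).mean() + \
           torch.nn.functional.softplus(d_fake).mean()
    # Penalize the squared norm of grad_x D(x) on real samples, smoothing the
    # discriminator so training remains stable when supports barely overlap.
    grad = torch.autograd.grad(d_real.sum(), real, create_graph=True)[0]
    penalty = grad.pow(2).flatten(1).sum(dim=1).mean()
    return loss + lambda_reg * penalty
```

The penalty term keeps the discriminator from becoming arbitrarily steep near the data, which is one way to make the underlying divergence well behaved even with non-overlapping supports.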
Variance Reduced Stochastic Gradient Descent with Neighbors
Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its
slow convergence can be a computational bottleneck. Variance reduction
techniques such as SAG, SVRG and SAGA have been proposed to overcome this
weakness, achieving linear convergence. However, these methods are either based
on computations of full gradients at pivot points, or on keeping per data point
corrections in memory. Therefore speed-ups relative to SGD may need a minimal
number of epochs in order to materialize. This paper investigates algorithms
that can exploit neighborhood structure in the training data to share and
re-use information about past stochastic gradients across data points, which
offers advantages in the transient optimization phase. As a side-product we
provide a unified convergence analysis for a family of variance reduction
algorithms, which we call memorization algorithms. We provide experimental
results supporting our theory.
Comment: Appears in Advances in Neural Information Processing Systems 28 (NIPS 2015). 13 pages.
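For context, the sketch below shows a standard SAGA-style "memorization" update on a toy least-squares problem: per-data-point gradients are kept in memory and used as control variates. This is the baseline scheme the unified analysis covers, not the paper's neighborhood-sharing variant, and all names and the problem setup are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lr = 200, 10, 0.01
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):
    # Gradient of the toy component f_i(x) = 0.5 * (a_i^T x - b_i)^2.
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
memory = np.array([grad_i(x, i) for i in range(n)])  # stored per-point gradients
mem_avg = memory.mean(axis=0)

for _ in range(5000):
    i = rng.integers(n)
    g = grad_i(x, i)
    # Variance-reduced direction: fresh gradient minus stored one, plus running average.
    x -= lr * (g - memory[i] + mem_avg)
    # Refresh the memory entry and its average incrementally.
    mem_avg += (g - memory[i]) / n
    memory[i] = g
```

Sharing such memory entries across neighboring data points, as the abstract describes, aims to make these corrections useful earlier, during the transient phase before every point has been visited.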