OOGAN: Disentangling GAN with One-Hot Sampling and Orthogonal Regularization
Exploring the potential of GANs for unsupervised disentanglement learning,
this paper proposes a novel GAN-based disentanglement framework with One-Hot
Sampling and Orthogonal Regularization (OOGAN). While previous works mostly
attempt to tackle disentanglement learning through VAE and seek to implicitly
minimize the Total Correlation (TC) objective with various sorts of
approximation methods, we show that GANs have a natural advantage in
disentangling with an alternating latent variable (noise) sampling method that
is straightforward and robust. Furthermore, we provide a new perspective
on designing the structure of the generator and discriminator, demonstrating
that a minor structural change and an orthogonal regularization on model
weights entail improved disentanglement. Instead of experimenting on simple
toy datasets, we conduct experiments on higher-resolution images and show that
OOGAN greatly pushes the boundary of unsupervised disentanglement. Comment: AAAI 202
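To make the two ingredients in the title concrete, here is a minimal sketch in a PyTorch-style setup. The function names, the penalty form $\|W W^\top - I\|_F^2$, and the regularization strength are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def orthogonal_regularization(weight, strength=1e-4):
    # Penalize deviation of W W^T from the identity (squared Frobenius norm).
    # A generic orthogonal penalty on a layer's weights; illustrative only.
    w = weight.view(weight.size(0), -1)
    gram = w @ w.t()
    eye = torch.eye(gram.size(0), device=w.device)
    return strength * ((gram - eye) ** 2).sum()

def sample_latent(batch_size, code_dim, noise_dim):
    # One-hot sampling: each draw activates a single disentangled code
    # dimension, concatenated with ordinary Gaussian noise.
    idx = torch.randint(code_dim, (batch_size,))
    code = F.one_hot(idx, num_classes=code_dim).float()
    noise = torch.randn(batch_size, noise_dim)
    return torch.cat([code, noise], dim=1)
```

In a training loop, the penalty would simply be added to the generator's loss, and sample_latent would replace plain Gaussian noise when drawing latent inputs.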
Asymptotic learning curves of kernel methods: empirical data vs. Teacher-Student paradigm
How many training data are needed to learn a supervised task? It is often
observed that the generalization error decreases as a power law $\epsilon \sim n^{-\beta}$, where $n$ is
the number of training examples and $\beta$ an exponent that depends on both
data and algorithm. In this work we measure $\beta$ when applying kernel
methods to real datasets: we report its value for MNIST and for CIFAR10,
for both regression and classification tasks, and for Gaussian or Laplace
kernels. To rationalize the existence of non-trivial
exponents that can be independent of the specific kernel used, we study the
Teacher-Student framework for kernels. In this scheme, a Teacher generates data
according to a Gaussian random field, and a Student learns them via kernel
regression. With a simplifying assumption -- namely that the data are sampled
from a regular lattice -- we derive $\beta$ analytically for translation-invariant
kernels, using previous results from the kriging literature. Provided
that the Student is not too sensitive to high frequencies, $\beta$ depends only
on the smoothness and dimension of the training data. We confirm numerically
that these predictions hold when the training points are sampled at random on a
hypersphere. Overall, the test error is found to be controlled by the magnitude
of the projection of the true function on the kernel eigenvectors whose rank is
larger than $n$. Using this idea, we relate the exponent $\beta$ to an
exponent describing how the coefficients of the true function in the
eigenbasis of the kernel decay with rank. We extract this decay exponent from real data by
performing kernel PCA, leading to predictions of $\beta$ for MNIST and
for CIFAR10 in good agreement with observations. We argue
that these rather large exponents are possible due to the small effective
dimension of the data. Comment: We added (i) the prediction of the exponent $\beta$ for real data
using kernel PCA; (ii) the generalization of our results to non-Gaussian data
from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in
Kernel Regression and Wide Neural Networks").
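A rough sketch of the kernel-PCA extraction step described in the abstract: diagonalize a Gaussian-kernel Gram matrix, project the target function on its eigenvectors, and fit the decay of the squared coefficients with rank. The kernel choice, bandwidth, and power-law fit below are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def eigen_decay_exponent(X, y, bandwidth=1.0):
    # Gram matrix of a Gaussian kernel on the n data points.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * bandwidth ** 2))
    # Eigendecomposition; numpy returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(K)
    vecs = vecs[:, ::-1]                 # reorder by decreasing eigenvalue
    coeffs = vecs.T @ y                  # coefficients of y in the eigenbasis
    rank = np.arange(1, len(y) + 1)
    # Fit a power law |c_rank|^2 ~ rank^(-a) on a log-log scale.
    slope, _ = np.polyfit(np.log(rank), np.log(coeffs ** 2 + 1e-12), 1)
    return -slope
```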
Mutual Information of Neural Network Initialisations: Mean Field Approximations
The ability to train randomly initialised deep neural networks is known to
depend strongly on the variance of the weight matrices and biases as well as
the choice of nonlinear activation. Here we complement the existing geometric
analysis of this phenomenon with an information-theoretic alternative. Lower
bounds are derived for the mutual information between an input and the
hidden-layer outputs. Using a mean-field analysis, we provide analytic lower
bounds as functions of the network weight and bias variances as well as the
choice of nonlinear activation. These results show that initialisations known
to be optimal from a training point of view are also superior from a mutual
information perspective.
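The mutual-information bounds themselves are not reproduced here, but the mean-field ingredient they rest on is easy to sketch: the standard recursion for the pre-activation variance of a wide, randomly initialised network, estimated by Monte Carlo. The activation and the variance values below are illustrative assumptions.

```python
import numpy as np

def variance_map(q, sigma_w2, sigma_b2, phi=np.tanh, n_samples=200_000):
    # One step of the mean-field recursion at initialisation:
    #   q_{l+1} = sigma_w^2 * E_z[phi(sqrt(q_l) * z)^2] + sigma_b^2,  z ~ N(0, 1),
    # with the Gaussian expectation estimated by Monte Carlo.
    z = np.random.randn(n_samples)
    return sigma_w2 * np.mean(phi(np.sqrt(q) * z) ** 2) + sigma_b2

# Iterate to the fixed point q* that characterises deep-layer statistics
# for a given (sigma_w^2, sigma_b^2) initialisation.
q = 1.0
for _ in range(50):
    q = variance_map(q, sigma_w2=2.0, sigma_b2=0.05)
print(q)
```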
The adaptive interpolation method for proving replica formulas. Applications to the Curie-Weiss and Wigner spike models
In this contribution we give a pedagogic introduction to the recently introduced
adaptive interpolation method for proving, in a simple and unified way, replica
formulas for Bayesian optimal inference problems. Many aspects of this method
can already be explained at the level of the simple Curie-Weiss spin system.
This provides a new method of solution for this model which does not appear
to have been known previously. We then generalize this analysis to a paradigmatic
inference problem, namely rank-one matrix estimation, also referred to as the
Wigner spike model in statistics. We give many pointers to the recent literature
where the method has been successfully applied.
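For the Curie-Weiss example, the replica formula can be checked numerically against the exact finite-size free entropy. The sketch below assumes the standard conventions (ferromagnetic coupling J = 1, external field h); it is a sanity check of the formula, not the adaptive interpolation proof itself.

```python
import numpy as np
from math import comb, log

def exact_free_entropy(N, beta, h):
    # (1/N) log Z for Curie-Weiss, summing over magnetisation sectors:
    #   Z = sum_k C(N, k) exp(beta*N*m^2/2 + beta*h*N*m),  m = (2k - N)/N.
    logs = []
    for k in range(N + 1):
        m = (2 * k - N) / N
        logs.append(log(comb(N, k)) + beta * N * m ** 2 / 2 + beta * h * N * m)
    mx = max(logs)
    return (mx + log(sum(np.exp(t - mx) for t in logs))) / N

def replica_formula(beta, h, grid=100_001):
    # Variational (replica-symmetric) formula:
    #   sup_m [log 2 cosh(beta*(m + h)) - beta*m^2/2].
    m = np.linspace(-1.0, 1.0, grid)
    return np.max(np.log(2 * np.cosh(beta * (m + h))) - beta * m ** 2 / 2)

# The finite-N value approaches the replica prediction as N grows.
print(exact_free_entropy(2000, beta=1.2, h=0.1), replica_formula(1.2, 0.1))
```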