On the Inductive Bias of Neural Tangent Kernels
State-of-the-art neural networks are heavily over-parameterized, making the
optimization algorithm a crucial ingredient for learning predictive models with
good generalization properties. A recent line of work has shown that in a
certain over-parameterized regime, the learning dynamics of gradient descent
are governed by a certain kernel obtained at initialization, called the neural
tangent kernel. We study the inductive bias of learning in such a regime by
analyzing this kernel and the corresponding function space (RKHS). In
particular, we study smoothness, approximation, and stability properties of
functions with finite norm, including stability to image deformations in the
case of convolutional networks, and compare to other known kernels for similar
architectures.
Comment: NeurIPS 2019
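The kernel being analysed can also be computed directly at finite width. The sketch below is an illustrative construction, not code from the paper: the tiny two-layer ReLU network and the finite-difference Jacobian are assumptions made here for clarity. It forms the empirical neural tangent kernel at initialization, K[i, j] = ⟨∇θ f(x_i), ∇θ f(x_j)⟩, and checks the Gram-matrix properties that make it a valid kernel with an associated RKHS:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer ReLU network f(x; theta), parameters flattened into theta.
n_in, n_hid = 3, 16
W1 = rng.normal(size=(n_hid, n_in)) / np.sqrt(n_in)
w2 = rng.normal(size=n_hid) / np.sqrt(n_hid)
theta0 = np.concatenate([W1.ravel(), w2])

def f(x, theta):
    W1 = theta[:n_hid * n_in].reshape(n_hid, n_in)
    w2 = theta[n_hid * n_in:]
    return w2 @ np.maximum(W1 @ x, 0.0)

def grad_theta(x, theta, eps=1e-5):
    # Central finite-difference gradient of the scalar output w.r.t. theta.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[i] += eps
        tm[i] -= eps
        g[i] = (f(x, tp) - f(x, tm)) / (2 * eps)
    return g

# Empirical NTK at initialization: K[i, j] = <grad f(x_i), grad f(x_j)>.
X = rng.normal(size=(5, n_in))
J = np.stack([grad_theta(x, theta0) for x in X])
K = J @ J.T

# K is a Gram matrix of parameter gradients, hence symmetric and PSD.
assert np.allclose(K, K.T)
assert np.all(np.linalg.eigvalsh(K) >= -1e-8)
```

In the over-parameterized regime studied in the paper, this matrix stays essentially frozen during training, so learning reduces to kernel regression with K.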
A Convex Relaxation for Weakly Supervised Classifiers
This paper introduces a general multi-class approach to weakly supervised
classification. Inferring the labels and learning the parameters of the model
is usually done jointly through a block-coordinate descent algorithm such as
expectation-maximization (EM), which may lead to local minima. To avoid this
problem, we propose a cost function based on a convex relaxation of the
soft-max loss. We then propose an algorithm specifically designed to
efficiently solve the corresponding semidefinite program (SDP). Empirically,
our method compares favorably to standard ones on different datasets for
multiple instance learning and semi-supervised learning as well as on
clustering tasks.
Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods
We study stochastic Cubic Newton methods for solving general, possibly
non-convex minimization problems. We propose a new framework, which we call the
helper framework, that provides a unified view of stochastic and
variance-reduced second-order algorithms equipped with global complexity
guarantees. It can also be applied to learning with auxiliary information. Our
helper framework offers the algorithm designer high flexibility for
constructing and analyzing stochastic Cubic Newton methods, allowing batches of
arbitrary size and the use of noisy, possibly biased estimates of the gradients
and Hessians, and incorporating both variance reduction and lazy Hessian
updates. We recover the best-known complexities for the stochastic and
variance-reduced Cubic Newton methods under weak assumptions on the noise. A
direct consequence of our theory is a new lazy stochastic second-order method,
which significantly improves the arithmetic complexity for large-dimensional
problems. We also establish complexity bounds for the classes of
gradient-dominated objectives, which include convex and strongly convex
problems. For auxiliary learning, we show that using a helper (auxiliary
function) can outperform training alone if a given similarity measure is small.
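For readers unfamiliar with the base method, here is a minimal deterministic sketch of a cubic-regularized Newton iteration on a one-dimensional non-convex objective. This is an illustration assumed here, not the paper's stochastic algorithm, and the grid search is a deliberately naive way of solving the cubic-model subproblem:

```python
import numpy as np

# Toy non-convex objective with minima at x = +/-1 and a local max at x = 0.
f   = lambda x: x**4 / 4 - x**2 / 2
df  = lambda x: x**3 - x          # gradient
d2f = lambda x: 3 * x**2 - 1      # Hessian

M = 12.0  # Lipschitz constant of the Hessian on [-2, 2]
s_grid = np.linspace(-2.0, 2.0, 40001)  # brute-force subproblem solver

x = 2.0
for _ in range(30):
    g, h = df(x), d2f(x)
    # Cubic-regularized model: m(s) = g s + h s^2 / 2 + (M/6) |s|^3.
    model = g * s_grid + 0.5 * h * s_grid**2 + (M / 6) * np.abs(s_grid)**3
    x += s_grid[np.argmin(model)]

# Converges to the nearby minimum at x = 1.
assert abs(x - 1.0) < 1e-2
```

The stochastic variants analysed in the abstract replace the exact gradient g and Hessian h with noisy, possibly biased estimates (from mini-batches or an auxiliary "helper" function) while keeping the same cubic-model step.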
Disentangling feature and lazy training in deep neural networks
Two distinct limits for deep learning have been derived as the network width
h → ∞, depending on how the weights of the last layer scale with h. In the
Neural Tangent Kernel (NTK) limit, the dynamics becomes linear in the weights
and is described by a frozen kernel Θ. By contrast, in the Mean-Field limit,
the dynamics can be expressed in terms of the distribution of the parameters
associated with a neuron, which follows a partial differential equation. In
this work we consider deep networks where the weights in the last layer scale
as α/√h at initialization. By varying α and h, we probe the crossover between
the two limits. We observe the previously identified regimes of lazy training
and feature training. In the lazy-training regime, the dynamics is almost
linear and the NTK barely changes after initialization. The feature-training
regime includes the mean-field formulation as a limiting case and is
characterized by a kernel that evolves in time, and learns some features. We
perform numerical experiments on MNIST, Fashion-MNIST, EMNIST and CIFAR10 and
consider various architectures. We find that (i) the two regimes are separated
by an α* that scales as 1/√h; (ii) network architecture and data structure
play an important role in determining which regime is better: in our tests,
fully-connected networks perform generally better in the lazy-training regime,
unlike convolutional networks; (iii) in both regimes, the fluctuations induced
on the learned function by initial conditions decay as 1/√h, leading to a
performance that increases with h (the same improvement can also be obtained
at an intermediate width by ensemble-averaging several networks); (iv) in the
feature-training regime we identify a time scale t₁ ∼ √h·α, such that for
t ≪ t₁ the dynamics is linear.
Comment: minor revision
Learning explanatory logical rules in non-linear domains: a neuro-symbolic approach
Deep neural networks, despite their capabilities, are constrained by the need for large-scale training data, and often fall short in generalisation and interpretability. Inductive logic programming (ILP) presents an intriguing solution with its data-efficient learning of first-order logic rules. However, ILP grapples with challenges, notably the handling of non-linearity in continuous domains. With the ascent of neuro-symbolic ILP, there's a drive to mitigate these challenges, synergising deep learning with relational ILP models to enhance interpretability and create logical decision boundaries. In this research, we introduce a neuro-symbolic ILP framework, grounded in differentiable Neural Logic networks, tailored for non-linear rule extraction in mixed discrete-continuous spaces. Our methodology consists of a neuro-symbolic approach, emphasising the extraction of non-linear functions from mixed domain data. Our preliminary findings showcase our architecture's capability to identify non-linear functions from continuous data, offering a new perspective in neural-symbolic research and underlining the adaptability of ILP-based frameworks for regression challenges in continuous scenarios.
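The differentiable-logic ingredient can be illustrated with a generic soft conjunction whose learnable membership gates decide which atoms participate in a rule. This is an illustrative construction assumed here, not the paper's architecture, and it uses boolean atoms for simplicity rather than the mixed discrete-continuous setting of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data: binary atoms x1..x4; the hidden target rule is  y = x1 AND x3.
X = rng.integers(0, 2, size=(200, 4)).astype(float)
y = X[:, 0] * X[:, 2]

# Differentiable conjunction with membership gates m in (0, 1):
# out = prod_i (1 - m_i * (1 - x_i));  m_i -> 1 includes atom i in the rule.
def forward(a, X):
    m = 1 / (1 + np.exp(-a))           # sigmoid keeps gates in (0, 1)
    return np.prod(1 - m * (1 - X), axis=1), m

out0, _ = forward(np.zeros(4), X)
loss0 = np.mean((out0 - y) ** 2)       # loss before training

a = np.zeros(4)  # gate logits
lr = 0.5
for _ in range(2000):
    out, m = forward(a, X)
    resid = out - y
    # Gradient of the MSE through the soft-AND, by the chain rule.
    terms = 1 - m * (1 - X)
    grad_out_m = -(1 - X) * (out[:, None] / np.maximum(terms, 1e-9))
    grad_a = (2 / len(X)) * (resid[:, None] * grad_out_m
                             * (m * (1 - m))[None, :]).sum(axis=0)
    a -= lr * grad_a

out, m = forward(a, X)
loss = np.mean((out - y) ** 2)
# The gates should move toward [1, 0, 1, 0], recovering the rule x1 AND x3.
assert loss < loss0 * 0.5
```

The learned gate vector m is directly readable as a logical rule, which is the interpretability argument the abstract makes; the paper's contribution is extending this style of extraction to non-linear functions over continuous domains.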
On Lazy Training in Differentiable Programming
In a series of recent theoretical works, it was shown that strongly over-parameterized neural networks trained with gradient-based methods could converge exponentially fast to zero training loss, with their parameters hardly varying. In this work, we show that this "lazy training" phenomenon is not specific to over-parameterized neural networks, and is due to a choice of scaling, often implicit, that makes the model behave as its linearization around the initialization, thus yielding a model equivalent to learning with positive-definite kernels. Through a theoretical analysis, we exhibit various situations where this phenomenon arises in non-convex optimization and we provide bounds on the distance between the lazy and linearized optimization paths. Our numerical experiments bring a critical note, as we observe that the performance of commonly used non-linear deep convolutional neural networks in computer vision degrades when trained in the lazy regime. This makes it unlikely that "lazy training" is behind the many successes of neural networks in difficult high-dimensional tasks.
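The scaling argument can be reproduced on a toy problem. In the sketch below (an illustrative construction assumed here, not the paper's experiments), a model h(w) = w1·w2 that is nonlinear in its parameters is rescaled as f_α(w) = α·(h(w) − h(w0)) and trained by gradient descent with step size lr/α². For large α the target is still fit, but the parameters barely move from their initialization, which is exactly the lazy regime:

```python
import numpy as np

# Toy model, nonlinear in its parameters: h(w) = w1 * w2.
h = lambda w: w[0] * w[1]
grad_h = lambda w: np.array([w[1], w[0]])

def train_lazy(alpha, steps=2000, lr=0.1):
    """Fit the scaled model f(w) = alpha * (h(w) - h(w0)) to target y = 1
    by gradient descent with the lazy-training step size lr / alpha**2."""
    w0 = np.array([1.0, 2.0])
    w = w0.copy()
    for _ in range(steps):
        resid = alpha * (h(w) - h(w0)) - 1.0   # f(w) - y
        w -= (lr / alpha**2) * resid * alpha * grad_h(w)
    final_resid = alpha * (h(w) - h(w0)) - 1.0
    return np.linalg.norm(w - w0), final_resid**2

move_small, loss_small = train_lazy(alpha=1.0)
move_large, loss_large = train_lazy(alpha=100.0)

# Both scalings reach (near-)zero loss, but the large-alpha run barely
# moves its parameters: the model stays close to its linearization.
assert loss_small < 1e-6 and loss_large < 1e-6
assert move_large < move_small / 10
```

Note the subtraction of h(w0): as the paper discusses, the scaling argument requires the model output at initialization to be controlled, otherwise the α-scaled initial output dominates the loss.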