On the Inductive Bias of Neural Tangent Kernels
State-of-the-art neural networks are heavily over-parameterized, making the
optimization algorithm a crucial ingredient for learning predictive models with
good generalization properties. A recent line of work has shown that in a
certain over-parameterized regime, the learning dynamics of gradient descent
are governed by a kernel obtained at initialization, called the neural
tangent kernel. We study the inductive bias of learning in such a regime by
analyzing this kernel and the corresponding function space (RKHS). In
particular, we study smoothness, approximation, and stability properties of
functions with finite RKHS norm, including stability to image deformations in the
case of convolutional networks, and compare to other known kernels for similar
architectures.
Comment: NeurIPS 201
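To make the "kernel obtained at initialization" concrete, here is a minimal sketch. It assumes a two-layer ReLU network f(x) = v . relu(W x) / sqrt(m) in the NTK parameterization with standard Gaussian weights. This is an illustrative choice, not necessarily the paper's exact setup, and its normalization may differ from the paper's by constant factors. The sketch computes the empirical tangent kernel <grad f(x), grad f(x')> at a random initialization and compares it with the infinite-width closed form built from arc-cosine kernels of degree 1 and 0 (up to normalization). The function names `empirical_ntk` and `analytic_ntk` are hypothetical.

```python
import numpy as np

# Assumed setup (illustrative): f(x) = v . relu(W x) / sqrt(m), W_ij ~ N(0, 1), v_j ~ N(0, 1).

def empirical_ntk(X, W, v):
    """Empirical NTK <grad_theta f(x), grad_theta f(x')> for all pairs of rows of X."""
    m = W.shape[0]
    pre = X @ W.T                    # (n, m) pre-activations w_j . x
    act = np.maximum(pre, 0.0)       # relu(w_j . x)
    ind = (pre > 0).astype(float)    # relu'(w_j . x), i.e. the step function
    # Gradient w.r.t. v_j is relu(w_j . x) / sqrt(m).
    k_v = act @ act.T
    # Gradient w.r.t. w_j is v_j * 1[w_j . x > 0] * x / sqrt(m).
    k_w = ((ind * v**2) @ ind.T) * (X @ X.T)
    return (k_v + k_w) / m

def analytic_ntk(X):
    """Infinite-width limit of empirical_ntk (rows of X assumed nonzero)."""
    norms = np.linalg.norm(X, axis=1)
    outer = np.outer(norms, norms)
    cos = np.clip((X @ X.T) / outer, -1.0, 1.0)
    theta = np.arccos(cos)
    # E[relu(u) relu(u')] for (u, u') = (w . x, w . x'): degree-1 arc-cosine kernel.
    k_v = outer * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)
    # E[v^2] * E[1[u > 0] 1[u' > 0]] * (x . x'): degree-0 arc-cosine kernel.
    k_w = (X @ X.T) * (np.pi - theta) / (2 * np.pi)
    return k_v + k_w

rng = np.random.default_rng(0)
d, m = 5, 200_000
X = rng.standard_normal((4, d))
W = rng.standard_normal((m, d))
v = rng.standard_normal(m)

emp = empirical_ntk(X, W, v)
ana = analytic_ntk(X)
print(np.abs(emp - ana).max())  # Monte Carlo error; shrinks roughly like 1/sqrt(m)
```

As the width m grows, the random kernel at initialization concentrates around the deterministic closed form. This is one ingredient of the regime the abstract describes, where the kernel rather than the particular random weights determines what gradient descent learns.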
Infinite Width Graph Neural Networks for Node Regression/ Classification
This work analyzes Graph Neural Networks, a generalization of fully connected
deep neural networks to graph-structured data, in the limit where their width,
that is, the number of nodes in each fully connected layer, tends to infinity.
Infinite-width neural networks connect deep learning to Gaussian processes and
kernels, two machine learning frameworks with long traditions and extensive
theoretical foundations. Gaussian processes and kernels have far fewer
hyperparameters than neural networks and can be used for uncertainty
estimation, making them more user-friendly in applications. This work extends
the growing body of research connecting Gaussian processes and kernels to
neural networks. Closed forms of the kernel and the Gaussian process are
derived for a variety of architectures, namely the standard Graph Neural
Network, the Graph Neural Network with skip-concatenate connections, and the
Graph Attention Neural Network. All architectures are evaluated on a variety of
datasets on the task of transductive node regression and classification.
Additionally, a spectral sparsification method based on effective resistance is
used to reduce runtime and memory requirements. Extending the setting to
inductive graph learning tasks (graph regression/classification) is
straightforward and is briefly discussed in Section 3.5.
Comment: 49 Pages, 2 Figures (with subfigures), multiple tables, v2: made
table of contents fit to one page and added derivatives on GAT*NTK and GAT*GP
in A.4, v3: shortened parts of the introduction and fixed typos, added numbering
to equations and a discussion section, v4: fixed two missing citations on page 1
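The following hypothetical sketch shows what a closed-form infinite-width kernel for a graph network can look like. It assumes a GCN-style layer F <- S relu(F) W, where S = D^{-1/2}(A + I)D^{-1/2} is the symmetrically normalized adjacency with self-loops. It also assumes i.i.d. Gaussian weights with variance sigma_w2 / fan_in and no biases. Under these assumptions, the Gaussian process covariance follows the recursion K <- sigma_w2 * S E[relu(f) relu(f)^T] S^T with f ~ N(0, K), and the ReLU expectation has an arc-cosine closed form. Transductive node regression then reduces to GP regression on the labelled nodes. This is not the thesis's derivation. It does not cover the skip-concatenate or attention variants, or the effective-resistance sparsification. The function names, depth, sigma_w2 and noise values are illustrative assumptions.

```python
import numpy as np

def normalized_adjacency(A):
    """S = D^{-1/2} (A + I) D^{-1/2} (assumed GCN-style propagation matrix)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * np.outer(d_inv_sqrt, d_inv_sqrt)

def relu_expectation(K):
    """Entrywise E[relu(u) relu(v)] for jointly Gaussian (u, v) with covariance K."""
    std = np.sqrt(np.clip(np.diag(K), 1e-12, None))
    outer = np.outer(std, std)
    cos = np.clip(K / outer, -1.0, 1.0)
    theta = np.arccos(cos)
    return outer * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)

def gcn_gp_kernel(A, X, depth=2, sigma_w2=2.0):
    """Hypothetical infinite-width GP kernel over the nodes of a GCN-style network."""
    S = normalized_adjacency(A)
    # First layer F = S X W with W_ij ~ N(0, sigma_w2 / n_features).
    K = sigma_w2 * S @ (X @ X.T / X.shape[1]) @ S.T
    for _ in range(depth - 1):
        # Later layers F' = S relu(F) W: the covariance is propagated through S on both sides.
        K = sigma_w2 * S @ relu_expectation(K) @ S.T
    return K

def gp_posterior_mean(K, y, train_idx, test_idx, noise=1e-2):
    """Transductive GP regression: predict test-node targets from labelled nodes."""
    K_tt = K[np.ix_(train_idx, train_idx)]
    K_st = K[np.ix_(test_idx, train_idx)]
    alpha = np.linalg.solve(K_tt + noise * np.eye(len(train_idx)), y[train_idx])
    return K_st @ alpha

rng = np.random.default_rng(0)
A = np.zeros((6, 6))
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1.0         # path graph 0-1-2-3-4-5
X = rng.standard_normal((6, 3))             # node features
y = rng.standard_normal(6)                  # node targets (synthetic)

K = gcn_gp_kernel(A, X, depth=3)
pred = gp_posterior_mean(K, y, train_idx=[0, 2, 4], test_idx=[1, 3, 5])
print(pred)
```

Because the kernel is computed once over all nodes, the train/test split only selects sub-blocks of K. The dense n x n kernel is also what makes runtime and memory a concern on large graphs, which is the motivation the abstract gives for sparsification.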
- …