Asymptotic learning curves of kernel methods: empirical data vs. Teacher-Student paradigm
How many training data are needed to learn a supervised task? It is often
observed that the generalization error decreases as $n^{-\beta}$, where $n$ is
the number of training examples and $\beta$ an exponent that depends on both
data and algorithm. In this work we measure $\beta$ when applying kernel
methods to real datasets. For MNIST we find $\beta \approx 0.4$ and for CIFAR10
$\beta \approx 0.1$, for both regression and classification tasks, and for
Gaussian or Laplace kernels. To rationalize the existence of non-trivial
exponents that can be independent of the specific kernel used, we study the
Teacher-Student framework for kernels. In this scheme, a Teacher generates data
according to a Gaussian random field, and a Student learns them via kernel
regression. With a simplifying assumption -- namely that the data are sampled
from a regular lattice -- we derive $\beta$ analytically for
translation-invariant kernels, using previous results from the kriging
literature. Provided that the Student is not too sensitive to high
frequencies, $\beta$ depends only
on the smoothness and dimension of the training data. We confirm numerically
that these predictions hold when the training points are sampled at random on a
hypersphere. Overall, the test error is found to be controlled by the magnitude
of the projection of the true function on the kernel eigenvectors whose rank is
larger than $n$. Using this idea, we relate the exponent $\beta$ to an
exponent $a$ describing how the coefficients of the true function in the
eigenbasis of the kernel decay with rank. We extract $a$ from real data by
performing kernel PCA; the resulting predictions of $\beta$ for MNIST and
for CIFAR10 are in good agreement with observations. We argue
that these rather large exponents are possible due to the small effective
dimension of the data.
Comment: We added (i) the prediction of the exponent $\beta$ for real data
using kernel PCA; (ii) the generalization of our results to non-Gaussian data
from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in
Kernel Regression and Wide Neural Networks").
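
The learning-curve measurement described in the abstract above is easy to
illustrate numerically. The sketch below is a minimal Teacher-Student toy
under assumptions of mine, not the paper's exact protocol: a Teacher function
is drawn from a Gaussian random field with Laplace covariance on random points
of a hypersphere, a Student fits it by kernel ridge regression with a Gaussian
kernel, and beta is estimated as the slope of log test error versus log n.

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
d = 5                                     # ambient dimension of the hypersphere

def sphere(n):
    # n points sampled uniformly on the unit sphere in R^d
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

n_test = 1000
sizes = [64, 128, 256, 512, 1024]
x_all = sphere(max(sizes) + n_test)

# Teacher: one draw of a Gaussian random field with Laplace covariance
cov = np.exp(-cdist(x_all, x_all)) + 1e-10 * np.eye(len(x_all))
y_all = np.linalg.cholesky(cov) @ rng.standard_normal(len(x_all))

errors = []
for n in sizes:
    student = KernelRidge(alpha=1e-6, kernel="rbf", gamma=1.0)  # Gaussian-kernel Student
    student.fit(x_all[:n], y_all[:n])
    pred = student.predict(x_all[-n_test:])
    errors.append(np.mean((pred - y_all[-n_test:]) ** 2))

# The learning-curve exponent is minus the slope of log(error) vs log(n)
beta = -np.polyfit(np.log(sizes), np.log(errors), 1)[0]
print(f"estimated beta ~ {beta:.2f}")

Swapping kernel="rbf" for kernel="laplacian" in the Student is one way to
probe the abstract's claim that, for a Student not too sensitive to high
frequencies, the exponent is set by the Teacher's smoothness and the
dimension rather than by the Student kernel.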
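The spectral picture at the end of the abstract (test error controlled by the
projections on kernel eigenvectors of rank above $n$, with coefficients
decaying with an exponent $a$) can also be sketched. The function below is an
illustrative stand-in for the paper's kernel PCA step, assuming the exponent
is read off a log-log fit of the squared coefficients against rank:

import numpy as np

def decay_exponent(gram, y):
    # Decompose the Gram matrix and project the targets on its eigenmodes,
    # sorted by decreasing eigenvalue (the empirical kernel eigenbasis).
    eigval, eigvec = np.linalg.eigh(gram)
    order = np.argsort(eigval)[::-1]
    coeffs = (eigvec[:, order].T @ y) ** 2   # squared coefficient at each rank
    ranks = np.arange(1, len(coeffs) + 1)
    keep = coeffs > 1e-12                    # drop numerically zero projections
    return -np.polyfit(np.log(ranks[keep]), np.log(coeffs[keep]), 1)[0]

Applied to a Gram matrix and label vector built from real data (or from the
synthetic Teacher above), the fitted $a$ can then be converted into a
prediction for $\beta$ through the relation derived in the paper.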
An analytic theory of shallow networks dynamics for hinge loss classification
Neural networks have been shown to perform incredibly well in classification
tasks over structured high-dimensional datasets. However, the learning dynamics
of such networks is still poorly understood. In this paper we study in detail
the training dynamics of a simple type of neural network: a single-hidden-layer network
trained to perform a classification task. We show that in a suitable mean-field
limit this case maps to a single-node learning problem with a time-dependent
dataset determined self-consistently from the average over the population of nodes. We
specialize our theory to the prototypical case of a linearly separable dataset
and a linear hinge loss, for which the dynamics can be explicitly solved. This
allows us to address in a simple setting several phenomena appearing in modern
networks such as slowing down of training dynamics, crossover between rich and
lazy learning, and overfitting. Finally, we assess the limitations of
mean-field theory by studying the case of a large but finite number of nodes
and of training samples.
Comment: 16 pages, 6 figures
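
The setting of this abstract (one hidden layer, linear hinge loss, linearly
separable data) is simple enough to simulate directly. The following is a
minimal sketch under assumptions of mine: ReLU hidden units, full-batch
gradient descent, and a 1/width output scaling with an overall scale alpha,
one common way to interpolate between the rich (small alpha) and lazy (large
alpha) regimes. It illustrates the slowdown of training as fewer examples
remain inside the hinge; it is not the paper's analytic mean-field solution.

import numpy as np

rng = np.random.default_rng(1)
n, d, width, alpha, lr = 200, 10, 512, 1.0, 1.0

# Linearly separable dataset: labels given by a random teacher direction
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_star)

W = rng.standard_normal((width, d)) / np.sqrt(d)   # hidden-layer weights
a = rng.standard_normal(width)                     # output weights

for step in range(20001):
    pre = X @ W.T                                  # pre-activations, shape (n, width)
    h = np.maximum(pre, 0.0)                       # ReLU
    f = alpha * h @ a / width                      # network output
    margin = y * f
    # Gradient of the linear hinge loss mean(max(0, 1 - y f)) w.r.t. f
    g = -(y * (margin < 1.0)) / n
    grad_a = alpha * (h.T @ g) / width
    grad_W = alpha * a[:, None] * (((pre > 0) * g[:, None]).T @ X) / width
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 4000 == 0:
        loss = np.mean(np.maximum(0.0, 1.0 - margin))
        print(f"step {step:5d}  hinge loss {loss:.4f}  "
              f"fraction inside hinge {np.mean(margin < 1.0):.2f}")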
Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs
Our recent intensive study has found that physics-informed neural networks
(PINNs) tend to become local approximators after training. This observation leads to
this novel physics-informed radial basis network (PIRBN), which can maintain
the local property throughout the entire training process. Compared to deep
neural networks, a PIRBN comprises only one hidden layer and a radial basis
"activation" function. Under appropriate conditions, we demonstrated that the
training of PIRBNs using gradient descent methods can converge to Gaussian
processes. We also studied the training dynamics of PIRBNs via neural tangent
kernel (NTK) theory, and conducted comprehensive investigations of
initialisation strategies for PIRBNs. In numerical examples, PIRBNs have
proved more effective and efficient than PINNs in solving PDEs with
high-frequency features and ill-posed
computational domains. Moreover, the existing PINN numerical techniques, such
as adaptive learning, decomposition and different types of loss functions, are
applicable to PIRBN. The programs that reproduce all numerical results are
available at https://github.com/JinshuaiBai/PIRBN.
Comment: 48 pages, 26 figures
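
For the PIRBN itself, the following is a minimal sketch of the idea on a 1D
Poisson problem, under assumptions of mine: Gaussian radial basis
"activations" with fixed centres and width, only the output weights trained,
and a standard physics-informed loss (mean squared PDE residual plus a
boundary term). Because the RBF derivatives are analytic, plain numpy
suffices; the authors' full implementation, with trainable RBF parameters and
their initialisation strategies, is in the linked repository.

import numpy as np

# Problem: u''(x) = -pi^2 sin(pi x) on [0, 1] with u(0) = u(1) = 0,
# whose exact solution is u(x) = sin(pi x).
m, b = 25, 10.0                            # number of RBF nodes, inverse width
centres = np.linspace(0.0, 1.0, m)
x = np.linspace(0.0, 1.0, 50)              # collocation points
xb = np.array([0.0, 1.0])                  # boundary points

def phi(x):                                # Gaussian RBF "activation"
    return np.exp(-b**2 * (x[:, None] - centres[None, :])**2)

def phi_xx(x):                             # its analytic second derivative
    r = x[:, None] - centres[None, :]
    return (4 * b**4 * r**2 - 2 * b**2) * np.exp(-b**2 * r**2)

A, B = phi_xx(x), phi(xb)
f = -np.pi**2 * np.sin(np.pi * x)          # PDE source term
lam = 10.0                                 # weight of the boundary loss

# The loss L(w) = mean((A w - f)^2) + lam * mean((B w)^2) is quadratic in the
# output weights w, so a stable gradient-descent step size can be taken from
# the largest eigenvalue of its Hessian.
H = 2 * (A.T @ A) / len(x) + 2 * lam * (B.T @ B) / len(xb)
lr = 1.0 / np.linalg.eigvalsh(H).max()

w = np.zeros(m)
for step in range(20001):
    grad = 2 * A.T @ (A @ w - f) / len(x) + 2 * lam * B.T @ (B @ w) / len(xb)
    w -= lr * grad
    if step % 5000 == 0:
        loss = np.mean((A @ w - f)**2) + lam * np.mean((B @ w)**2)
        err = np.max(np.abs(phi(x) @ w - np.sin(np.pi * x)))
        print(f"step {step:6d}  loss {loss:.3e}  max |u - exact| {err:.3e}")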