Learning and generalization in radial basis function networks

Abstract

The aim of supervised learning is to approximate an unknown target function by adjusting the parameters of a learning model in response to possibly noisy examples generated by the target function. The performance of the learning model at this task can be quantified by examining its generalization ability. First, the concept of generalization is reviewed, and various methods of measuring it, such as generalization error, prediction error, PAC learning and the evidence, are discussed and the relations between them examined. Some of these relations are dependent on the architecture of the learning model.

Two architectures are prevalent in practical supervised learning: the multi-layer perceptron (MLP) and the radial basis function network (RBF). While the RBF has previously been examined from a worst-case perspective, this gives little insight into the performance and phenomena that can be expected in the typical case. This thesis focusses on the properties of learning and generalization that can be expected on average in the RBF.

There are two methods in use for training the RBF. The basis functions can be fixed in advance, utilising an unsupervised learning algorithm, or can adapt during the training process. For the case in which the basis functions are fixed, the typical generalization error given a data set of particular size is calculated by employing the Bayesian framework. The effects of noisy data and regularization are examined, the optimal settings of the parameters that control the learning process are calculated, and the consequences of a mismatch between the learning model and the data-generating mechanism are demonstrated.

The second case, in which the basis functions are adapted, is studied utilising the on-line learning paradigm. The average evolution of generalization error is calculated in a manner which allows the phenomena of the learning process, such as the specialization of the basis functions, to be elucidated. The three most important stages of training (the symmetric phase, the symmetry-breaking phase and the convergence phase) are analyzed in detail; the convergence phase analysis allows the derivation of maximal and optimal learning rates. Noise on both the inputs and outputs of the data-generating mechanism is introduced, and the consequences examined. Regularization via weight decay is also studied, as are the effects of the learning model being poorly matched to the data generator.
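
As a rough illustration of the adaptive case, the sketch below trains a small RBF network by on-line gradient descent, one example at a time, with optional weight decay. It is not taken from the thesis. The Gaussian basis functions, scalar inputs, the two-unit "teacher" network that generates the data, the squared-error loss, and all names and parameter values (`rbf_output`, `online_step`, `generalization_error`, `lr`, `decay`, `width`) are assumptions made for the example. The generalization error is estimated here by sampling fresh inputs rather than calculated analytically.

```python
import math
import random

def rbf_output(x, centres, weights, width=1.0):
    """Return the network output and basis activations for a scalar input x.
    Gaussian basis functions are an assumption of this sketch."""
    acts = [math.exp(-(x - c) ** 2 / (2 * width ** 2)) for c in centres]
    return sum(w * a for w, a in zip(weights, acts)), acts

def online_step(x, y, centres, weights, lr=0.1, decay=0.0, width=1.0):
    """One on-line gradient-descent update on the squared error of a single
    example; both the weights and the centres are adapted in place.
    Returns the loss measured before the update."""
    out, acts = rbf_output(x, centres, weights, width)
    err = out - y
    for b in range(len(centres)):
        # Gradients use the pre-update parameters of unit b.
        grad_w = err * acts[b] + decay * weights[b]  # weight decay term
        grad_c = err * weights[b] * acts[b] * (x - centres[b]) / width ** 2
        weights[b] -= lr * grad_w
        centres[b] -= lr * grad_c
    return 0.5 * err ** 2

def generalization_error(centres, weights, teacher, n=2000, seed=1):
    """Monte Carlo estimate of the expected squared error on fresh inputs
    drawn from a standard normal distribution (an assumed input density)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        total += 0.5 * (rbf_output(x, centres, weights)[0] - teacher(x)) ** 2
    return total / n

# Hypothetical teacher network that generates noise-free examples.
teacher_centres, teacher_weights = [-1.0, 1.0], [1.0, -1.0]

def teacher(x):
    return rbf_output(x, teacher_centres, teacher_weights)[0]

# Student starts with small, nearly identical parameters.
rng = random.Random(0)
centres = [rng.gauss(0.0, 0.1) for _ in range(2)]
weights = [rng.gauss(0.0, 0.1) for _ in range(2)]

e_before = generalization_error(centres, weights, teacher)
for _ in range(20000):
    x = rng.gauss(0.0, 1.0)
    online_step(x, teacher(x), centres, weights, lr=0.1)
e_after = generalization_error(centres, weights, teacher)
```

Comparing `e_before` with `e_after`, or recording the estimate at intervals during the loop, gives an empirical counterpart to the average learning curves the abstract describes. Passing a non-zero `decay` adds the weight-decay regularizer to the weight update.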
