24 research outputs found

    Generalization from correlated sets of patterns in the perceptron

    Generalization is a central aspect of learning theory. Here, we propose a framework that explores an auxiliary task-dependent notion of generalization, and attempts to quantitatively answer the following question: given two sets of patterns with a given degree of dissimilarity, how easily will a network be able to "unify" their interpretation? This is quantified by the volume of the configurations of synaptic weights that classify the two sets in a similar manner. To show the applicability of our idea in a concrete setting, we compute this quantity for the perceptron, a simple binary classifier, using the classical statistical physics approach in the replica-symmetric ansatz. In this case, we show how an analytical expression measures the "distance-based capacity", the maximum load of patterns sustainable by the network, at fixed dissimilarity between patterns and fixed allowed number of errors. This curve indicates that generalization is possible at any distance, but with decreasing capacity. We propose that a distance-based definition of generalization may be useful in numerical experiments with real-world neural networks, and to explore computationally sub-dominant sets of synaptic solutions

    A practical Bayesian framework for backpropagation networks

    A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor," penalizing overflexible and overcomplex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained

    Linear and Order Statistics Combiners for Pattern Classification

    Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that, to a first-order approximation, the error rate obtained over and above the Bayes error rate is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the "added" error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum and in general the ith order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results. Comment: 31 page
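
The factor-of-N claim in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the chapter's derivation: each classifier's boundary error is modelled as independent zero-mean Gaussian noise, and the names `boundary_variance`, `sigma`, and `n_trials` are hypothetical.

```python
import random
import statistics


def boundary_variance(n_classifiers, n_trials=20000, sigma=1.0, seed=0):
    """Variance of the averaged boundary offset over many simulated trials.

    Assumption: each classifier's boundary deviates from the Bayes optimum
    boundary by independent, zero-mean Gaussian noise with std `sigma`.
    """
    rng = random.Random(seed)
    averaged = []
    for _ in range(n_trials):
        offsets = [rng.gauss(0.0, sigma) for _ in range(n_classifiers)]
        averaged.append(sum(offsets) / n_classifiers)
    return statistics.pvariance(averaged)


single = boundary_variance(1)      # one classifier
combined = boundary_variance(10)   # simple average of 10 classifiers
ratio = single / combined          # expected to be close to 10
print(f"variance ratio single/combined: {ratio:.2f}")
```

If the abstract's first-order result holds, the added error is proportional to this variance, so a ratio near N corresponds to the stated N-fold reduction. Correlated offsets would shrink the ratio, which is the case the abstract says is treated separately.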

    Learning in Single Hidden Layer Feedforward Network Models: Backpropagation in a Real World Application

    Learning in neural networks has attracted considerable interest in recent years. Our focus is on learning in single hidden layer feedforward networks, which is posed as a search in the network parameter space for a network that minimizes an additive error function of statistically independent examples. In this contribution, we first review the class of single hidden layer feedforward networks and characterize the learning process in such networks from a statistical point of view. Then we describe the backpropagation procedure, the leading case of gradient descent learning algorithms for the class of networks considered here, as well as an efficient heuristic modification. Finally, we analyse the applicability of these learning methods to the problem of predicting interregional telecommunication flows. Particular emphasis is laid on the engineering judgment, first, in choosing appropriate values for the tunable parameters, second, on the decision whether to train the network by epoch or by pattern (random approximation), and, third, on the overfitting problem. In addition, the analysis shows that the neural network model, whether trained with epoch-based or pattern-based stochastic approximation, outperforms the classical regression approach to modelling telecommunication flows. (authors' abstract) Series: Discussion Papers of the Institute for Economic Geography and GIScienc

    Artificial Neural Networks. A New Approach to Modelling Interregional Telecommunication Flows

    During the last thirty years there has been much research effort in regional science devoted to modelling interactions over geographic space. Theoretical approaches for studying these phenomena have been modified considerably. This paper suggests a new modelling approach, based upon a general nested sigmoid neural network model. Its feasibility is illustrated in the context of modelling interregional telecommunication traffic in Austria and its performance is evaluated in comparison with the classical regression approach of the gravity type. The application of this neural network approach may be viewed as a three-stage process. The first stage refers to the identification of an appropriate network from the family of two-layered feedforward networks with 3 input nodes, one layer of (sigmoidal) intermediate nodes and one (sigmoidal) output node (logistic activation function). There is no general procedure to address this problem. We solved this issue experimentally. The input-output dimensions have been chosen in order to make the comparison with the gravity model as close as possible. The second stage involves the estimation of the network parameters of the selected neural network model. This is performed via the adaptive setting of the network parameters (training, estimation) by means of the application of a least mean squared error goal and the error back propagating technique, a recursive learning procedure using a gradient search to minimize the error goal. Particular emphasis is laid on the sensitivity of the network performance to the choice of the initial network parameters as well as on the problem of overfitting. The final stage of applying the neural network approach refers to the testing of the interregional teletraffic flows predicted. Prediction quality is analysed by means of two performance measures, average relative variance and the coefficient of determination, as well as by the use of residual analysis. The analysis shows that the neural network model approach outperforms the classical regression approach to modelling telecommunication traffic in Austria. (authors' abstract) Series: Discussion Papers of the Institute for Economic Geography and GIScienc
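
The two performance measures named in the abstract above can be sketched as follows. The abstract does not give their formulas, so this assumes the common definitions: average relative variance (ARV) as the squared prediction error normalised by the variance of the observations, and the coefficient of determination as one minus that ratio. The function names and the sample numbers are hypothetical.

```python
def average_relative_variance(observed, predicted):
    """ARV under the assumed definition: sum of squared errors divided by
    the sum of squared deviations of the observations from their mean.
    0 means a perfect fit; 1 matches always predicting the mean."""
    mean_obs = sum(observed) / len(observed)
    sq_error = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    sq_total = sum((y - mean_obs) ** 2 for y in observed)
    return sq_error / sq_total


def coefficient_of_determination(observed, predicted):
    """R^2 under the assumed definition R^2 = 1 - ARV."""
    return 1.0 - average_relative_variance(observed, predicted)


# Made-up illustrative values, not data from the paper.
flows = [1.0, 2.0, 3.0, 4.0]
model_output = [1.1, 1.9, 3.2, 3.8]
arv = average_relative_variance(flows, model_output)
r2 = coefficient_of_determination(flows, model_output)
print(f"ARV = {arv:.3f}, R^2 = {r2:.3f}")
```

Under these definitions a lower ARV and a higher coefficient of determination both indicate better predictions, so the two measures carry the same information on a single test set.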

    Repulsive Deep Ensembles are Bayesian

    Deep ensembles have recently gained popularity in the deep learning community for their conceptual simplicity and efficiency. However, maintaining functional diversity between ensemble members that are independently trained with gradient descent is challenging. This can lead to pathologies when adding more ensemble members, such as a saturation of the ensemble performance, which converges to the performance of a single model. Moreover, this does not only affect the quality of its predictions, but even more so the uncertainty estimates of the ensemble, and thus its performance on out-of-distribution data. We hypothesize that this limitation can be overcome by discouraging different ensemble members from collapsing to the same function. To this end, we introduce a kernelized repulsive term in the update rule of the deep ensembles. We show that this simple modification not only enforces and maintains diversity among the members but, even more importantly, transforms the maximum a posteriori inference into proper Bayesian inference. Namely, we show that the training dynamics of our proposed repulsive ensembles follow a Wasserstein gradient flow of the KL divergence with the true posterior. We study repulsive terms in weight and function space and empirically compare their performance to standard ensembles and Bayesian baselines on synthetic and real-world prediction tasks
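
The idea of a kernelized repulsive term in the update rule can be sketched on a toy problem. Everything here is an assumption made for illustration: a one-dimensional weight per ensemble member, a standard normal posterior (so the gradient of the log posterior is `-w`), an RBF kernel with a fixed bandwidth, and a repulsion of the form "negative kernel gradient sum divided by kernel sum". The abstract does not specify these details, and the names below are hypothetical.

```python
import math
import statistics


def grad_log_posterior(w):
    # Assumed toy posterior: standard normal, so d/dw log p(w) = -w.
    return -w


def repulsion(i, weights, bandwidth):
    """Assumed repulsive term for member i with an RBF kernel:
    -(sum_j d/dw_i k(w_i, w_j)) / (sum_j k(w_i, w_j))."""
    grad_sum = 0.0
    kernel_sum = 0.0
    for w_j in weights:
        diff = weights[i] - w_j
        k = math.exp(-diff * diff / (2.0 * bandwidth ** 2))
        grad_sum += -diff / bandwidth ** 2 * k  # d/dw_i of the kernel
        kernel_sum += k
    return -grad_sum / kernel_sum


def train(weights, repulsive, steps=500, lr=0.05, bandwidth=0.5):
    weights = list(weights)
    for _ in range(steps):
        updates = []
        for i, w in enumerate(weights):
            step = grad_log_posterior(w)
            if repulsive:
                step += repulsion(i, weights, bandwidth)
            updates.append(w + lr * step)
        weights = updates  # update all members simultaneously
    return weights


init = [-1.0, -0.5, 0.1, 0.4, 0.9]
plain_spread = statistics.pstdev(train(init, repulsive=False))
repulsive_spread = statistics.pstdev(train(init, repulsive=True))
print(f"plain: {plain_spread:.2e}, repulsive: {repulsive_spread:.2f}")
```

In this toy setting the plain ensemble collapses onto the single mode of the posterior, while the repulsive ensemble keeps its members spread out, which is the diversity effect the abstract describes. It says nothing about the Wasserstein gradient flow result, which needs the paper's own analysis.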

    Intelligent flight control systems

    The capabilities of flight control systems can be enhanced by designing them to emulate functions of natural intelligence. Intelligent control functions fall in three categories. Declarative actions involve decision-making, providing models for system monitoring, goal planning, and system/scenario identification. Procedural actions concern skilled behavior and have parallels in guidance, navigation, and adaptation. Reflexive actions are spontaneous, inner-loop responses for control and estimation. Intelligent flight control systems learn knowledge of the aircraft and its mission and adapt to changes in the flight environment. Cognitive models form an efficient basis for integrating 'outer-loop/inner-loop' control functions and for developing robust parallel-processing algorithms