Generalization from correlated sets of patterns in the perceptron
Generalization is a central aspect of learning theory. Here, we propose a
framework that explores an auxiliary task-dependent notion of generalization,
and attempts to quantitatively answer the following question: given two sets of
patterns with a given degree of dissimilarity, how easily will a network be
able to "unify" their interpretation? This is quantified by the volume of the
configurations of synaptic weights that classify the two sets in a similar
manner. To show the applicability of our idea in a concrete setting, we compute
this quantity for the perceptron, a simple binary classifier, using the
classical statistical physics approach in the replica-symmetric ansatz. In this
case, we show how an analytical expression measures the "distance-based
capacity", the maximum load of patterns sustainable by the network, at fixed
dissimilarity between patterns and fixed allowed number of errors. This curve
indicates that generalization is possible at any distance, but with decreasing
capacity. We propose that a distance-based definition of generalization may be
useful in numerical experiments with real-world neural networks, and for
exploring computationally sub-dominant sets of synaptic solutions.
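The volume-based notion of generalization described above can be illustrated with a small Monte Carlo sketch: sample random perceptron weight vectors and estimate the fraction that assigns identical labels to two pattern sets of a given dissimilarity. This is only an illustrative stand-in for the paper's analytical replica computation; all sizes and the noise model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def label_overlap(n_dim, n_patterns, noise, n_weights=2000):
    """Monte Carlo estimate of the fraction of random perceptron weight
    vectors that classify two pattern sets in the same way, where the
    second set differs from the first by Gaussian noise of scale `noise`
    (illustrative only; the paper computes this volume via replicas)."""
    X1 = rng.standard_normal((n_patterns, n_dim))
    X2 = X1 + noise * rng.standard_normal((n_patterns, n_dim))
    W = rng.standard_normal((n_weights, n_dim))
    same = np.sign(X1 @ W.T) == np.sign(X2 @ W.T)
    return same.all(axis=0).mean()  # fraction of weights "unifying" both sets

# closer pattern sets should leave a larger consistent weight volume
f_near = label_overlap(n_dim=20, n_patterns=5, noise=0.1)
f_far = label_overlap(n_dim=20, n_patterns=5, noise=2.0)
```

Consistent with the capacity curve the abstract describes, the consistent-weight fraction shrinks as the dissimilarity between the two sets grows, but stays positive at any distance.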
A practical Bayesian framework for backpropagation networks
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor," penalizing overflexible and overcomplex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained.
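The "evidence embodies Occam's razor" idea can be seen in a generic Bayesian linear model, where the marginal likelihood has a closed form. This is a sketch of the general principle, not MacKay's exact network computation; the prior and noise precisions (alpha, beta) and the data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def log_evidence(Phi, y, alpha=1.0, beta=25.0):
    """Closed-form log marginal likelihood ("evidence") of a Bayesian
    linear model y = Phi @ w + noise, with prior w ~ N(0, 1/alpha) and
    Gaussian noise of precision beta (generic sketch of the evidence
    idea, not the paper's network-specific calculation)."""
    n = len(y)
    C = np.eye(n) / beta + Phi @ Phi.T / alpha  # marginal covariance of y
    sign, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

# data generated by a straight line: the evidence should favour the
# simpler (linear) basis over an overflexible quintic one
x = np.linspace(-1.0, 1.0, 30)
y = 0.8 * x - 0.2 + 0.2 * rng.standard_normal(30)
ev_linear = log_evidence(np.vander(x, 2), y)   # basis [x, 1]
ev_quintic = log_evidence(np.vander(x, 6), y)  # basis [x^5, ..., 1]
```

The quintic model fits the noise slightly better, but the evidence penalizes its extra flexibility, so the linear model wins the comparison.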
Linear and Order Statistics Combiners for Pattern Classification
Several researchers have experimentally shown that substantial improvements
can be obtained in difficult pattern recognition problems by combining or
integrating the outputs of multiple classifiers. This chapter provides an
analytical framework to quantify the improvements in classification results due
to combining. The results apply to both linear combiners and order statistics
combiners. We first show that, to a first-order approximation, the error rate
obtained over and above the Bayes error rate is directly proportional to the
variance of the actual decision boundaries around the Bayes optimum boundary.
Combining classifiers in output space reduces this variance, and hence reduces
the "added" error. If N unbiased classifiers are combined by simple averaging,
the added error rate can be reduced by a factor of N if the individual errors
in approximating the decision boundaries are uncorrelated. Expressions are then
derived for linear combiners which are biased or correlated, and the effect of
output correlations on ensemble performance is quantified. For order statistics
based non-linear combiners, we derive expressions that indicate how much the
median, the maximum and in general the ith order statistic can improve
classifier performance. The analysis presented here facilitates the
understanding of the relationships among error rates, classifier boundary
distributions, and combining in output space. Experimental results on several
public domain data sets are provided to illustrate the benefits of combining
and to support the analytical results.
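The chapter's first-order result, that averaging N unbiased classifiers with uncorrelated errors reduces the variance-driven added error by a factor of N, can be checked with a toy simulation (all settings below are illustrative, not taken from the chapter):

```python
import numpy as np

rng = np.random.default_rng(1)

# N unbiased, uncorrelated "classifier" estimates of a decision-boundary
# location: each trial draws N independent zero-mean boundary errors.
# Averaging them should reduce the variance of the combined boundary
# error by roughly a factor of N.
N, trials = 10, 200_000
estimates = rng.standard_normal((trials, N))  # zero-mean boundary errors
var_single = estimates[:, 0].var()            # variance of one classifier
var_averaged = estimates.mean(axis=1).var()   # variance after averaging
ratio = var_single / var_averaged             # expect approximately N
```

With correlated or biased members the reduction is smaller, which is exactly the regime the chapter's later expressions quantify.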
Learning in Single Hidden Layer Feedforward Network Models: Backpropagation in a Real World Application
Learning in neural networks has attracted considerable interest in recent years. Our focus is
on learning in single hidden layer feedforward networks which is posed as a search in the
network parameter space for a network that minimizes an additive error function of
statistically independent examples. In this contribution, we review first the class of single
hidden layer feedforward networks and characterize the learning process in such networks
from a statistical point of view. Then we describe the backpropagation procedure, the leading
case of gradient descent learning algorithms for the class of networks considered here, as
well as an efficient heuristic modification. Finally, we analyse the applicability of these
learning methods to the problem of predicting interregional telecommunication flows.
Particular emphasis is laid on the engineering judgment, first, in choosing appropriate
values for the tunable parameters, second, on the decision whether to train the network by
epoch or by pattern (random approximation), and, third, on the overfitting problem. In
addition, the analysis shows that the neural network model, whether trained by epoch-based
or pattern-based stochastic approximation, outperforms the classical regression approach to
modelling telecommunication flows. (authors' abstract) Series: Discussion Papers of the Institute for Economic Geography and GIScience
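The learning setup the abstract describes, gradient descent on an additive (sum-of-squares) error for a single-hidden-layer network, can be sketched in a few lines. The data, layer sizes, learning rate, and epoch count below are illustrative assumptions, not the paper's telecommunication-flow setting.

```python
import numpy as np

rng = np.random.default_rng(2)

# illustrative stand-in data: 3 inputs, one smooth target function
X = rng.standard_normal((200, 3))
y = np.tanh(X @ np.array([1.0, -2.0, 0.5]))[:, None]

# single hidden layer: 3 inputs -> 8 tanh units -> 1 linear output
W1 = 0.1 * rng.standard_normal((3, 8)); b1 = np.zeros(8)
W2 = 0.1 * rng.standard_normal((8, 1)); b2 = np.zeros(1)
lr = 0.1

def forward(X):
    h = np.tanh(X @ W1 + b1)   # sigmoidal hidden layer
    return h, h @ W2 + b2      # linear output unit

_, out = forward(X)
loss_before = float(((out - y) ** 2).mean())

for _ in range(2000):          # epoch-based (full-batch) training
    h, out = forward(X)
    d_out = 2 * (out - y) / len(X)           # dE/d(output)
    d_h = (d_out @ W2.T) * (1 - h ** 2)      # error backpropagated through tanh
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h; b1 -= lr * d_h.sum(0)

_, out = forward(X)
loss_after = float(((out - y) ** 2).mean())
```

Training by pattern instead of by epoch, the alternative the paper weighs, would apply the same updates one randomly drawn example at a time.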
Artificial Neural Networks. A New Approach to Modelling Interregional Telecommunication Flows
During the last thirty years there has been much research effort in regional science
devoted to modelling interactions over geographic space. Theoretical approaches for
studying these phenomena have been modified considerably. This paper suggests a new
modelling approach, based upon a general nested sigmoid neural network model. Its
feasibility is illustrated in the context of modelling interregional telecommunication traffic in
Austria and its performance is evaluated in comparison with the classical regression
approach of the gravity type. The application of this neural network approach may be
viewed as a three-stage process. The first stage refers to the identification of an
appropriate network from the family of two-layered feedforward networks with 3 input
nodes, one layer of (sigmoidal) intermediate nodes and one (sigmoidal) output node
(logistic activation function). There is no general procedure to address this problem. We
solved this issue experimentally. The input-output dimensions have been chosen in order
to make the comparison with the gravity model as close as possible. The second stage
involves the estimation of the network parameters of the selected neural network model.
This is performed via the adaptive setting of the network parameters (training, estimation)
by means of the application of a least mean squared error goal and the error
backpropagation technique, a recursive learning procedure using a gradient search to
minimize the error goal. Particular emphasis is laid on the sensitivity of the network
performance to the choice of the initial network parameters as well as on the problem of
overfitting. The final stage of applying the neural network approach refers to the testing of
the interregional teletraffic flows predicted. Prediction quality is analysed by means of two
performance measures, average relative variance and the coefficient of determination, as
well as by the use of residual analysis. The analysis shows that the neural network
approach outperforms the classical regression approach to modelling telecommunication
traffic in Austria. (authors' abstract) Series: Discussion Papers of the Institute for Economic Geography and GIScience
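The two performance measures named in the abstract are easy to state concretely: average relative variance (ARV) is the mean squared prediction error normalized by the variance of the targets, and under matching normalizations the coefficient of determination is its complement. The numbers below are made-up illustrative values, not the Austrian teletraffic data.

```python
import numpy as np

# toy targets and predictions (illustrative values only)
y_true = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y_pred = np.array([2.2, 3.9, 6.3, 7.7, 10.1])

# average relative variance: MSE normalized by target variance
arv = ((y_true - y_pred) ** 2).mean() / y_true.var()
# with these (population-variance) normalizations, R^2 = 1 - ARV
r2 = 1.0 - arv
```

An ARV near 0 (equivalently R^2 near 1) indicates predictions far better than simply guessing the mean flow.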
Repulsive Deep Ensembles are Bayesian
Deep ensembles have recently gained popularity in the deep learning community
for their conceptual simplicity and efficiency. However, maintaining functional
diversity between ensemble members that are independently trained with gradient
descent is challenging. This can lead to pathologies when adding more ensemble
members, such as a saturation of the ensemble performance, which converges to
the performance of a single model. This affects not only the quality of the
ensemble's predictions but, even more, its uncertainty estimates, and thus its
performance on out-of-distribution data. We hypothesize
that this limitation can be overcome by discouraging different ensemble members
from collapsing to the same function. To this end, we introduce a kernelized
repulsive term in the update rule of the deep ensembles. We show that this
simple modification not only enforces and maintains diversity among the members
but, even more importantly, transforms the maximum a posteriori inference into
proper Bayesian inference. Namely, we show that the training dynamics of our
proposed repulsive ensembles follow a Wasserstein gradient flow of the KL
divergence with the true posterior. We study repulsive terms in weight and
function space and empirically compare their performance to standard ensembles
and Bayesian baselines on synthetic and real-world prediction tasks.
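The effect of a kernelized repulsive term can be shown on a 1-D toy posterior: plain gradient ascent collapses all ensemble members to the mode, while an SVGD-style update (kernel-weighted driving term plus kernel-gradient repulsion) keeps them spread out. This is an illustrative sketch in the spirit of the paper, not its exact update rule; the bandwidth, step size, and particle count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def svgd_style_step(x, lr=0.1, h=0.5, repulsive=True):
    """One update of a 1-D particle ensemble on a standard-normal toy
    posterior. With repulsive=True, an SVGD-style kernelized rule keeps
    members apart; with False, plain gradient ascent collapses them all
    to the mode (illustrative sketch, not the paper's exact rule)."""
    score = -x                                       # d/dx log N(x; 0, 1)
    if not repulsive:
        return x + lr * score
    diff = x[:, None] - x[None, :]                   # x_i - x_j
    k = np.exp(-diff ** 2 / (2 * h))                 # RBF kernel matrix
    drive = k @ score / len(x)                       # kernel-weighted score
    repulse = (k * diff).sum(axis=1) / (h * len(x))  # pushes members apart
    return x + lr * (drive + repulse)

x0 = 3.0 * rng.standard_normal(50)                   # 50 ensemble members
x_rep, x_plain = x0.copy(), x0.copy()
for _ in range(500):
    x_rep = svgd_style_step(x_rep, repulsive=True)
    x_plain = svgd_style_step(x_plain, repulsive=False)

spread_rep, spread_plain = x_rep.std(), x_plain.std()
```

The non-repulsive ensemble ends with essentially zero spread (all members at the mode), while the repulsive one retains a spread comparable to the posterior's standard deviation, which is the functional diversity the paper argues deep ensembles need.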
Intelligent flight control systems
The capabilities of flight control systems can be enhanced by designing them to emulate functions of natural intelligence. Intelligent control functions fall in three categories. Declarative actions involve decision-making, providing models for system monitoring, goal planning, and system/scenario identification. Procedural actions concern skilled behavior and have parallels in guidance, navigation, and adaptation. Reflexive actions are spontaneous, inner-loop responses for control and estimation. Intelligent flight control systems learn knowledge of the aircraft and its mission and adapt to changes in the flight environment. Cognitive models form an efficient basis for integrating 'outer-loop/inner-loop' control functions and for developing robust parallel-processing algorithms.