    Phase transitions in soft-committee machines

    Equilibrium statistical physics is applied to layered neural networks with differentiable activation functions. A first analysis of off-line learning in soft-committee machines with a finite number (K) of hidden units learning a perfectly matching rule is performed. Our results are exact in the limit of high training temperatures. For K = 2 we find a second-order phase transition from unspecialized to specialized student configurations at a critical size P of the training set, whereas for K > 2 the transition is first order. Monte Carlo simulations indicate that our results remain qualitatively valid at moderately low temperatures. The limit K → ∞ can be performed analytically; the transition occurs after on the order of NK examples have been presented. However, an unspecialized metastable state persists up to P = O(NK^2). Comment: 8 pages, 4 figures
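    The soft-committee machine that recurs throughout these abstracts is a two-layer network whose hidden units have a differentiable activation and whose hidden-to-output weights are fixed. A minimal sketch of the student-teacher setup, using the erf activation and 1/sqrt(N) field scaling conventional in this literature (variable names are ours, not the paper's):

```python
import numpy as np
from scipy.special import erf

def soft_committee(W, x):
    """Output of a soft-committee machine: K hidden units with a
    differentiable activation, summed by a fixed linear output."""
    N = x.shape[0]
    h = W @ x / np.sqrt(N)              # local fields of the K hidden units
    return erf(h / np.sqrt(2)).sum()

rng = np.random.default_rng(0)
N, K = 100, 2
teacher = rng.standard_normal((K, N))   # rule of perfectly matching complexity
student = rng.standard_normal((K, N))
x = rng.standard_normal(N)
print(soft_committee(student, x), soft_committee(teacher, x))
```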

    Learning and generalization theories of large committee--machines

    The study of the distribution of volumes associated to the internal representations of learning examples allows us to derive the critical learning capacity $\alpha_c = \frac{16}{\pi}\sqrt{\ln K}$ of large committee machines, to verify the stability of the solution in the limit of a large number $K$ of hidden units, and to find a Bayesian generalization cross-over at $\alpha = K$. Comment: 14 pages, revtex
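    For a concrete sense of the scaling law quoted above, here is a quick numeric reading of $\alpha_c = \frac{16}{\pi}\sqrt{\ln K}$ (a sketch of ours, not from the paper); note how slowly the capacity grows with the number of hidden units:

```python
import math

# Evaluate alpha_c = (16/pi) * sqrt(ln K), the critical number of
# examples per coupling, for a few hidden-layer sizes K.
for K in (3, 10, 100, 1000):
    alpha_c = (16 / math.pi) * math.sqrt(math.log(K))
    print(f"K = {K:5d}   alpha_c ≈ {alpha_c:.2f}")
```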

    Statistical physics and practical training of soft-committee machines

    Equilibrium states of large layered neural networks with a differentiable activation function and a single, linear output unit are investigated using the replica formalism. The quenched free energy of a student network with a very large number of hidden units learning a rule of perfectly matching complexity is calculated analytically. The system undergoes a first-order phase transition from unspecialized to specialized student configurations at a critical size of the training set. Computer simulations of learning by stochastic gradient descent from a fixed training set demonstrate that the equilibrium results quantitatively describe the plateau states which occur in practical training procedures at sufficiently small but finite learning rates. Comment: 11 pages, 4 figures
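    The training procedure this abstract refers to can be pictured with a short gradient-descent loop over a fixed training set, on the student-teacher setup sketched earlier. The learning rate, sizes, and error probe below are our own illustrative choices, not the paper's:

```python
import numpy as np
from scipy.special import erf

# Hedged sketch: stochastic gradient descent on a fixed training set
# for a soft-committee student learning a matching teacher. At small
# learning rates the error typically lingers on an unspecialized
# plateau before the hidden units specialize.
rng = np.random.default_rng(1)
N, K, P, eta, steps = 100, 2, 800, 0.05, 200_000
B = rng.standard_normal((K, N))              # teacher
J = 0.5 * rng.standard_normal((K, N))        # student
X = rng.standard_normal((P, N))              # fixed training set

g  = lambda h: erf(h / np.sqrt(2))
gp = lambda h: np.sqrt(2 / np.pi) * np.exp(-h**2 / 2)   # g'

for t in range(steps):
    x = X[rng.integers(P)]
    hs, ht = J @ x / np.sqrt(N), B @ x / np.sqrt(N)
    err = g(hs).sum() - g(ht).sum()          # student minus teacher output
    J -= (eta / np.sqrt(N)) * err * gp(hs)[:, None] * x[None, :]
    if t % 40_000 == 0:                      # crude training-error probe
        e = 0.5 * np.mean((g(X @ J.T / np.sqrt(N)).sum(1)
                           - g(X @ B.T / np.sqrt(N)).sum(1))**2)
        print(t, round(float(e), 4))
```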

    Finite size scaling in neural networks

    We demonstrate that the fraction of pattern sets that can be stored in single- and hidden-layer perceptrons exhibits finite size scaling. This feature allows one to estimate the critical storage capacity $\alpha_c$ from simulations of relatively small systems. We illustrate this approach by determining $\alpha_c$, together with the finite size scaling exponent $\nu$, for storing Gaussian patterns in committee and parity machines with binary couplings and up to K = 5 hidden units. Comment: 4 pages, RevTex, 5 figures, uses multicol.sty and psfig.sty
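    The analysis step behind this approach is a data collapse: near $\alpha_c$ the storable fraction is assumed to obey $f(N, \alpha) = F((\alpha - \alpha_c)\,N^{1/\nu})$, and $(\alpha_c, \nu)$ are chosen so that curves from different system sizes collapse onto one master curve. A minimal sketch, with synthetic placeholder data standing in for the simulation output:

```python
import numpy as np

# Finite-size-scaling collapse sketch. The "data" below is a logistic
# crossover plus noise, a stand-in for the measured fraction f(N, alpha)
# of storable pattern sets; real input would come from simulations.
alpha = np.linspace(0.5, 1.5, 41)
sizes = np.array([50, 100, 200])
true_ac, true_nu = 1.0, 2.0
rng = np.random.default_rng(2)
f = {N: 1 / (1 + np.exp((alpha - true_ac) * N**(1 / true_nu)))
        + 0.01 * rng.standard_normal(alpha.size)
     for N in sizes}

def collapse_cost(ac, nu):
    """Spread of all points around a common master curve (smoothness proxy)."""
    x = np.concatenate([(alpha - ac) * N**(1 / nu) for N in sizes])
    y = np.concatenate([f[N] for N in sizes])
    order = np.argsort(x)
    return np.var(np.diff(y[order]))

best = min(((ac, nu) for ac in np.linspace(0.9, 1.1, 21)
                     for nu in np.linspace(1.0, 3.0, 21)),
           key=lambda p: collapse_cost(*p))
print("estimated alpha_c, nu:", best)
```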

    Learning in ultrametric committee machines

    The problem of learning by examples in ultrametric committee machines (UCMs) is studied within the framework of statistical mechanics. Using the replica formalism we calculate the average generalization error in UCMs with L hidden layers and for a large enough number of units. In most of the regimes studied we find that the generalization error, as a function of the number of examples presented, develops a discontinuous drop at a critical value of the load parameter. We also find that when L > 1, a number of teacher networks with the same number of hidden layers and different overlaps induce learning processes with the same critical points.
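    The architecture behind these results is hierarchical. As a rough picture only (the branching structure and pooling rule here are our assumptions, not taken from the paper), one can think of a tree of sign units whose first-layer outputs are pooled by majority vote once per hidden layer, so units at the same depth form ultrametric blocks:

```python
import numpy as np

def ucm_output(W, x, L, branching=3):
    """Output of a depth-L majority tree over K = branching**L perceptrons."""
    s = np.sign(W @ x)                    # first-layer internal representation
    for _ in range(L):
        # pool each block of `branching` signs by majority vote
        s = np.sign(s.reshape(-1, branching).sum(axis=1))
    return int(s[0])

rng = np.random.default_rng(3)
L, b, N = 2, 3, 50
W = rng.standard_normal((b**L, N))
print(ucm_output(W, rng.standard_normal(N), L, b))
```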

    Correlation of internal representations in feed-forward neural networks

    Feed-forward multilayer neural networks implementing random input-output mappings develop characteristic correlations between the activity of their hidden nodes which are important for understanding the storage and generalization performance of the network. It is shown how these correlations can be calculated from the joint probability distribution of the aligning fields at the hidden units for an arbitrary decoder function between hidden layer and output. Explicit results are given for the parity, AND, and committee machines with an arbitrary number of hidden nodes near saturation. Comment: 6 pages, latex, 1 figure
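    The three decoder functions for which explicit results are given map the internal representation (the vector of hidden-unit signs) to the output. In code, as a sketch of ours:

```python
import numpy as np

# The three classic decoders over K hidden signs in {-1, +1}.
def parity(s):    return int(np.prod(s))            # product of the signs
def and_gate(s):  return 1 if np.all(s == 1) else -1
def committee(s): return int(np.sign(np.sum(s)))    # majority vote (K odd)

s = np.array([1, -1, 1])                            # K = 3 hidden signs
print(parity(s), and_gate(s), committee(s))
```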

    Functional Optimisation of Online Algorithms in Multilayer Neural Networks

    We study the online dynamics of learning in fully connected soft committee machines in the student-teacher scenario. The locally optimal modulation function, which determines the learning algorithm, is obtained from a variational argument in such a manner as to maximise the average generalisation error decay per example. Simulation results for the resulting algorithm are presented for a few cases. The symmetric-phase plateaux are found to be vastly reduced in comparison to those found when online backpropagation algorithms are used. A discussion of the implementation of these ideas as practical algorithms is given.
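    The role of the modulation function can be seen from the generic form of the online update: each hidden weight vector moves along the input, scaled by a per-unit modulation. A hedged sketch in our notation (the paper's variationally optimal F is not reproduced here; we only show the backprop-style choice as a baseline):

```python
import numpy as np
from scipy.special import erf

def online_step(J, x, teacher_out, F, eta=0.1):
    """One online update of student weights J; F is the modulation
    function, mapping (local fields, output error) to one factor
    per hidden unit."""
    N = x.shape[0]
    h = J @ x / np.sqrt(N)                 # student local fields
    delta = teacher_out - erf(h / np.sqrt(2)).sum()
    Fk = F(h, delta)                       # modulation, one value per unit
    return J + (eta / np.sqrt(N)) * Fk[:, None] * x[None, :]

# Backprop corresponds to F_k = delta * g'(h_k); the optimal F differs.
backprop_F = lambda h, d: d * np.sqrt(2 / np.pi) * np.exp(-h**2 / 2)

rng = np.random.default_rng(5)
N, K = 50, 3
J, x = rng.standard_normal((K, N)), rng.standard_normal(N)
J = online_step(J, x, teacher_out=0.3, F=backprop_F)
```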

    Storage capacity of ultrametric committee machines

    The problem of computing the storage capacity of a feed-forward network, with L hidden layers, N inputs, and K units in the first hidden layer, is analyzed using techniques from statistical mechanics. We found that the storage capacity strongly depends on the network architecture, $\alpha_c \sim (\log K)^{1-1/2L}$, and that the number of units K limits the number of possible hidden layers L through the relationship $2L - 1 < 2\log K$.
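    Read numerically (a sketch of ours: we take the logarithm as natural and the exponent as 1 − 1/(2L), which is how we read the reconstructed scaling above), the two relations say depth buys slowly growing capacity while log K caps the admissible depth:

```python
import math

# alpha_c ~ (log K)^(1 - 1/(2L)), subject to 2L - 1 < 2 log K.
for K in (10, 100, 1000):
    L_max = max(L for L in range(1, 50) if 2 * L - 1 < 2 * math.log(K))
    for L in range(1, L_max + 1):
        scale = math.log(K) ** (1 - 1 / (2 * L))
        print(f"K={K:4d} L={L}  (log K)^(1-1/2L) ≈ {scale:.2f}")
```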

    Computational capabilities of multilayer committee machines

    We obtained an analytical expression for the computational complexity of many-layered committee machines with a finite number of hidden layers (L < 8) using the generalization complexity measure introduced by Franco et al (2006) IEEE Trans. Neural Netw. 17 578. Although our result is valid in the large-size limit and for an overlap synaptic matrix that is ultrametric, it provides a useful tool for inferring the appropriate architecture a network must have to reproduce an arbitrary realizable Boolean function.
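    As a rough illustration of the kind of measure involved: the first-order term of Franco et al.'s generalization complexity is commonly described as the fraction of Hamming-distance-1 input pairs on which the Boolean function's outputs differ. The full measure also includes a distance-2 term, and the normalization below is our assumption:

```python
from itertools import product

def complexity_first_order(f, n):
    """Fraction of single-bit-flip neighbor pairs on which f disagrees."""
    disagree = total = 0
    for x in product((0, 1), repeat=n):
        fx = f(x)
        for i in range(n):                       # flip one bit at a time
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            disagree += fx != f(y)
            total += 1
    return disagree / total

parity = lambda x: sum(x) % 2                    # maximally sensitive function
print(complexity_first_order(parity, 4))         # -> 1.0
```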

    Weight Space Structure and Internal Representations: a Direct Approach to Learning and Generalization in Multilayer Neural Networks

    We analytically derive the geometrical structure of the weight space in multilayer neural networks (MLN), in terms of the volumes of couplings associated to the internal representations of the training set. Focusing on the parity and committee machines, we deduce their learning and generalization capabilities, both reinterpreting some known properties and finding new exact results. The relationship between our approach and information theory, as well as the Mitchison--Durbin calculation, is established. Our results are exact in the limit of a large number of hidden units, showing that MLN are a class of exactly solvable models with a simple interpretation of replica symmetry breaking. Comment: 12 pages, 1 compressed ps figure (uufile), RevTeX file
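    The central object here, the volume of couplings mapped to each internal representation, can be made concrete by brute force on a toy machine (sizes and setup are our illustrative choices):

```python
import numpy as np
from itertools import product
from collections import Counter

# Enumerate all binary couplings of a tiny committee machine and count
# how many realize each internal representation (the pattern of
# hidden-unit signs) of a small training set.
rng = np.random.default_rng(4)
N, K, P = 5, 3, 3                                # N odd avoids zero fields
X = rng.choice([-1, 1], size=(P, N))             # training inputs

volumes = Counter()
for w in product([-1, 1], repeat=N * K):         # all 2**(N*K) couplings
    W = np.array(w).reshape(K, N)
    rep = tuple(map(tuple, np.sign(X @ W.T)))    # hidden signs per example
    volumes[rep] += 1

print(len(volumes), "internal representations;",
      "largest volume:", volumes.most_common(1)[0][1])
```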