
    A practical Bayesian framework for backpropagation networks

    A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor," penalizing overflexible and overcomplex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained.
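
    A minimal sketch of the evidence idea, using the linear-Gaussian case (where everything is closed-form) in place of a network; the function and parameter names are ours, not the paper's. Here `alpha` plays the role of the weight-decay/prior precision and `beta` is the noise precision:

        import numpy as np

        def log_evidence(X, y, alpha, beta):
            # Closed-form log evidence for y ~ N(X w, 1/beta), w ~ N(0, I/alpha).
            n, k = X.shape
            A = beta * X.T @ X + alpha * np.eye(k)          # posterior precision (Hessian)
            w_mp = beta * np.linalg.solve(A, X.T @ y)       # most probable weights
            misfit = 0.5 * beta * np.sum((y - X @ w_mp) ** 2)
            decay = 0.5 * alpha * w_mp @ w_mp
            _, logdet = np.linalg.slogdet(A)
            log_ev = (-misfit - decay - 0.5 * logdet
                      + 0.5 * k * np.log(alpha) + 0.5 * n * np.log(beta)
                      - 0.5 * n * np.log(2 * np.pi))
            # Effective number of well-determined parameters (point (4) above).
            gamma = k - alpha * np.trace(np.linalg.inv(A))
            return log_ev, gamma

        rng = np.random.default_rng(0)
        X = rng.normal(size=(50, 10))
        y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
        for alpha in (0.01, 1.0, 100.0):
            ev, gamma = log_evidence(X, y, alpha, beta=100.0)
            print(f"alpha={alpha:7.2f}  log evidence={ev:10.2f}  gamma={gamma:5.2f}")

    Sweeping `alpha` and keeping the value with the highest evidence is the Occam's-razor trade-off the abstract describes: too little decay is penalized through the determinant term, too much through the misfit.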

    The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects

    Understanding the behavior of stochastic gradient descent (SGD) in the context of deep neural networks has attracted considerable attention recently. Along this line, we study a general form of gradient-based optimization dynamics with unbiased noise, which unifies SGD and standard Langevin dynamics. By investigating this general dynamics, we analyze the behavior of SGD in escaping from minima and its regularization effects. A novel indicator is derived to characterize the efficiency of escaping from minima, by measuring the alignment of the noise covariance and the curvature of the loss function. Based on this indicator, two conditions are established to show which types of noise structure are superior to isotropic noise in terms of escaping efficiency. We further show that the anisotropic noise in SGD satisfies these two conditions, and thus helps to escape from sharp and poor minima effectively, toward more stable and flat minima that typically generalize well. We systematically design various experiments to verify the benefits of the anisotropic noise, compared with full gradient descent plus isotropic diffusion (i.e., Langevin dynamics). (ICML 2019 camera-ready.)
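
    A rough illustration of the escaping-efficiency idea (a toy construction of ours, not the paper's code): on a quadratic loss with one sharp and one flat direction, the expected loss of a noisy iterate started at the minimum initially grows roughly like Tr(H Sigma), so curvature-aligned noise of the same overall magnitude (equal trace) drives the iterate out of the sharp basin faster than isotropic noise:

        import numpy as np

        rng = np.random.default_rng(0)
        H = np.diag([100.0, 1.0])                 # one sharp, one flat direction
        eta, sigma2, steps, runs = 0.005, 0.5, 10, 5000

        def mean_final_loss(Sigma):
            # Average loss after a few noisy gradient steps started at the minimum.
            L_chol = np.linalg.cholesky(Sigma)
            total = 0.0
            for _ in range(runs):
                x = np.zeros(2)
                for _ in range(steps):
                    x = x - eta * (H @ x) + np.sqrt(eta) * (L_chol @ rng.normal(size=2))
                total += 0.5 * x @ H @ x
            return total / runs

        d = H.shape[0]
        iso = sigma2 * np.eye(d)                  # isotropic (Langevin-style) noise
        aligned = sigma2 * d * H / np.trace(H)    # curvature-aligned noise, same trace

        for name, Sigma in (("isotropic", iso), ("aligned", aligned)):
            print(f"{name:9s}  Tr(H Sigma) = {np.trace(H @ Sigma):7.2f}   "
                  f"mean loss after {steps} steps = {mean_final_loss(Sigma):.3f}")

    In this toy setup the aligned noise shows both a larger indicator value and a faster rise in expected loss, consistent with the abstract's claim about anisotropic SGD noise.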

    Generalization of color by chickens: experimental observations and a Bayesian model

    Sensory generalization influences animals' responses to novel stimuli. Because color forms a perceptual continuum, it is a good subject for studying generalization. Moreover, because different causes of variation in spectral signals, such as pigmentation, gloss, and illumination, have differing behavioral significance, it may be beneficial to have adaptable generalization. We report on generalization by poultry chicks following differential training to rewarded (T+) and unrewarded (T−) colors, in particular on the phenomenon of peak shift, which leads to subjects preferring stimuli displaced away from T−. The first three experiments test effects of learning either a fine or a coarse discrimination. In experiments 1 and 2, peak shift occurs, but contrary to some predictions, the shift is smaller after the animal learned a fine discrimination than after it learned a coarse discrimination. Experiment 3 finds a similar effect for generalization on a color axis orthogonal to that separating T+ from T−. Experiment 4 shows that generalization is rapidly modified by experience. These results imply that the scale of a "perceptual ruler" is set by experience. We show that the observations are consistent with generalization following principles of Bayesian inference, which forms a powerful framework for understanding this type of behavior.
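
    A toy version of a Bayesian account of peak shift (an illustration of ours; the three-hypothesis Gaussian mixture and all parameter values are assumptions, not the authors' fitted model): treat a test color as coming from the rewarded source at T+, the unrewarded source at T−, or an irrelevant broad background, and respond according to the posterior probability of reward. The response peak then shifts away from T−, and letting training set the generalization width sigma (the "perceptual ruler") makes the shift smaller after a fine discrimination, as in the experiments:

        import numpy as np
        from scipy.stats import norm

        def peak_location(s_plus, s_minus, sigma, bg_sd=4.0):
            # Posterior P(reward | color x) under three equal-prior hypotheses:
            # rewarded source N(s_plus, sigma), unrewarded source N(s_minus, sigma),
            # or a broad background N(0, bg_sd).
            x = np.linspace(-3.0, 3.0, 6001)
            p_plus = norm.pdf(x, s_plus, sigma)
            p_minus = norm.pdf(x, s_minus, sigma)
            p_bg = norm.pdf(x, 0.0, bg_sd)
            response = p_plus / (p_plus + p_minus + p_bg)
            return x[np.argmax(response)]

        # Coarse discrimination -> wide generalization -> large shift away from T-.
        print("coarse:", peak_location(s_plus=0.0, s_minus=-1.0, sigma=0.8))
        # Fine discrimination -> narrow "perceptual ruler" -> peak stays near T+.
        print("fine:  ", peak_location(s_plus=0.0, s_minus=-1.0, sigma=0.3))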