16,337 research outputs found
A practical Bayesian framework for backpropagation networks
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor," penalizing overflexible and overcomplex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained
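The evidence-based model comparison described in this abstract can be sketched in its simplest setting, Bayesian linear regression with a Gaussian prior, where the log evidence has a closed form. This is a minimal illustration, not the paper's networks: the models, data, and hyperparameter values below are our assumptions.

```python
import numpy as np

def log_evidence(X, y, alpha, beta):
    """Log marginal likelihood of the linear model y = X w + noise,
    with Gaussian prior w ~ N(0, alpha^-1 I) and noise precision beta."""
    N, M = X.shape
    A = alpha * np.eye(M) + beta * X.T @ X          # posterior precision
    m = beta * np.linalg.solve(A, X.T @ y)          # posterior mean
    _, logdetA = np.linalg.slogdet(A)
    return (M / 2 * np.log(alpha) + N / 2 * np.log(beta)
            - beta / 2 * np.sum((y - X @ m) ** 2)       # data-fit term
            - alpha / 2 * (m @ m)                       # weight penalty
            - logdetA / 2 - N / 2 * np.log(2 * np.pi))  # Occam factor

# Linear data plus small noise: the evidence should prefer the matched
# (linear) model over an over-flexible 9th-degree polynomial, even though
# the latter fits the data slightly better -- the automatic "Occam's
# razor" the abstract refers to.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = 1.0 + 2.0 * x + 0.05 * rng.standard_normal(x.size)
X1 = np.vander(x, 2, increasing=True)    # columns 1, x
X2 = np.vander(x, 10, increasing=True)   # columns 1, x, ..., x^9
print(log_evidence(X1, y, alpha=1.0, beta=400.0))
print(log_evidence(X2, y, alpha=1.0, beta=400.0))
```

With these settings the simpler model should attain the higher log evidence: the extra flexibility of the polynomial buys only a marginal reduction in residual error, while the determinant term charges it for every well-determined extra parameter.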
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Understanding the behavior of stochastic gradient descent (SGD) in the context of deep neural networks has attracted much attention recently. Along this line, we study a general form of gradient-based optimization dynamics with unbiased noise, which unifies SGD and standard Langevin dynamics. By investigating this general optimization dynamics, we analyze the behavior of SGD on escaping from minima and its regularization effects. A novel indicator is derived to characterize the efficiency of escaping from minima by measuring the alignment between the noise covariance and the curvature of the loss function. Based on this indicator, two conditions are established to show which type of noise structure is superior to isotropic noise in terms of escaping efficiency. We further show that the anisotropic noise in SGD satisfies the two conditions, and thus helps escape from sharp and poor minima effectively, towards more stable and flat minima that typically generalize well. We systematically design various experiments to verify the benefits of the anisotropic noise, compared with full gradient descent plus isotropic diffusion (i.e., Langevin dynamics).

Comment: ICML 2019 camera ready
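The escaping comparison in this abstract can be illustrated with a toy quadratic basin: a diagonal Hessian with one sharp and one flat direction, and noisy gradient dynamics started at the minimum. The step size, horizon, and matched noise traces below are our illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.array([100.0, 1.0])       # Hessian eigenvalues: one sharp, one flat direction
eta, T, runs = 1e-5, 100, 500    # step size, horizon, Monte Carlo repetitions

def mean_escape_loss(sigma2):
    """Noisy gradient descent started at the minimum of the quadratic
    loss L(theta) = 0.5 * sum(h * theta**2); returns the average loss
    after T steps, a proxy for how efficiently the noise drives iterates
    out of the basin. sigma2 holds the per-direction noise variances
    (the diagonal of the noise covariance)."""
    theta = np.zeros((runs, 2))
    for _ in range(T):
        noise = np.sqrt(sigma2) * rng.standard_normal((runs, 2))
        theta = theta - eta * h * theta + np.sqrt(eta) * noise
    return float(np.mean(0.5 * np.sum(h * theta ** 2, axis=1)))

# Same total noise magnitude (trace 101), different structure.
aniso = np.array([100.0, 1.0])   # covariance aligned with the curvature
iso = np.array([50.5, 50.5])     # isotropic, same trace
print(mean_escape_loss(aniso), mean_escape_loss(iso))
```

At short horizons the expected loss grows roughly like (eta * T / 2) * Tr(H Sigma), so curvature-aligned noise of the same total magnitude lifts the iterates out of the sharp basin faster than isotropic noise, matching the alignment indicator the abstract describes.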
Generalization of color by chickens: experimental observations and a Bayesian model
Sensory generalization influences animals' responses to novel stimuli. Because color forms a perceptual continuum, it is a good subject for studying generalization. Moreover, because different causes of variation in spectral signals, such as pigmentation, gloss, and illumination, have differing behavioral significance, it may be beneficial to have adaptable generalization. We report on generalization by poultry chicks following differential training to rewarded (T+) and unrewarded (T−) colors, in particular on the phenomenon of peak shift, which leads to subjects preferring stimuli displaced away from T−. The first three experiments test effects of learning either a fine or a coarse discrimination. In experiments 1 and 2, peak shift occurs, but contrary to some predictions, the shift is smaller after the animal learned a fine discrimination than after it learned a coarse discrimination. Experiment 3 finds a similar effect for generalization on a color axis orthogonal to that separating T+ from T−. Experiment 4 shows that generalization is rapidly modified by experience. These results imply that the scale of a "perceptual ruler" is set by experience. We show that the observations are consistent with generalization following principles of Bayesian inference, which forms a powerful framework for understanding this type of behavior
- …