A Neural Network model with Bidirectional Whitening
We present a new model and algorithm that perform efficient natural gradient descent for multilayer perceptrons. Natural gradient descent was originally proposed from the viewpoint of information geometry, and it performs steepest-descent updates on manifolds in a Riemannian space. In particular, we extend the approach taken by the "Whitened neural networks" model: we apply the whitening not only in the feed-forward direction, as in the original model, but also in the back-propagation phase. The efficacy of this "Bidirectional whitened neural networks" model is demonstrated on a handwritten character recognition dataset (MNIST). Comment: 16 pages
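To make the idea concrete, here is a minimal NumPy sketch of whitening applied in both directions of a toy two-layer network: layer inputs are whitened before the forward pass, and the back-propagated error signals are whitened before the weight update. The network sizes, learning rate, and the simple ZCA-style whitening are illustrative assumptions; this is not the authors' algorithm, only a conceptual illustration of bidirectional whitening.

```python
import numpy as np

def whiten(X, eps=1e-5):
    """Zero-center X (rows = samples) and decorrelate its columns via the
    inverse square root of the sample covariance (ZCA-style whitening)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    cov = Xc.T @ Xc / len(Xc)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))            # toy inputs
y = rng.normal(size=(256, 1))             # toy regression targets
W1 = rng.normal(scale=0.1, size=(20, 32))
W2 = rng.normal(scale=0.1, size=(32, 1))
lr = 0.05

for step in range(200):
    # forward pass with whitened layer inputs (feed-forward whitening)
    a0 = whiten(X)
    h = np.tanh(a0 @ W1)
    a1 = whiten(h)
    out = a1 @ W2

    # backward pass: whiten the back-propagated error signals as well
    # (the whitening transforms are treated as fixed in the backward pass)
    err = out - y
    d2 = whiten(err)                         # backward whitening, output layer
    d1 = whiten((err @ W2.T) * (1 - h**2))   # backward whitening, hidden layer

    # plain SGD on the whitened quantities
    W2 -= lr * a1.T @ d2 / len(X)
    W1 -= lr * a0.T @ d1 / len(X)
```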
On-Line Learning Theory of Soft Committee Machines with Correlated Hidden Units - Steepest Gradient Descent and Natural Gradient Descent -
The permutation symmetry of the hidden units in multilayer perceptrons causes saddle structures and plateaus in the learning dynamics of gradient-based methods. The correlation of the weight vectors of hidden units in a teacher network is thought to affect this saddle structure, resulting in a prolonged learning time, but the mechanism is still unclear. In this paper, we discuss it for soft committee machines and on-line learning using statistical mechanics. Conventional gradient descent needs more time to break the symmetry as the correlation of the teacher weight vectors rises. On the other hand, no plateaus occur with natural gradient descent, regardless of the correlation, in the limit of a low learning rate. Analytical results for the dynamics around the saddle point support these findings. Comment: 7 pages, 6 figures
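For readers who want to see the plateau directly, the following sketch runs plain on-line gradient descent for a small soft committee machine learning from a teacher with two correlated weight vectors. The input dimension, the correlation value of 0.5, the learning rate, and the erf activation are illustrative assumptions; the printed student-teacher overlaps stay nearly identical across hidden units for a long stretch (the plateau) before the permutation symmetry breaks.

```python
import numpy as np
from scipy.special import erf

N, K = 500, 2                                    # input dimension, hidden units
rng = np.random.default_rng(1)

g = lambda x: erf(x / np.sqrt(2))                # hidden-unit activation
dg = lambda x: np.sqrt(2 / np.pi) * np.exp(-x**2 / 2)

# teacher with two correlated weight vectors (overlap ~0.5)
B1 = rng.normal(size=N)
B2 = 0.5 * B1 + np.sqrt(1 - 0.5**2) * rng.normal(size=N)
B = np.stack([B1, B2])

J = rng.normal(scale=1e-3, size=(K, N))          # student weights, small init
eta = 0.5                                        # learning rate

for step in range(200_000):
    x = rng.normal(size=N)
    t = g(B @ x / np.sqrt(N)).sum()              # teacher output
    lam = J @ x / np.sqrt(N)                     # student local fields
    s = g(lam).sum()                             # student output
    # plain (non-natural) stochastic gradient step; plateaus appear here
    J += (eta / N) * (t - s) * dg(lam)[:, None] * x[None, :]
    if step % 20_000 == 0:
        R = J @ B.T / N                          # student-teacher overlaps
        print(step, np.round(R, 3))              # near-equal rows = plateau
```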
State Concentration Exponent as a Measure of Quickness in Kauffman-type Networks
We study the dynamics of randomly connected networks composed of binary
Boolean elements and those composed of binary majority vote elements. We
elucidate their differences in both sparsely and densely connected cases. The
quickness of large network dynamics is usually quantified by the length of
transient paths, an analytically intractable measure. For discrete-time
dynamics of networks of binary elements, we address this dilemma with an
alternative unified framework by using a concept termed state concentration,
defined as the exponent of the average number of t-step ancestors in state
transition graphs. The state transition graph is defined by nodes corresponding
to network states and directed links corresponding to transitions. Using this
exponent, we interrogate the dynamics of random Boolean and majority vote
networks. We find that extremely sparse Boolean networks and majority vote
networks with arbitrary density achieve quickness, owing in part to long-tailed
in-degree distributions. As a corollary, only relatively dense majority vote
networks can achieve both quickness and robustness. Comment: 6 figures
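As a rough illustration of the quantity involved, the sketch below enumerates the full state transition graph of a small random Boolean network with fixed in-degree and estimates a concentration exponent as (1/t) times the log of the average number of t-step ancestors of the state reached from a uniformly random initial condition. The network size, in-degree, time horizon, and this particular averaging are assumptions made for the example, not the paper's exact definition.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, t = 10, 2, 3                                 # nodes, in-degree, time steps

# random Boolean network: each node reads K inputs through a random truth table
inputs = np.array([rng.choice(N, size=K, replace=False) for _ in range(N)])
tables = rng.integers(0, 2, size=(N, 2 ** K))

def step(state):
    """One synchronous update of all N nodes."""
    idx = (state[inputs] * (2 ** np.arange(K))).sum(axis=1)
    return tables[np.arange(N), idx]

codes = np.arange(2 ** N)
states = (codes[:, None] >> np.arange(N)) & 1      # row c is the state with code c
to_code = lambda s: int((s * (2 ** np.arange(N))).sum())

# map every initial state to its image after t synchronous steps
image = codes.copy()
for _ in range(t):
    image = np.array([to_code(step(states[c])) for c in image])

ancestors = np.bincount(image, minlength=2 ** N)   # number of t-step ancestors of each state
avg = ancestors[image].mean()                      # averaged over a random start
print("estimated state concentration exponent:", np.log(avg) / t)
```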
Laplace's rule of succession in information geometry
Laplace's "add-one" rule of succession modifies the observed frequencies in a
sequence of heads and tails by adding one to the observed counts. This improves
prediction by avoiding zero probabilities and corresponds to a uniform Bayesian
prior on the parameter. The canonical Jeffreys prior corresponds to the
"add-one-half" rule. We prove that, for exponential families of distributions,
such Bayesian predictors can be approximated by taking the average of the maximum likelihood predictor and the sequential normalized maximum likelihood predictor from information theory. Thus, in this case, it is possible to approximate Bayesian predictors without the cost of integrating or sampling in parameter space.
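A quick way to see the statement in the simplest exponential family is to compare predictive probabilities for a Bernoulli sequence. The sketch below computes the add-one, add-one-half, maximum likelihood, and sequential NML predictions of the next outcome given k ones in n observations, so the averaging approximation can be checked numerically; the specific counts are arbitrary and the Bernoulli restriction is an illustrative assumption.

```python
def predictors(n, k):
    """Predictive probability that the next outcome is 1, given k ones in
    n Bernoulli observations, under several prediction rules."""
    laplace  = (k + 1) / (n + 2)            # add-one rule (uniform prior)
    jeffreys = (k + 0.5) / (n + 1)          # add-one-half rule (Jeffreys prior)
    ml       = k / n if n > 0 else 0.5      # maximum likelihood (0.5 if no data)

    # sequential normalized maximum likelihood: plug the ML estimate into the
    # sequence extended by each candidate outcome, then normalize
    def ml_lik(k1, n1):                     # max likelihood of k1 ones in n1 trials
        p = k1 / n1
        return p**k1 * (1 - p)**(n1 - k1)
    u1, u0 = ml_lik(k + 1, n + 1), ml_lik(k, n + 1)
    snml = u1 / (u1 + u0)

    return laplace, jeffreys, ml, snml

lap, jef, ml, snml = predictors(n=10, k=3)
print("add-one      ", lap)
print("add-one-half ", jef)
print("ML           ", ml)
print("SNML         ", snml)
print("(ML + SNML)/2", (ml + snml) / 2)     # compare with the Bayesian rules
```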
Fluctuation Theorems on Nishimori Line
The distribution of the work performed on spin glasses with gauge symmetry is considered. With the aid of the gauge symmetry, which leads to exact and rigorous results in spin glasses, we find a relation for the performed work in the form of a fluctuation theorem. The integral form of the resulting relation reproduces the Jarzynski-type equation for spin glasses that we obtained previously. We show that similar relations can be established not only for the distribution of the performed work but also for that of the free energy of spin glasses with gauge symmetry, which provides another interpretation of the phase transition in spin glasses. Comment: 10 pages, 1 figure
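For reference only, and not the gauge-symmetric spin-glass relations derived in the paper, the standard detailed fluctuation theorem and its integral form, the Jarzynski equality, read:

```latex
% Background only (standard forms, not the gauge-symmetric relations
% derived in the paper): the Crooks fluctuation theorem and its integral
% form, the Jarzynski equality.
\begin{align}
  \frac{P_{\mathrm{F}}(W)}{P_{\mathrm{R}}(-W)} &= e^{\beta\,(W - \Delta F)}, \\
  \bigl\langle e^{-\beta W} \bigr\rangle &= e^{-\beta\,\Delta F}.
\end{align}
```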
Parametric Fokker-Planck equation
We derive the Fokker-Planck equation on the parametric space. It is the Wasserstein gradient flow of relative entropy on the statistical manifold. We pull the PDE back to a finite-dimensional ODE on parameter space. Some analytical and numerical examples are presented.
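A hedged sketch of the standard objects involved, in our own notation (which may differ from the paper's): the Fokker-Planck equation written as the Wasserstein gradient flow of a free-energy functional, and its pullback to an ODE on the parameters of a family rho_theta, where G_W(theta) denotes the pulled-back Wasserstein metric tensor.

```latex
% Fokker--Planck equation as Wasserstein gradient flow of the free energy F,
% and its pullback to an ODE on parameter space (notation is ours).
\begin{align}
  \partial_t \rho
    &= \nabla\!\cdot\!\bigl(\rho\,\nabla V\bigr) + \beta^{-1}\Delta\rho
     = \nabla\!\cdot\!\Bigl(\rho\,\nabla\tfrac{\delta \mathcal{F}}{\delta\rho}\Bigr),
  &
  \mathcal{F}(\rho) &= \int V\rho\,dx + \beta^{-1}\!\int \rho\log\rho\,dx, \\
  \dot{\theta} &= -\,G_W(\theta)^{-1}\,\nabla_\theta\,\mathcal{F}(\rho_\theta).
\end{align}
```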
Information Geometry, Inference Methods and Chaotic Energy Levels Statistics
In this Letter, we propose a novel information-geometric characterization of
chaotic (integrable) energy level statistics of a quantum antiferromagnetic
Ising spin chain in a tilted (transverse) external magnetic field. Finally, we conjecture that our results may find potential physical applications in quantum energy-level statistics. Comment: 9 pages, added correct journal reference
Pushing Stochastic Gradient towards Second-Order Methods -- Backpropagation Learning with Transformations in Nonlinearities
Recently, we proposed transforming the outputs of each hidden neuron in a multilayer perceptron network to have zero output and zero slope on average, and using separate shortcut connections to model the linear dependencies instead. We continue that work by, first, introducing a third transformation that normalizes the scale of the outputs of each hidden neuron and, second, analyzing the connections to second-order optimization methods. We show that the transformations make simple stochastic gradient descent behave more like second-order optimization methods and thus speed up learning. This is shown both in theory and in experiments. The experiments on the third transformation show that while it further increases the speed of learning, it can also hurt performance by converging to a worse local optimum, where both the inputs and outputs of many hidden neurons are close to zero. Comment: 10 pages, 5 figures, ICLR201
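The following NumPy sketch illustrates, for a single hidden unit, what the three transformations amount to on a batch of pre-activations: tilt and shift the tanh nonlinearity so that its output and its slope are zero on average, then rescale to unit output variance. The data distribution and the use of tanh are assumptions for the example; in the full model the removed linear part would be carried by separate shortcut connections.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=10_000)              # pre-activations of one hidden unit

alpha = np.mean(1 - np.tanh(x)**2)       # average slope of tanh -> tilt term
beta  = np.mean(np.tanh(x) - alpha * x)  # remaining mean output -> shift term
y = np.tanh(x) - alpha * x - beta        # zero mean output, zero mean slope
gamma = 1.0 / np.std(y)                  # third transformation: unit scale
y = gamma * y

print(np.mean(y))                                     # ~ 0 (zero output on average)
print(np.mean(gamma * (1 - np.tanh(x)**2 - alpha)))   # ~ 0 (zero slope on average)
print(np.std(y))                                      # ~ 1 (normalized scale)
```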
Bifurcation analysis in an associative memory model
We previously reported on chaos induced by the frustration of interactions in a non-monotonic sequential associative memory model and showed its chaotic behavior at absolute zero. We have now analyzed bifurcations in a stochastic system, namely a finite-temperature version of the non-monotonic sequential associative memory model. We derived order-parameter equations from the stochastic microscopic equations. Two-parameter bifurcation diagrams obtained from those equations show the coexistence of attractors, which do not appear at absolute zero, and the disappearance of chaos due to the temperature effect. Comment: 19 pages