Equations of States in Statistical Learning for a Nonparametrizable and Regular Case
Many learning machines that have hierarchical structure or hidden variables
are now being used in information science, artificial intelligence, and
bioinformatics. However, several learning machines used in such fields are not
regular but singular statistical models, so their generalization performance
remains unknown. To overcome this problem, in previous papers we proved new
equations in statistical learning by which the Bayes generalization loss can be
estimated from the Bayes training loss and the functional variance, on the
condition that the true distribution is a singularity contained in the learning
machine. In this paper, we prove that the same equations hold even if the true
distribution is not contained in the parametric model. We also prove that, in
the regular case, the proposed equations are asymptotically equivalent to the
Takeuchi information criterion. Therefore, the proposed equations are applicable
without any condition on the unknown true distribution.
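To make the quantities concrete, here is a minimal sketch (ours, not code from the paper) of how the Bayes training loss and the functional variance are computed from posterior draws and combined into an estimate of the Bayes generalization loss at inverse temperature one. The array name log_lik and the function name are assumptions for illustration.

```python
import numpy as np
from scipy.special import logsumexp

def bayes_generalization_estimate(log_lik):
    """Estimate the Bayes generalization loss from posterior draws.

    log_lik: array of shape (S, n); entry [s, i] is log p(x_i | w_s)
    for S posterior draws w_s and n training points x_i.
    Returns (training_loss, functional_variance, generalization_estimate).
    """
    S, n = log_lik.shape
    # Bayes training loss: -(1/n) * sum_i log E_w[ p(x_i | w) ]
    training_loss = -np.mean(logsumexp(log_lik, axis=0) - np.log(S))
    # Functional variance: sum_i Var_w[ log p(x_i | w) ]
    functional_variance = np.sum(np.var(log_lik, axis=0))
    # Equation of state at inverse temperature 1: generalization ~ training + V/n
    return training_loss, functional_variance, training_loss + functional_variance / n
```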
A Widely Applicable Bayesian Information Criterion
A statistical model or a learning machine is called regular if the map taking
a parameter to a probability distribution is one-to-one and if its Fisher
information matrix is always positive definite; otherwise, it is called
singular. In regular statistical models, the Bayes free energy, which is
defined as the minus logarithm of the Bayes marginal likelihood, can be
asymptotically approximated by the Schwarz Bayes information criterion (BIC),
whereas in singular models such an approximation does not hold.
Recently, it was proved that the Bayes free energy of a singular model is
asymptotically given by a generalized formula using a birational invariant, the
real log canonical threshold (RLCT), instead of half the number of parameters
in BIC. Theoretical values of RLCTs in several statistical models are now being
discovered based on algebraic geometrical methodology. However, it has been
difficult to estimate the Bayes free energy using only training samples,
because an RLCT depends on an unknown true distribution.
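In the standard notation of this literature (a sketch on our part, not a formula from the abstract), the contrast is: for a regular model with d parameters and maximum likelihood estimator \(\hat w\),
\[ F_n \approx n L_n(\hat w) + \frac{d}{2} \log n \quad \text{(Schwarz BIC)}, \]
whereas for a singular model the half-parameter count is replaced by the RLCT \(\lambda\),
\[ F_n = n L_n(w_0) + \lambda \log n + O_p(\log\log n), \]
where \(L_n(w) = -\frac{1}{n}\sum_{i=1}^n \log p(X_i \mid w)\) and \(w_0\) minimizes the loss. Since \(\lambda\) depends on the unknown true distribution, the second formula cannot be evaluated from training samples alone, which is the difficulty the following paper addresses.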
In the present paper, we define a widely applicable Bayesian information
criterion (WBIC) by the average log likelihood function over the posterior
distribution with the inverse temperature 1/log n, where n is the number
of training samples. We mathematically prove that WBIC has the same asymptotic
expansion as the Bayes free energy, even if the statistical model is singular
for, or unable to realize, the true distribution. Since WBIC can be numerically
calculated without any information about the true distribution, it is a
generalized version of BIC for singular statistical models.
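As an illustration (our sketch, not code from the paper): WBIC is the posterior average of the negative log likelihood, with the posterior tempered at inverse temperature 1/log n. Assuming a generic sampler and hypothetical callables log_lik_fn and log_prior_fn, the two pieces look like this.

```python
import numpy as np

def tempered_log_posterior(w, data, log_lik_fn, log_prior_fn):
    """Unnormalized log density of the WBIC posterior:
    beta * log-likelihood + log prior, with beta = 1 / log(n)."""
    beta = 1.0 / np.log(len(data))
    return beta * sum(log_lik_fn(x, w) for x in data) + log_prior_fn(w)

def wbic_from_draws(draws, data, log_lik_fn):
    """WBIC: average of the negative log likelihood over draws w_s
    taken from the posterior at inverse temperature 1 / log(n)."""
    neg_ll = [-sum(log_lik_fn(x, w) for x in data) for w in draws]
    return float(np.mean(neg_ll))
```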
Statistical Learning Theory of Quasi-Regular Cases
Many learning machines such as normal mixtures and layered neural networks
are not regular but singular statistical models, because the map from a
parameter to a probability distribution is not one-to-one. The conventional
statistical asymptotic theory cannot be applied to such learning machines
because the likelihood function cannot be approximated by any normal
distribution. Recently, a new statistical theory has been established based on
algebraic geometry, and it was clarified that the generalization and training
errors are determined by two birational invariants, the real log canonical
threshold and the singular fluctuation. However, their concrete values are left
unknown. In the present paper, we propose a new concept, the quasi-regular case,
in statistical learning theory. A quasi-regular case is not a regular case but
a singular one; nevertheless, it has the same properties as a regular case. In
fact, we prove that, in a quasi-regular case, the two birational invariants are
equal to each other, with the result that the symmetry of the generalization
and training errors holds. Moreover, since the concrete values of the two
birational invariants are obtained explicitly, the quasi-regular case is useful
for studying statistical learning theory.
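In the usual notation of this theory (a sketch on our part, writing \(\lambda\) for the real log canonical threshold, \(\nu\) for the singular fluctuation, and \(G_n\), \(T_n\) for the Bayes generalization and training errors), the asymptotic relations behind this symmetry are commonly stated as
\[ \mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right), \qquad \mathbb{E}[T_n] = \frac{\lambda - 2\nu}{n} + o\!\left(\frac{1}{n}\right), \]
so that in a quasi-regular case, where \(\lambda = \nu\), one gets \(\mathbb{E}[G_n] = -\mathbb{E}[T_n] + o(1/n)\): the generalization and training errors are symmetric about zero.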
Bayesian Free Energy of Deep ReLU Neural Network in Overparametrized Cases
In many research fields in artificial intelligence, it has been shown that
deep neural networks are useful for estimating unknown functions on
high-dimensional input spaces. However, their generalization performance is not
yet completely clarified from the theoretical point of view, because they are
nonidentifiable and singular learning machines. Moreover, the ReLU function is
not differentiable, so the algebraic and analytic methods of singular learning
theory cannot be applied to it directly. In this paper, we study a deep ReLU
neural network in overparametrized cases and prove that the Bayesian free
energy, which is equal to the minus log marginal likelihood or the Bayesian
stochastic complexity, is bounded even if the number of layers is larger than
necessary to estimate an unknown data-generating function. Since the Bayesian
generalization error is equal to the increase of the free energy as a function
of the sample size, our result also shows that the Bayesian generalization
error does not increase even if a deep ReLU neural network is designed to be
sufficiently large, i.e., in an overparametrized state.
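The link between the free energy and generalization invoked here is the standard identity of this framework (our notation, not the paper's): writing \(F_n\) for the Bayesian free energy at sample size n, \(G_n\) for the Bayesian generalization error, and \(S\) for the entropy of the true distribution,
\[ \mathbb{E}[G_n] = \mathbb{E}[F_{n+1}] - \mathbb{E}[F_n] - S, \]
so a bound on the free energy that does not grow with the excess number of layers controls the accumulated generalization error of the overparametrized network as well.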