Asymptotic Learning Curve and Renormalizable Condition in Statistical Learning Theory
Bayes statistics and statistical physics share a common mathematical
structure, in which the log likelihood function corresponds to a random
Hamiltonian. Recently, it was discovered that the asymptotic learning curves in
Bayes estimation obey a universal law, even if the log likelihood
function cannot be approximated by any quadratic form. However, it has remained
unknown what mathematical property ensures such a universal law. In this paper,
we define a renormalizable condition of the statistical estimation problem, and
show that, under such a condition, the asymptotic learning curves are ensured
to be subject to the universal law, even if the true distribution is
unrealizable by and singular for a statistical model. We also study a
nonrenormalizable case, in which the learning curves exhibit asymptotic
behaviors that differ from the universal law.
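For context, the universal law referred to in this abstract can be sketched as the following asymptotic expansion of the Bayes generalization error (notation assumed here: G_n is the generalization error at sample size n, and λ is the real log canonical threshold discussed in the next abstract):

```latex
% Universal asymptotic learning curve in Bayes estimation:
% expected generalization error at sample size n, where \lambda
% is the real log canonical threshold (RLCT) of the model
% at the true distribution.
\mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right)
```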
A Widely Applicable Bayesian Information Criterion
A statistical model or a learning machine is called regular if the map taking
a parameter to a probability distribution is one-to-one and if its Fisher
information matrix is always positive definite. Otherwise, it is called
singular. In regular statistical models, the Bayes free energy, which is
defined by the minus logarithm of Bayes marginal likelihood, can be
asymptotically approximated by the Schwarz Bayes information criterion (BIC),
whereas in singular models such approximation does not hold.
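As a concrete illustration of the regular case, the Schwarz BIC approximation to the Bayes free energy, F ≈ -log L̂ + (d/2) log n, can be computed directly. This is a minimal sketch using a one-dimensional Gaussian fit by maximum likelihood; the model, data, and seed are illustrative choices, not taken from the paper.

```python
import numpy as np

# Minimal sketch: Schwarz BIC approximation to the Bayes free energy
# for a regular model (1-D Gaussian with unknown mean and variance).
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)

n = x.size
mu_hat, sigma_hat = x.mean(), x.std()   # maximum likelihood estimates
d = 2                                   # number of free parameters

# Maximized log likelihood of the Gaussian model at the MLE.
log_lik = np.sum(-0.5 * np.log(2 * np.pi * sigma_hat**2)
                 - (x - mu_hat) ** 2 / (2 * sigma_hat**2))

# BIC as an approximation of the Bayes free energy (minus log
# marginal likelihood): half the parameter count times log n.
bic = -log_lik + 0.5 * d * np.log(n)
print(f"BIC approximation of the Bayes free energy: {bic:.2f}")
```

In a singular model the (d/2) log n penalty above is replaced by λ log n, with λ the real log canonical threshold described next.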
Recently, it was proved that the Bayes free energy of a singular model is
asymptotically given by a generalized formula using a birational invariant, the
real log canonical threshold (RLCT), instead of half the number of parameters
in BIC. Theoretical values of RLCTs in several statistical models are now being
discovered based on algebraic geometrical methodology. However, it has been
difficult to estimate the Bayes free energy using only training samples,
because an RLCT depends on an unknown true distribution.
In the present paper, we define a widely applicable Bayesian information
criterion (WBIC) by the average log likelihood function over the posterior
distribution with the inverse temperature 1/log n, where n is the number
of training samples. We mathematically prove that WBIC has the same asymptotic
expansion as the Bayes free energy, even if the true distribution is
unrealizable by and singular for the statistical model. Since WBIC can be numerically
calculated without any information about a true distribution, it is a
generalized version of BIC onto singular statistical models.
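The WBIC construction above can be sketched numerically: draw samples from the posterior tempered at inverse temperature 1/log n, then average the negative log likelihood over those samples. This toy example (a Gaussian with unknown mean and known unit variance, a standard-normal prior, and a random-walk Metropolis sampler) is my own illustrative setup, not the paper's experiments.

```python
import numpy as np

# Hypothetical WBIC sketch for a toy model: x_i ~ N(mu, 1), prior mu ~ N(0, 1).
# WBIC = E_w^beta[ -log p(X^n | w) ] under the posterior tempered
# at inverse temperature beta = 1 / log n.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.5, size=200)
n = x.size
beta = 1.0 / np.log(n)

def neg_log_lik(mu):
    # -log p(x | mu) for the N(mu, 1) model, constants included.
    return np.sum(0.5 * np.log(2 * np.pi) + 0.5 * (x - mu) ** 2)

def log_prior(mu):
    # Standard-normal prior density, up to an additive constant.
    return -0.5 * mu**2

# Random-walk Metropolis targeting the tempered posterior
# p_beta(mu | x) proportional to exp(-beta * neg_log_lik(mu) + log_prior(mu)).
mu, samples = 0.0, []
for t in range(20000):
    prop = mu + 0.2 * rng.normal()
    log_acc = (-beta * neg_log_lik(prop) + log_prior(prop)) \
              - (-beta * neg_log_lik(mu) + log_prior(mu))
    if np.log(rng.random()) < log_acc:
        mu = prop
    if t >= 5000:              # discard burn-in
        samples.append(mu)

# WBIC: posterior average of the negative log likelihood.
wbic = np.mean([neg_log_lik(m) for m in samples])
print(f"WBIC estimate: {wbic:.2f}")
```

Note that, as the abstract emphasizes, nothing in this computation requires knowledge of the true distribution; only the model, the prior, and the training samples appear.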