We present a novel notion of complexity that interpolates between and
generalizes some classical complexity notions in learning theory: for
estimators like empirical risk minimization (ERM) with arbitrary bounded
losses, it is upper bounded in terms of data-independent Rademacher complexity;
for generalized Bayesian estimators, it is upper bounded by the data-dependent
information complexity (also known as stochastic or PAC-Bayesian
KL(posterior∥prior) complexity). For
(penalized) ERM, the new complexity reduces to (generalized) normalized maximum
likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence
regret. Our first main result bounds excess risk in terms of the new
complexity. Our second main result links the new complexity via Rademacher
complexity to L2(P) entropy, thereby generalizing earlier results of Opper,
Haussler, Lugosi, and Cesa-Bianchi, who treated the log-loss case with L∞ entropy.
Together, these results recover optimal bounds for VC- and large (polynomial
entropy) classes, replacing localized Rademacher complexity by a simpler
analysis which almost completely separates the two aspects that determine the
achievable rates: 'easiness' (Bernstein) conditions and model complexity.
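
For concreteness, the two complexities invoked above can be sketched as follows; the notation (model class M = {p_theta}, loss ell, generalized posterior pi-hat, prior pi_0, learning rate eta) is assumed here for illustration and is not fixed by the abstract. The NML (Shtarkov) complexity is the log-normalizer of the maximized likelihood over sequences, which by Shtarkov's theorem equals the minimax individual-sequence regret for log loss:

% NML / Shtarkov complexity (sum becomes an integral for continuous outcome spaces)
\[
  \mathrm{COMP}_n(\mathcal{M})
  \;=\;
  \log \sum_{x^n \in \mathcal{X}^n} \sup_{\theta} p_\theta(x^n)
  \;=\;
  \inf_{q}\, \sup_{x^n}\,
  \Bigl( \log \sup_{\theta} p_\theta(x^n) - \log q(x^n) \Bigr).
\]

The information (PAC-Bayesian) complexity of a generalized posterior relative to a prior combines an in-sample risk term with the KL term named above; in one common normalization,

% Information complexity of a generalized posterior \hat\pi with prior \pi_0,
% bounded loss \ell and learning rate \eta > 0
\[
  \mathrm{IC}_n(\hat\pi)
  \;=\;
  \mathbb{E}_{f \sim \hat\pi}\Bigl[\, \sum_{i=1}^{n} \ell_f(Z_i) \Bigr]
  \;+\;
  \frac{1}{\eta}\, \mathrm{KL}(\hat\pi \,\|\, \pi_0).
\]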