12 research outputs found
Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
Various optimality properties of universal sequence predictors based on
Bayes-mixtures in general, and Solomonoff's prediction scheme in particular,
will be studied. The probability of observing at time , given past
observations can be computed with the chain rule if the true
generating distribution of the sequences is known. If
is unknown, but known to belong to a countable or continuous class \M
one can base ones prediction on the Bayes-mixture defined as a
-weighted sum or integral of distributions \nu\in\M. The cumulative
expected loss of the Bayes-optimal universal prediction scheme based on
is shown to be close to the loss of the Bayes-optimal, but infeasible
prediction scheme based on . We show that the bounds are tight and that no
other predictor can lead to significantly smaller bounds. Furthermore, for
various performance measures, we show Pareto-optimality of and give an
Occam's razor argument that the choice for the weights
is optimal, where is the length of the shortest program describing
. The results are applied to games of chance, defined as a sequence of
bets, observations, and rewards. The prediction schemes (and bounds) are
compared to the popular predictors based on expert advice. Extensions to
infinite alphabets, partial, delayed and probabilistic prediction,
classification, and more active systems are briefly discussed.Comment: 34 page
A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
We present a novel notion of complexity that interpolates between and
generalizes some classic existing complexity notions in learning theory: for
estimators like empirical risk minimization (ERM) with arbitrary bounded
losses, it is upper bounded in terms of data-independent Rademacher complexity;
for generalized Bayesian estimators, it is upper bounded by the data-dependent
information complexity (also known as stochastic or PAC-Bayesian,
complexity. For
(penalized) ERM, the new complexity reduces to (generalized) normalized maximum
likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence
regret. Our first main result bounds excess risk in terms of the new
complexity. Our second main result links the new complexity via Rademacher
complexity to entropy, thereby generalizing earlier results of Opper,
Haussler, Lugosi, and Cesa-Bianchi who did the log-loss case with .
Together, these results recover optimal bounds for VC- and large (polynomial
entropy) classes, replacing localized Rademacher complexity by a simpler
analysis which almost completely separates the two aspects that determine the
achievable rates: 'easiness' (Bernstein) conditions and model complexity.Comment: 38 page
Predicting a binary sequence almost as well as the optimal biased coin
AbstractWe apply the exponential weight algorithm, introduced and Littlestone and Warmuth [26] and by Vovk [35] to the problem of predicting a binary sequence almost as well as the best biased coin. We first show that for the case of the logarithmic loss, the derived algorithm is equivalent to the Bayes algorithm with Jeffrey’s prior, that was studied by Xie and Barron [38] under probabilistic assumptions. We derive a uniform bound on the regret which holds for any sequence. We also show that if the empirical distribution of the sequence is bounded away from 0 and from 1, then, as the length of the sequence increases to infinity, the difference between this bound and a corresponding bound on the average case regret of the same algorithm (which is asymptotically optimal in that case) is only 1/2. We show that this gap of 1/2 is necessary by calculating the regret of the min–max optimal algorithm for this problem and showing that the asymptotic upper bound is tight. We also study the application of this algorithm to the square loss and show that the algorithm that is derived in this case is different from the Bayes algorithm and is better than it for prediction in the worst-case
A tight excess risk bound via a unified PAC-Bayesian-Rademacher-Shtarkov-MDL complexity
We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity (also known as stochastic or PAC-Bayesian, KL(posterior∥prior) complexity. For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity via Rademacher complexity to L2(P) entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who did the log-loss case with L∞. Together, these results recover optimal bounds for VC- and large (polynomial entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity
A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity (also known as stochastic or PAC-Bayesian, KL(posterior∥prior) complexity. For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity via Rademacher complexity to L2(P) entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who did the log-loss case with L∞. Together, these results recover optimal bounds for VC- and large (polynomial entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity