Bayesian Classifiers are Large Margin Hyperplanes in a Hilbert Space

Abstract

Bayesian algorithms for Neural Networks are known to produce classifiers which are very resistant to overfitting. It is often claimed that one of the main distinctive features of Bayesian learning algorithms is that they do not simply output one hypothesis, but rather an entire probability distribution over a hypothesis set: the Bayes posterior. An alternative perspective is that they output a linear combination of classifiers whose coefficients are given by Bayes' theorem. One of the concepts used to deal with thresholded convex combinations is the 'margin' of the hyperplane with respect to the training sample, which is correlated with the predictive power of the hypothesis itself. We provide a novel theoretical analysis of such classifiers, based on data-dependent VC theory, proving that they can be expected to be large margin hyperplanes in a Hilbert space. We then present experimental evidence that the predictions of our model are correct, i.e. that Bayesian classifiers really find large margin hyperplanes.
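As a rough illustration of the quantities the abstract refers to (the Bayes-weighted vote and its margin; the notation below is our own sketch, not taken from the paper):

\[
  f(x) \;=\; \operatorname{sign}\!\Big( \sum_{h \in H} P(h \mid D)\, h(x) \Big),
  \qquad
  \gamma \;=\; \min_{(x_i, y_i) \in D} \; y_i \sum_{h \in H} P(h \mid D)\, h(x_i),
\]

where D is the training sample, H the hypothesis set, P(h | D) the Bayes posterior, and each h(x) takes values in {-1, +1}. The paper's claim is that this thresholded convex combination behaves like a large margin hyperplane in a Hilbert space whose coordinates are indexed by the hypotheses in H.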
