On PAC-Bayesian Bounds for Random Forests
Existing guarantees in terms of rigorous upper bounds on the generalization
error for the original random forest algorithm, one of the most frequently used
machine learning methods, are unsatisfying. We discuss and evaluate various
PAC-Bayesian approaches to derive such bounds. The bounds do not require
additional hold-out data, because the out-of-bag samples from the bagging in
the training process can be exploited. A random forest predicts by taking a
majority vote of an ensemble of decision trees. The first approach is to bound
the error of the vote by twice the error of the corresponding Gibbs classifier
(classifying with a single member of the ensemble selected at random). However,
this approach does not account for the errors of individual classifiers
averaging out when the majority vote is taken. This effect provides a
significant boost in performance when the errors are independent or negatively
correlated, but when the correlations are strong the advantage from taking the
majority vote is small. The second approach based on PAC-Bayesian C-bounds
takes dependencies between ensemble members into account, but it requires
estimating correlations between the errors of the individual classifiers. When
the correlations are high or the estimation is poor, the bounds degrade. In our
experiments, we compute generalization bounds for random forests on various
benchmark data sets. Because the individual decision trees already perform
well, their predictions are highly correlated and the C-bounds do not lead to
satisfactory results. For the same reason, the bounds based on the analysis of
Gibbs classifiers are typically superior and often reasonably tight. Bounds
based on a validation set, which comes at the cost of a smaller training set,
gave better performance guarantees but worse predictive performance in most
experiments.
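The first approach above rests on the classical first-order relation: the risk of the majority vote is at most twice the risk of the Gibbs classifier (a randomly selected ensemble member). A minimal synthetic sketch of that relation, illustrative only; the paper's actual bounds additionally involve PAC-Bayesian confidence terms and exploit out-of-bag samples:

```python
# First-order relation: majority-vote risk <= 2 * Gibbs risk.
# Synthetic ensemble with independent per-tree errors (illustrative setup).
import numpy as np

rng = np.random.default_rng(0)
n_trees, n_samples = 25, 1000
y = rng.integers(0, 2, size=n_samples)            # binary labels
# each "tree" flips the true label independently with probability 0.3
flip = rng.random((n_trees, n_samples)) < 0.3
preds = np.where(flip, 1 - y, y)                  # shape (n_trees, n_samples)

gibbs_risk = np.mean(preds != y)                  # avg error of a random tree
majority = (preds.mean(axis=0) > 0.5).astype(int) # majority vote (n_trees is odd)
mv_risk = np.mean(majority != y)

print(f"Gibbs risk        : {gibbs_risk:.3f}")
print(f"Majority-vote risk: {mv_risk:.3f}")
print(f"First-order bound : {2 * gibbs_risk:.3f}")
```

With independent errors the vote's risk is far below the Gibbs risk, which is exactly the averaging effect the first-order bound cannot capture.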
Second Order PAC-Bayesian Bounds for the Weighted Majority Vote
We present a novel analysis of the expected risk of the weighted majority vote
in multiclass classification. The analysis takes the correlation of predictions by
ensemble members into account and provides a bound that is amenable to
efficient minimization, which yields improved weighting for the majority vote.
We also provide a specialized version of our bound for binary classification,
which makes it possible to exploit additional unlabeled data for tighter risk estimation.
In experiments, we apply the bound to improve weighting of trees in random
forests and show that, in contrast to the commonly used first order bound,
minimization of the new bound typically does not lead to degradation of the
test error of the ensemble.
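The second-order analysis of this line of work hinges on the tandem loss: the probability that a random pair of ensemble members errs on the same example, whose expectation under the squared posterior bounds the majority-vote risk up to a factor of four. A hedged numpy sketch with uniform weights (the paper optimizes the weights; the setup and names here are illustrative):

```python
# Tandem loss: E_{rho^2}[ P(h and h' both err) ]; the second-order bound is
# roughly 4x this quantity. Synthetic ensemble with uniform weights rho.
import numpy as np

rng = np.random.default_rng(1)
n_trees, n_samples = 15, 2000
y = rng.integers(0, 2, size=n_samples)
preds = np.where(rng.random((n_trees, n_samples)) < 0.3, 1 - y, y)

err = (preds != y).astype(float)          # per-tree, per-sample error indicators
tandem = (err @ err.T) / n_samples        # pairwise joint-error rates
rho = np.full(n_trees, 1.0 / n_trees)     # uniform weighting
tandem_loss = rho @ tandem @ rho          # expected joint error under rho^2

mv_risk = np.mean(((preds.mean(axis=0) > 0.5).astype(int)) != y)
print(f"tandem loss        : {tandem_loss:.3f}")
print(f"second-order bound : {4 * tandem_loss:.3f}")
print(f"majority-vote risk : {mv_risk:.3f}")
```

When errors are weakly correlated the off-diagonal joint-error rates are small, so the tandem loss, and hence the bound, tightens well below twice the Gibbs risk.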
Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound
We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective. The resulting stochastic majority vote learning algorithm achieves state-of-the-art accuracy and benefits from tight (non-vacuous) generalization bounds in a series of numerical experiments, when compared to competing algorithms that also minimize PAC-Bayes objectives -- both with uninformed (data-independent) and informed (data-dependent) priors.
Learning Capacity: A Measure of the Effective Dimensionality of a Model
We exploit a formal correspondence between thermodynamics and inference,
where the number of samples can be thought of as the inverse temperature, to
define a "learning capacity" which is a measure of the effective
dimensionality of a model. We show that the learning capacity is a tiny
fraction of the number of parameters for many deep networks trained on typical
datasets, depends upon the number of samples used for training, and is
numerically consistent with notions of capacity obtained from the PAC-Bayesian
framework. The test error as a function of the learning capacity does not
exhibit double descent. We show that the learning capacity of a model saturates
at very small and very large sample sizes; this provides guidelines as to
whether one should procure more data or whether one should search for new
architectures, to improve performance. We show how the learning capacity can be
used to understand the effective dimensionality, even for non-parametric models
such as random forests and k-nearest neighbor classifiers.
On the Generalization of the C-Bound to Structured Output Ensemble Methods
This paper generalizes an important result from the PAC-Bayesian literature for binary classification to the case of ensemble methods for structured outputs. We prove a generic version of the C-bound, an upper bound on the risk of models expressed as a weighted majority vote that is based on the first and second statistical moments of the vote's margin. This bound may advantageously be applied to more complex outputs such as multiclass and multilabel classification, and allows margin relaxations to be considered. These results open the way to developing new ensemble methods for structured output prediction with PAC-Bayesian guarantees.
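In the binary case, the C-bound this paper generalizes reads R(MV) <= 1 - E[M]^2 / E[M^2], where M is the weighted vote margin, valid whenever E[M] > 0. A small synthetic sketch, assuming uniform weights and labels in {-1, +1} (all names and the setup are illustrative, not the paper's):

```python
# C-bound sketch: with margin M(x, y) = E_rho[ y * h(x) ] for +-1 labels,
# R(MV) <= 1 - E[M]^2 / E[M^2] whenever E[M] > 0. Uniform weights here.
import numpy as np

rng = np.random.default_rng(2)
n_voters, n_samples = 11, 5000
y = rng.choice([-1, 1], size=n_samples)
preds = np.where(rng.random((n_voters, n_samples)) < 0.35, -y, y)

margin = (preds * y).mean(axis=0)             # per-example vote margin in [-1, 1]
m1, m2 = margin.mean(), (margin ** 2).mean()  # first and second moments
c_bound = 1.0 - m1 ** 2 / m2

mv_risk = np.mean(margin <= 0)                # vote errs iff margin <= 0 (no ties: odd n)
print(f"C-bound           : {c_bound:.3f}")
print(f"majority-vote risk: {mv_risk:.3f}")
```

The second moment m2 shrinks as voter errors decorrelate, which is how the C-bound rewards diversity; conversely, highly correlated voters inflate m2 and loosen the bound, matching the degradation observed for random forests above.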
A Parsimonious Tour of Bayesian Model Uncertainty
Modern statistical software and machine learning libraries are enabling
semi-automated statistical inference. Within this context, it becomes
increasingly easy to fit many models to the data at hand, thereby reversing
the Fisherian way of conducting science by collecting data after the scientific
hypothesis (and hence the model) has been determined. The renewed goal of the
statistician becomes to help the practitioner choose among such large and
heterogeneous families of models, a task known as model selection. The Bayesian
paradigm offers a systematized way of addressing this problem. This approach,
launched by Harold Jeffreys in his 1935 book Theory of Probability, has
witnessed a remarkable evolution in recent decades that has brought about
several new theoretical and methodological advances. Some of these recent
developments are the focus of this survey, which tries to present a unifying
perspective on work carried out by different communities. In particular, we
focus on non-asymptotic out-of-sample performance of Bayesian model selection
and averaging techniques, and draw connections with penalized maximum
likelihood. We also describe recent extensions to wider classes of
probabilistic frameworks including high-dimensional, unidentifiable, or
likelihood-free models.
Conformal Prediction: a Unified Review of Theory and New Challenges
In this work we provide a review of basic ideas and novel developments about
Conformal Prediction -- an innovative distribution-free, non-parametric
forecasting method, based on minimal assumptions -- that is able to yield, in a
very straightforward way, prediction sets that are statistically valid even in
the finite-sample case. The in-depth discussion provided in the paper covers
the theoretical underpinnings of Conformal Prediction, and then proceeds to
list the more advanced developments and adaptations of the original idea.
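The finite-sample validity the review emphasizes can be illustrated with the standard split-conformal recipe: score a held-out calibration set with a nonconformity measure, take a finite-sample-corrected quantile, and form prediction intervals. A toy sketch; the predictor, data, and all names are assumed for illustration only:

```python
# Split-conformal sketch: calibrate nonconformity scores, then form
# prediction intervals with coverage >= 1 - alpha in finite samples.
import numpy as np

rng = np.random.default_rng(3)
n_cal, n_test, alpha = 500, 500, 0.1

def model(x):                  # crude point predictor (assumed, not fitted here)
    return 2.0 * x

x_cal = rng.uniform(0, 1, n_cal)
y_cal = 2.0 * x_cal + rng.normal(0, 0.3, n_cal)
scores = np.abs(y_cal - model(x_cal))          # nonconformity scores

# conformal quantile with the finite-sample (n + 1) correction
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
q = np.sort(scores)[k - 1]

x_test = rng.uniform(0, 1, n_test)
y_test = 2.0 * x_test + rng.normal(0, 0.3, n_test)
covered = np.abs(y_test - model(x_test)) <= q  # y in [model(x) - q, model(x) + q]
print(f"empirical coverage: {covered.mean():.3f} (target >= {1 - alpha})")
```

Note that no distributional assumption on the noise is used: exchangeability of calibration and test points alone delivers the coverage guarantee, which is the distribution-free character the review highlights.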