502 research outputs found
Concentration inequalities of the cross-validation estimate for stable predictors
In this article, we derive concentration inequalities for the
cross-validation estimate of the generalization error for stable predictors in
the context of risk assessment. The notion of stability has been first
introduced by \cite{DEWA79} and extended by \cite{KEA95}, \cite{BE01} and
\cite{KUNIY02} to characterize class of predictors with infinite VC dimension.
In particular, this covers -nearest neighbors rules, bayesian algorithm
(\cite{KEA95}), boosting,... General loss functions and class of predictors are
considered. We use the formalism introduced by \cite{DUD03} to cover a large
variety of cross-validation procedures including leave-one-out
cross-validation, -fold cross-validation, hold-out cross-validation (or
split sample), and the leave--out cross-validation.
In particular, we give a simple rule on how to choose the cross-validation,
depending on the stability of the class of predictors. In the special case of
uniform stability, an interesting consequence is that the number of elements in
the test set is not required to grow to infinity for the consistency of the
cross-validation procedure. In this special case, the particular interest of
leave-one-out cross-validation is emphasized
Hypothesis Transfer Learning with Surrogate Classification Losses: Generalization Bounds through Algorithmic Stability
Hypothesis transfer learning (HTL) contrasts domain adaptation by allowing
for a previous task leverage, named the source, into a new one, the target,
without requiring access to the source data. Indeed, HTL relies only on a
hypothesis learnt from such source data, relieving the hurdle of expansive data
storage and providing great practical benefits. Hence, HTL is highly beneficial
for real-world applications relying on big data. The analysis of such a method
from a theoretical perspective faces multiple challenges, particularly in
classification tasks. This paper deals with this problem by studying the
learning theory of HTL through algorithmic stability, an attractive theoretical
framework for machine learning algorithms analysis. In particular, we are
interested in the statistical behaviour of the regularized empirical risk
minimizers in the case of binary classification. Our stability analysis
provides learning guarantees under mild assumptions. Consequently, we derive
several complexity-free generalization bounds for essential statistical
quantities like the training error, the excess risk and cross-validation
estimates. These refined bounds allow understanding the benefits of transfer
learning and comparing the behaviour of standard losses in different scenarios,
leading to valuable insights for practitioners
- …