In statistical learning theory, determining the sample complexity of
realizable binary classification for VC classes was a long-standing open
problem. The results of Simon and Hanneke established sharp upper bounds in
this setting. However, the reliance of their argument on the uniform
convergence principle limits its applicability to more general learning
settings such as multiclass classification. In this paper, we address this
issue by providing optimal high-probability risk bounds through a framework
that surpasses the limitations of uniform convergence arguments.
Our framework converts the leave-one-out error of permutation-invariant
predictors into high-probability risk bounds. As an application, by adapting
the one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth, we
propose an algorithm that achieves an optimal PAC bound for binary
classification. Specifically, our result shows that certain aggregations of
one-inclusion graph algorithms are optimal, addressing a variant of a classic
question posed by Warmuth.
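
As a rough illustration of the object underlying this result, the following Python sketch builds the one-inclusion graph of a small finite hypothesis class on a sample plus one test point and predicts along an oriented edge. It is only a toy, not the paper's aggregated algorithm: the orientation step is a greedy heuristic standing in for the minimum-out-degree orientation used in the Haussler-Littlestone-Warmuth analysis, and the function name and the threshold class in the usage lines are illustrative choices.

from itertools import combinations

def one_inclusion_predict(hypotheses, labeled, test_x):
    """One-inclusion graph prediction for a finite class of {0,1}-valued
    hypotheses, each given as a dict from domain points to labels."""
    points = [x for x, _ in labeled] + [test_x]
    # Vertices: distinct labelings of the n+1 points realizable by the class.
    vertices = sorted({tuple(h[x] for x in points) for h in hypotheses})
    # Edges: pairs of realizable labelings that differ in exactly one point.
    edges = [(u, v) for u, v in combinations(vertices, 2)
             if sum(a != b for a, b in zip(u, v)) == 1]
    # Greedy orientation keeping out-degrees balanced (a heuristic stand-in
    # for the minimum-max-out-degree orientation in the HLW analysis).
    out_degree = {v: 0 for v in vertices}
    head_of = {}
    for u, v in edges:
        tail, head = (u, v) if out_degree[u] <= out_degree[v] else (v, u)
        head_of[(u, v)] = head
        out_degree[tail] += 1
    # Restrict to labelings consistent with the observed training labels.
    train = tuple(y for _, y in labeled)
    consistent = [v for v in vertices if v[:-1] == train]
    if len(consistent) == 1:
        return consistent[0][-1]   # the class forces the test label
    u, v = consistent              # two candidates: an edge along test_x
    return head_of[(u, v)][-1]     # predict the label the edge points to

# Toy usage: threshold classifiers h_t(x) = 1[x >= t] on the domain {0,...,4}.
domain = range(5)
thresholds = [{x: int(x >= t) for x in domain} for t in range(6)]
print(one_inclusion_predict(thresholds, labeled=[(0, 0), (3, 1)], test_x=2))
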
We further instantiate our framework in three settings where uniform
convergence is provably suboptimal. For multiclass classification, we prove an
optimal risk bound that scales with the one-inclusion hypergraph density of the
class, addressing the suboptimality of the analysis of Daniely and
Shalev-Shwartz. For partial hypothesis classification, we determine the optimal
sample complexity bound, resolving a question posed by Alon, Hanneke, Holzman,
and Moran. For realizable bounded regression with absolute loss, we derive an
optimal risk bound that relies on a modified version of the scale-sensitive
dimension, refining the results of Bartlett and Long. Our rates improve on
those from standard uniform convergence arguments, owing to the smaller complexity
measure in our risk bound.