Characterizing Rational versus Exponential Learning Curves
Abstract: We consider the standard problem of learning a concept from random examples. Here a learning curve is defined to be the expected error of a learner's hypotheses as a function of training sample size. Haussler, Littlestone, and Warmuth have shown that, in the distribution-free setting, the smallest expected error a learner can achieve in the worst case over a class of concepts C converges rationally to zero error; i.e., Θ(t⁻¹) in the training sample size t. However, Cohn and Tesauro have recently demonstrated that exponential convergence can often be observed in experimental settings (i.e., average error decreasing as e^(−Θ(t))). By addressing a simple non-uniformity in the original analysis, this paper shows how the dichotomy between rational and exponential worst-case learning curves can be recovered in the distribution-free theory. In particular, our results support the experimental findings of Cohn and Tesauro: for finite concept classes any consistent learner achieves exponential convergence, even in the worst case, whereas for continuous concept classes no learner can exhibit sub-rational convergence for every target concept and domain distribution. We also draw a precise boundary between rational and exponential convergence for simple concept chains: somewhere-dense chains always force rational convergence in the worst case, while exponential convergence can always be achieved for nowhere-dense chains.
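The finite-class half of this dichotomy is easy to see empirically. The sketch below is a hypothetical illustration (not code from the paper): it simulates a consistent learner over a small finite class of threshold concepts under a uniform example distribution, and the estimated expected error drops off far faster than 1/t.

```python
import random

# Hypothetical illustration (not from the paper): a *finite* class of
# threshold concepts h_k(x) = 1[x >= k] over the domain {0, ..., 19},
# with examples drawn uniformly. The abstract predicts that any
# consistent learner's expected error decays exponentially in t.

random.seed(0)
DOMAIN = range(20)
TARGET_K = 10  # true concept: label 1 iff x >= 10

def label(x):
    return int(x >= TARGET_K)

def consistent_threshold(sample):
    """Return the smallest threshold k consistent with the labeled sample."""
    for k in range(21):
        if all(int(x >= k) == y for x, y in sample):
            return k
    raise ValueError("no consistent hypothesis (data not realizable)")

def expected_error(t, trials=2000):
    """Monte Carlo estimate of the expected error after t random examples."""
    total = 0.0
    for _ in range(trials):
        sample = [(x, label(x)) for x in random.choices(list(DOMAIN), k=t)]
        k = consistent_threshold(sample)
        # error = fraction of the (uniform) domain the hypothesis mislabels
        total += sum(int(x >= k) != label(x) for x in DOMAIN) / 20
    return total / trials

for t in (1, 5, 10, 20, 40):
    print(f"t={t:3d}  expected error ~ {expected_error(t):.4f}")
```

The error here is driven by the probability that no example lands in the gap between the learned and the true threshold, which shrinks geometrically in t, matching the exponential regime for finite classes.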
The Shape of Learning Curves: a Review
Learning curves provide insight into the dependence of a learner's
generalization performance on the training set size. This important tool can be
used for model selection, to predict the effect of more training data, and to
reduce the computational complexity of model training and hyperparameter
tuning. This review recounts the origins of the term, provides a formal
definition of the learning curve, and briefly covers basics such as its
estimation. Our main contribution is a comprehensive overview of the literature
regarding the shape of learning curves. We discuss empirical and theoretical
evidence that supports well-behaved curves that often have the shape of a power
law or an exponential. We consider the learning curves of Gaussian processes,
the complex shapes they can display, and the factors influencing them. We draw
specific attention to examples of learning curves that are ill-behaved, showing
worse learning performance with more training data. To wrap up, we point out
various open problems that warrant deeper empirical and theoretical
investigation. All in all, our review underscores that learning curves are
surprisingly diverse and no universal model can be identified.
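A toy sketch of how the two well-behaved shapes mentioned above can be told apart (an illustration with synthetic data, not a procedure taken from the review): after a log transform, both the power-law and the exponential model become linear least-squares fits, and their residuals can be compared.

```python
import numpy as np

# Toy sketch (not from the review): decide whether an observed learning
# curve looks more like a power law, err(t) = a * t**(-b), or an
# exponential, err(t) = a * exp(-b * t). Taking logs turns each model
# into a linear least-squares problem.

t = np.array([10, 20, 40, 80, 160, 320], dtype=float)
err = 2.0 * t ** -0.5  # synthetic curve that genuinely follows a power law

log_err = np.log(err)

# Power law: log err = log a - b * log t  (linear in log t)
A_pow = np.column_stack([np.ones_like(t), -np.log(t)])
_, res_pow, *_ = np.linalg.lstsq(A_pow, log_err, rcond=None)

# Exponential: log err = log a - b * t    (linear in t)
A_exp = np.column_stack([np.ones_like(t), -t])
_, res_exp, *_ = np.linalg.lstsq(A_exp, log_err, rcond=None)

print(f"power-law fit residual:   {res_pow[0]:.3e}")
print(f"exponential fit residual: {res_exp[0]:.3e}")
```

On this synthetic power-law curve the power-law residual is essentially zero while the exponential fit leaves a visible residual; on real curves, noise and the ill-behaved shapes surveyed here make the comparison much less clean.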
Online Learning in Dynamically Changing Environments
We study the problem of online learning and online regret minimization when
samples are drawn from a general unknown non-stationary process. We introduce
the concept of a dynamic changing process with cost, where the conditional
marginals of the process can vary arbitrarily, but the number of different
conditional marginals is bounded over the rounds. For such processes we
prove a tight (up to a factor) bound for the expected worst-case
regret of any finite VC-dimensional class under absolute loss
(i.e., the expected misclassification loss). We then improve this bound for
general mixable losses by establishing a tight (up to a factor)
regret bound. We extend these
results to general smooth adversary processes with unknown reference measure by
showing a sub-linear regret bound for threshold functions under
a general bounded convex loss. Our results can be viewed as a first step
towards regret analysis with non-stationary samples in the distribution-blind
(universal) regime. This also brings a new viewpoint that shifts the study of
the complexity of hypothesis classes to the study of the complexity of the
processes generating the data.
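For intuition about regret under a changing distribution, the sketch below uses the generic Hedge (exponential-weights) algorithm over a small finite expert class, not the procedure from this paper: the data-generating threshold switches mid-stream, yet the cumulative regret against the best fixed expert stays sub-linear.

```python
import math
import random

# Illustration only (generic Hedge, not this paper's algorithm): online
# prediction under absolute loss against a finite class of threshold
# experts, while the source distribution changes once mid-stream.

random.seed(0)
T = 2000
experts = [lambda x, k=k: int(x >= k) for k in range(11)]
eta = math.sqrt(8 * math.log(len(experts)) / T)  # standard Hedge rate

weights = [1.0] * len(experts)
learner_loss = 0.0
expert_loss = [0.0] * len(experts)

for t in range(T):
    # non-stationary source: the true threshold changes halfway through
    true_k = 3 if t < T // 2 else 7
    x = random.randint(0, 10)
    y = int(x >= true_k)

    total = sum(weights)
    pred = sum(w * e(x) for w, e in zip(weights, experts)) / total
    learner_loss += abs(pred - y)  # absolute loss of the mixture prediction

    for i, e in enumerate(experts):
        loss = abs(e(x) - y)
        expert_loss[i] += loss
        weights[i] *= math.exp(-eta * loss)

regret = learner_loss - min(expert_loss)
print(f"cumulative regret vs best fixed expert: {regret:.2f}")
```

The standard Hedge guarantee bounds this regret by sqrt(T · ln N / 2), roughly 49 for T = 2000 and N = 11 experts, regardless of how the source changes; the bounds in the abstract concern the harder setting where the comparator and the process class are far richer.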
Multiclass Learnability Beyond the PAC Framework: Universal Rates and Partial Concept Classes
In this paper we study the problem of multiclass classification with a
bounded number of different labels, in the realizable setting. We extend
the traditional PAC model to a) distribution-dependent learning rates, and b)
learning rates under data-dependent assumptions. First, we consider the
universal learning setting (Bousquet, Hanneke, Moran, van Handel and
Yehudayoff, STOC '21), for which we provide a complete characterization of the
achievable learning rates that holds for every fixed distribution. In
particular, we show the following trichotomy: for any concept class, the
optimal learning rate is either exponential, linear or arbitrarily slow.
Additionally, we provide complexity measures of the underlying hypothesis class
that characterize when these rates occur. Second, we consider the problem of
multiclass classification with structured data (such as data lying on a low
dimensional manifold or satisfying margin conditions), a setting which is
captured by partial concept classes (Alon, Hanneke, Holzman and Moran, FOCS
'21). Partial concepts are functions that can be undefined in certain parts of
the input space. We extend the traditional PAC learnability of total concept
classes to partial concept classes in the multiclass setting and investigate
differences between partial and total concepts.