8 research outputs found

    Characterizing Rational versus Exponential Learning Curves

    Abstract: We consider the standard problem of learning a concept from random examples. Here a learning curve is defined to be the expected error of a learner's hypotheses as a function of training sample size. Haussler, Littlestone, and Warmuth have shown that, in the distribution-free setting, the smallest expected error a learner can achieve in the worst case over a class of concepts C converges rationally to zero error; i.e., Θ(1/t) in the training sample size t. However, Cohn and Tesauro have recently demonstrated that exponential convergence can often be observed in experimental settings (i.e., average error decreasing as e^(-Θ(t))). By addressing a simple non-uniformity in the original analysis, this paper shows how the dichotomy between rational and exponential worst-case learning curves can be recovered in the distribution-free theory. In particular, our results support the experimental findings of Cohn and Tesauro: for finite concept classes any consistent learner achieves exponential convergence, even in the worst case, whereas for continuous concept classes no learner can exhibit sub-rational convergence for every target concept and domain distribution. We also draw a precise boundary between rational and exponential convergence for simple concept chains, showing that somewhere-dense chains always force rational convergence in the worst case, while exponential convergence can always be achieved for nowhere-dense chains.
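
    For intuition, here is a minimal Monte Carlo sketch (my own illustration, not code or results from the paper) that contrasts the two regimes in the simplest possible setting: a consistent learner restricted to a finite grid of thresholds on [0, 1], whose worst-case expected error decays roughly exponentially in t, versus a consistent learner over the continuous class of all thresholds, whose expected error decays as Θ(1/t).

```python
# Illustrative sketch only: rational vs. exponential decay of the expected
# error of consistent learners, target concept c(x) = 1[x >= 0.5], x ~ U[0, 1].
import numpy as np

rng = np.random.default_rng(0)
TARGET = 0.5

def sample(t):
    x = rng.uniform(0.0, 1.0, size=t)
    return x, (x >= TARGET).astype(int)

def true_error(theta):
    # Error of the threshold hypothesis h(x) = 1[x >= theta] under Uniform[0, 1].
    return abs(theta - TARGET)

def continuous_class_error(t, trials=2000):
    # Consistent learner over ALL thresholds: place the threshold just above
    # the largest negative example.  Expected error decays as Theta(1/t).
    total = 0.0
    for _ in range(trials):
        x, y = sample(t)
        neg = x[y == 0]
        total += true_error(neg.max() if neg.size else 0.0)
    return total / trials

def finite_class_error(t, grid=np.linspace(0.0, 1.0, 9), trials=2000):
    # Worst consistent hypothesis from a FINITE grid of thresholds (the grid
    # contains the target 0.5, so a consistent choice always exists).  Each
    # bad grid point survives t samples with probability (1 - gap)^t, so the
    # worst-case expected error decays exponentially in t.
    total = 0.0
    for _ in range(trials):
        x, y = sample(t)
        consistent = [g for g in grid if np.array_equal((x >= g).astype(int), y)]
        total += max(true_error(g) for g in consistent)
    return total / trials

for t in (5, 10, 20, 40, 80):
    print(f"t={t:3d}  continuous ~1/t: {continuous_class_error(t):.4f}  "
          f"finite ~exp(-ct): {finite_class_error(t):.4f}")
```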

    The Shape of Learning Curves: a Review

    Learning curves provide insight into the dependence of a learner's generalization performance on the training set size. This important tool can be used for model selection, to predict the effect of more training data, and to reduce the computational complexity of model training and hyperparameter tuning. This review recounts the origins of the term, provides a formal definition of the learning curve, and briefly covers basics such as its estimation. Our main contribution is a comprehensive overview of the literature regarding the shape of learning curves. We discuss empirical and theoretical evidence that supports well-behaved curves that often have the shape of a power law or an exponential. We consider the learning curves of Gaussian processes, the complex shapes they can display, and the factors influencing them. We draw specific attention to examples of learning curves that are ill-behaved, showing worse learning performance with more training data. To wrap up, we point out various open problems that warrant deeper empirical and theoretical investigation. All in all, our review underscores that learning curves are surprisingly diverse and no universal model can be identified.
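
    As a concrete illustration of how such shapes are compared in practice, the sketch below fits both a three-parameter power law and a three-parameter exponential to a small synthetic learning curve and compares residuals; the data points, initial guesses, and function names are made up for illustration and are not taken from the review.

```python
# Illustrative only: compare a power-law fit with an exponential fit on a
# (made-up) learning curve of test error versus training-set size.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * np.power(n, -b) + c

def exponential(n, a, b, c):
    return a * np.exp(-b * n) + c

# Hypothetical measurements: training-set sizes and corresponding test errors.
n = np.array([50, 100, 200, 400, 800, 1600, 3200], dtype=float)
test_err = np.array([0.31, 0.24, 0.19, 0.15, 0.12, 0.10, 0.085])

for name, model, p0 in [("power law", power_law, (1.0, 0.5, 0.0)),
                        ("exponential", exponential, (0.3, 1e-3, 0.05))]:
    params, _ = curve_fit(model, n, test_err, p0=p0, maxfev=20000)
    sse = float(np.sum((test_err - model(n, *params)) ** 2))
    print(f"{name:12s} params={np.round(params, 4)}  SSE={sse:.2e}")
```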

    Online Learning in Dynamically Changing Environments

    We study the problem of online learning and online regret minimization when samples are drawn from a general unknown non-stationary process. We introduce the concept of a dynamic changing process with cost $K$, where the conditional marginals of the process can vary arbitrarily, but the number of different conditional marginals is bounded by $K$ over $T$ rounds. For such processes we prove a tight (up to a $\sqrt{\log T}$ factor) bound $O(\sqrt{KT \cdot \mathsf{VC}(\mathcal{H}) \log T})$ on the expected worst-case regret of any finite VC-dimensional class $\mathcal{H}$ under absolute loss (i.e., the expected misclassification loss). We then improve this bound for general mixable losses, establishing a tight (up to a $\log^3 T$ factor) regret bound $O(K \cdot \mathsf{VC}(\mathcal{H}) \log^3 T)$. We extend these results to general smooth adversary processes with unknown reference measure by showing a sub-linear regret bound for 1-dimensional threshold functions under a general bounded convex loss. Our results can be viewed as a first step towards regret analysis with non-stationary samples in the distribution-blind (universal) regime. This also brings a new viewpoint that shifts the study of the complexity of hypothesis classes to the study of the complexity of the processes generating data.
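
    As a rough back-of-the-envelope illustration (my own, with arbitrary parameter values and ignoring constants), the two bounds can be tabulated against each other: the $\sqrt{KT \cdot \mathsf{VC}(\mathcal{H}) \log T}$ bound grows with the horizon $T$, while the mixable-loss bound $K \cdot \mathsf{VC}(\mathcal{H}) \log^3 T$ grows only polylogarithmically.

```python
# Illustrative only: compare the growth (up to constants) of the two regret
# bounds quoted in the abstract, for a fixed number of distribution changes K
# and a fixed VC dimension of the hypothesis class.
import math

def absolute_loss_bound(T, K, vc):
    # O(sqrt(K * T * VC * log T)) -- absolute (misclassification) loss.
    return math.sqrt(K * T * vc * math.log(T))

def mixable_loss_bound(T, K, vc):
    # O(K * VC * (log T)^3) -- general mixable losses.
    return K * vc * math.log(T) ** 3

K, vc = 5, 10  # hypothetical values
for T in (10**3, 10**4, 10**5, 10**6):
    print(f"T={T:>8d}  sqrt bound ~ {absolute_loss_bound(T, K, vc):12.1f}  "
          f"polylog bound ~ {mixable_loss_bound(T, K, vc):10.1f}")
```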

    Multiclass Learnability Beyond the PAC Framework: Universal Rates and Partial Concept Classes

    In this paper we study the problem of multiclass classification with a bounded number of different labels $k$, in the realizable setting. We extend the traditional PAC model to a) distribution-dependent learning rates, and b) learning rates under data-dependent assumptions. First, we consider the universal learning setting (Bousquet, Hanneke, Moran, van Handel and Yehudayoff, STOC '21), for which we provide a complete characterization of the achievable learning rates that holds for every fixed distribution. In particular, we show the following trichotomy: for any concept class, the optimal learning rate is either exponential, linear, or arbitrarily slow. Additionally, we provide complexity measures of the underlying hypothesis class that characterize when these rates occur. Second, we consider the problem of multiclass classification with structured data (such as data lying on a low-dimensional manifold or satisfying margin conditions), a setting which is captured by partial concept classes (Alon, Hanneke, Holzman and Moran, FOCS '21). Partial concepts are functions that can be undefined in certain parts of the input space. We extend the traditional PAC learnability of total concept classes to partial concept classes in the multiclass setting and investigate differences between partial and total concepts.
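
    For orientation, the universal-rates notion referenced above can be sketched as follows; this is my paraphrase of the Bousquet, Hanneke, Moran, van Handel and Yehudayoff framework, not text from this paper, and the constants $C_P$, $c_P$ are distribution-dependent.

```latex
% Hedged sketch of the universal-rates framework (paraphrase, not quoted):
% a class \mathcal{H} is learnable at rate R(n) if some learner \hat{h}_n
% satisfies, for every realizable distribution P,
%   \mathbb{E}[\mathrm{er}_P(\hat{h}_n)] \le C_P \, R(c_P n)   for all n,
% with constants C_P, c_P > 0 depending on P but not on n.
% The trichotomy stated in the abstract then places the optimal rate in one
% of exactly three regimes:
\[
  R(n) = e^{-n}, \qquad R(n) = \tfrac{1}{n}, \qquad \text{or arbitrarily slow,}
\]
% where "arbitrarily slow" means that for every rate R(n) -> 0 some realizable
% distribution forces expected error at least R(n) infinitely often.
```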

    Characterizing Rational versus Exponential Learning Curves

    We consider the standard problem of learning a concept from random examples. Here a learning curve can be defined to be the expected error of a learner's hypotheses as a function of training sample size. Haussler, Littlestone and Warmuth have shown that, in the distribution-free setting, the smallest expected error a learner can achieve in the worst case over a concept class C converges rationally to zero error (i.e., Θ(1/t) for training sample size t). However, recently Cohn and Tesauro have demonstrated how exponential convergence can often be observed in experimental settings (i.e., average error decreasing as e^(-Θ(t))). By addressing a simple non-uniformity in the original analysis, this paper shows how the dichotomy between rational and exponential worst-case learning curves can be recovered in the distribution-free theory. These results support the experimental findings of Cohn and Tesauro: for finite concept classes, any consistent learner achieves exponential convergence, even in the worst case, whereas for continuous concept classes no learner can exhibit sub-rational convergence for every target concept and domain distribution.