Characterizing Rational versus Exponential Learning Curves
Abstract: We consider the standard problem of learning a concept from random examples. Here a learning curve is defined to be the expected error of a learner's hypotheses as a function of training sample size. Haussler, Littlestone, and Warmuth have shown that, in the distribution-free setting, the smallest expected error a learner can achieve in the worst case over a class of concepts C converges rationally to zero error; i.e., Θ(t⁻¹) in the training sample size t. However, Cohn and Tesauro have recently demonstrated that exponential convergence can often be observed in experimental settings (i.e., average error decreasing as e^(−Θ(t))). By addressing a simple non-uniformity in the original analysis, this paper shows how the dichotomy between rational and exponential worst-case learning curves can be recovered in the distribution-free theory. In particular, our results support the experimental findings of Cohn and Tesauro: for finite concept classes any consistent learner achieves exponential convergence, even in the worst case, whereas for continuous concept classes no learner can exhibit sub-rational convergence for every target concept and domain distribution. We also draw a precise boundary between rational and exponential convergence for simple concept chains, showing that somewhere-dense chains always force rational convergence in the worst case, while exponential convergence can always be achieved for nowhere-dense chains.
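The finite-class side of this dichotomy can be illustrated with a toy simulation (our own sketch, not the paper's construction, with the concept class and distinguishing-region probability `p` chosen for illustration): a consistent learner over a finite class errs on a fresh example only while no training sample has yet hit the region that distinguishes the target from the remaining consistent concepts, and the probability of that event decays like (1 − p)^t, i.e. exponentially in t.

```python
import random

# Toy sketch: a consistent learner over a finite concept class errs only
# while t i.i.d. samples have all missed a "distinguishing" region of
# probability p under the domain distribution. The estimated error rate
# therefore tracks (1 - p)**t, an exponential learning curve.
def estimated_error(t, p=0.25, trials=20000, rng=None):
    rng = rng or random.Random(0)
    misses = sum(
        all(rng.random() >= p for _ in range(t)) for _ in range(trials)
    )
    return misses / trials

for t in (1, 5, 10):
    print(t, estimated_error(t))  # tracks 0.75**t
```

With p = 0.25 the estimates closely follow 0.75^t; a continuous class has no such fixed distinguishing region, which is why only rational Θ(t⁻¹) convergence survives there.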
The Shape of Learning Curves: a Review
Learning curves provide insight into the dependence of a learner's
generalization performance on the training set size. This important tool can be
used for model selection, to predict the effect of more training data, and to
reduce the computational complexity of model training and hyperparameter
tuning. This review recounts the origins of the term, provides a formal
definition of the learning curve, and briefly covers basics such as its
estimation. Our main contribution is a comprehensive overview of the literature
regarding the shape of learning curves. We discuss empirical and theoretical
evidence that supports well-behaved curves that often have the shape of a power
law or an exponential. We consider the learning curves of Gaussian processes,
the complex shapes they can display, and the factors influencing them. We draw
specific attention to examples of learning curves that are ill-behaved, showing
worse learning performance with more training data. To wrap up, we point out
various open problems that warrant deeper empirical and theoretical
investigation. All in all, our review underscores that learning curves are
surprisingly diverse and no universal model can be identified.
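The two well-behaved families the review highlights can be written down directly. A minimal sketch (our illustration, not code from the review; the parameter names are our own), where err(t) is the expected error at training-set size t and c models the irreducible error floor:

```python
import numpy as np

# Two common parametric forms for well-behaved learning curves:
# a power law and an exponential, each decaying toward a floor c.
def power_law_curve(t, a, b, c):
    # err(t) = a * t^(-b) + c
    return a * np.asarray(t, dtype=float) ** (-b) + c

def exponential_curve(t, a, b, c):
    # err(t) = a * exp(-b * t) + c
    return a * np.exp(-b * np.asarray(t, dtype=float)) + c

t = np.array([10, 100, 1000])
print(power_law_curve(t, 1.0, 0.5, 0.0))     # decays as 1/sqrt(t)
print(exponential_curve(t, 1.0, 0.01, 0.0))  # decays as e^(-t/100)
```

Fitting both forms to an empirical curve and comparing the residuals is one simple way to decide which family describes a given learner, though the review cautions that many observed curves fit neither.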
Universality of Scaling: Perspectives in Artificial Intelligence and Physics
The presence of universal phenomena both hints towards deep underlying principles and can also serve as a tool to uncover them. Often, the scaling behavior of systems shows such universality. An example of this is artificial neural networks (ANNs), which are ubiquitously employed in artificial intelligence (AI) technology today. The performance of an ANN, measured by its loss, scales with the size of the network and with the quantity of training data as simple power laws in each. We explain these laws theoretically. Additionally, our theory also explains the persistence of the scaling with model size over many orders of magnitude.
When both the amount of data and the model size are finite, the loss scales as a power law in each. The scaling in the regime where either the data or the model is effectively infinite is more non-trivial: the scaling exponents are tied to the intrinsic dimension of the training dataset by simple relations. We test our theoretical predictions in a teacher/student framework, and on several datasets and with GPT-type language models. These measurements yield intrinsic dimensions for several image datasets and set bounds on the intrinsic dimension of the English-language data the models were trained on.
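The power-law form of these scaling laws makes the exponent easy to recover in practice. A hedged sketch (synthetic data and an assumed exponent of our own choosing, not the thesis's measurements): if the loss follows L ∝ N^(−α) in model size N, then α is the negative slope of a straight-line fit in log-log space.

```python
import numpy as np

# Synthetic measurements following an exact power law L = 3 * N^(-alpha),
# with alpha = 0.07 assumed purely for illustration.
sizes = np.array([1e3, 1e4, 1e5, 1e6])
alpha_true = 0.07
losses = 3.0 * sizes ** (-alpha_true)

# Least-squares line in log-log space; the slope is -alpha.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha_est = -slope
print(round(alpha_est, 3))  # → 0.07
```

With noisy real measurements the fit is only as good as the range of sizes spanned, which is why scaling studies typically cover several orders of magnitude.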
Scaling behaviors also act as a tool to probe fundamental phenomena in nature, in this case the theory of quantum gravity. We use holography to probe spacetime by using the physics on its boundary. Specifically, previous work has employed the scaling properties of operators on the boundary to construct a scalar field in the bulk. Our construction extends this procedure to allow for an arbitrary choice of gravitational dressing of the field. Apart from yielding a more comprehensive understanding of the quantum properties of gravity, our construction is suitable to test the non-locality of quantum gravity.