Characterizing Rational versus Exponential Learning Curves
Abstract: We consider the standard problem of learning a concept from random examples. Here a learning curve is defined to be the expected error of a learner's hypotheses as a function of training sample size. Haussler, Littlestone, and Warmuth have shown that, in the distribution-free setting, the smallest expected error a learner can achieve in the worst case over a class of concepts C converges rationally to zero error; i.e., Θ(t⁻¹) in the training sample size t. However, Cohn and Tesauro have recently demonstrated that exponential convergence can often be observed in experimental settings (i.e., average error decreasing as e^(−Θ(t))). By addressing a simple non-uniformity in the original analysis, this paper shows how the dichotomy between rational and exponential worst-case learning curves can be recovered in the distribution-free theory. In particular, our results support the experimental findings of Cohn and Tesauro: for finite concept classes any consistent learner achieves exponential convergence, even in the worst case, whereas for continuous concept classes no learner can exhibit sub-rational convergence for every target concept and domain distribution. We also draw a precise boundary between rational and exponential convergence for simple concept chains, showing that somewhere-dense chains always force rational convergence in the worst case, while exponential convergence can always be achieved for nowhere-dense chains.
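The finite-class side of this dichotomy can be illustrated with a toy simulation (our own sketch, not the paper's construction, with the concept class and distinguishing-region probability `p` chosen for illustration): a consistent learner over a finite class errs on a fresh example only while no training sample has yet hit the region that distinguishes the target from the remaining consistent concepts, and the probability of that event decays like (1 − p)^t, i.e. exponentially in t.

```python
import random

# Toy sketch: a consistent learner over a finite concept class errs only
# while t i.i.d. samples have all missed a "distinguishing" region of
# probability p under the domain distribution. The estimated error rate
# therefore tracks (1 - p)**t, an exponential learning curve.
def estimated_error(t, p=0.25, trials=20000, rng=None):
    rng = rng or random.Random(0)
    misses = sum(
        all(rng.random() >= p for _ in range(t)) for _ in range(trials)
    )
    return misses / trials

for t in (1, 5, 10):
    print(t, estimated_error(t))  # tracks 0.75**t
```

With p = 0.25 the estimates closely follow 0.75^t; a continuous class has no such fixed distinguishing region, which is why only rational Θ(t⁻¹) convergence survives there.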
The Shape of Learning Curves: a Review
Learning curves provide insight into the dependence of a learner's
generalization performance on the training set size. This important tool can be
used for model selection, to predict the effect of more training data, and to
reduce the computational complexity of model training and hyperparameter
tuning. This review recounts the origins of the term, provides a formal
definition of the learning curve, and briefly covers basics such as its
estimation. Our main contribution is a comprehensive overview of the literature
regarding the shape of learning curves. We discuss empirical and theoretical
evidence that supports well-behaved curves that often have the shape of a power
law or an exponential. We consider the learning curves of Gaussian processes,
the complex shapes they can display, and the factors influencing them. We draw
specific attention to examples of learning curves that are ill-behaved, showing
worse learning performance with more training data. To wrap up, we point out
various open problems that warrant deeper empirical and theoretical
investigation. All in all, our review underscores that learning curves are
surprisingly diverse and no universal model can be identified.
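The two well-behaved families the review highlights can be written down directly. A minimal sketch (our illustration, not code from the review; the parameter names are our own), where err(t) is the expected error at training-set size t and c models the irreducible error floor:

```python
import numpy as np

# Two common parametric forms for well-behaved learning curves:
# a power law and an exponential, each decaying toward a floor c.
def power_law_curve(t, a, b, c):
    # err(t) = a * t^(-b) + c
    return a * np.asarray(t, dtype=float) ** (-b) + c

def exponential_curve(t, a, b, c):
    # err(t) = a * exp(-b * t) + c
    return a * np.exp(-b * np.asarray(t, dtype=float)) + c

t = np.array([10, 100, 1000])
print(power_law_curve(t, 1.0, 0.5, 0.0))     # decays as 1/sqrt(t)
print(exponential_curve(t, 1.0, 0.01, 0.0))  # decays as e^(-t/100)
```

Fitting both forms to an empirical curve and comparing the residuals is one simple way to decide which family describes a given learner, though the review cautions that many observed curves fit neither.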
Universality of Scaling: Perspectives in Artificial Intelligence and Physics
The presence of universal phenomena both hints towards deep underlying principles and can also serve as a tool to uncover them. Often, the scaling behavior of systems shows such universality. An example of this is artificial neural networks (ANNs), which are ubiquitously employed in artificial intelligence (AI) technology today. The performance of an ANN, measured by its loss, scales with the size of the network and with the quantity of training data as simple power laws in each. We explain these laws theoretically. Additionally, our theory also explains the persistence of the scaling with model size over many orders of magnitude.
When both the amount of data and the model size are finite, the loss scales as a power law in each. The scaling in the regime where either the data or the model is effectively infinite is more non-trivial: the scaling exponents are tied to the intrinsic dimension of the training dataset by simple relations. We test our theoretical predictions in a teacher/student framework, and on several datasets and with GPT-type language models. These measurements yield intrinsic dimensions for several image datasets and set bounds on the intrinsic dimension of the English-language data the models were trained on.
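The power-law form of these scaling laws makes the exponent easy to recover in practice. A hedged sketch (synthetic data and an assumed exponent of our own choosing, not the thesis's measurements): if the loss follows L ∝ N^(−α) in model size N, then α is the negative slope of a straight-line fit in log-log space.

```python
import numpy as np

# Synthetic measurements following an exact power law L = 3 * N^(-alpha),
# with alpha = 0.07 assumed purely for illustration.
sizes = np.array([1e3, 1e4, 1e5, 1e6])
alpha_true = 0.07
losses = 3.0 * sizes ** (-alpha_true)

# Least-squares line in log-log space; the slope is -alpha.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha_est = -slope
print(round(alpha_est, 3))  # → 0.07
```

With noisy real measurements the fit is only as good as the range of sizes spanned, which is why scaling studies typically cover several orders of magnitude.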
Scaling behaviors also act as a tool to probe fundamental phenomena in nature, in this case the theory of quantum gravity. We use holography to probe spacetime by using the physics on its boundary. Specifically, previous work has employed the scaling properties of operators on the boundary to construct a scalar field in the bulk. Our construction extends this procedure to allow for an arbitrary choice of gravitational dressing of the field. Apart from yielding a more comprehensive understanding of the quantum properties of gravity, our construction is suitable to test the non-locality of quantum gravity.