5 research outputs found

    Characterizing Rational versus Exponential Learning Curves

    We consider the standard problem of learning a concept from random examples. Here a learning curve is defined to be the expected error of a learner's hypotheses as a function of training sample size. Haussler, Littlestone, and Warmuth have shown that, in the distribution-free setting, the smallest expected error a learner can achieve in the worst case over a class of concepts C converges rationally to zero error; i.e., Θ(t^{-1}) in the training sample size t. However, Cohn and Tesauro have recently demonstrated that exponential convergence can often be observed in experimental settings (i.e., average error decreasing as e^{-Θ(t)}). By addressing a simple non-uniformity in the original analysis, this paper shows how the dichotomy between rational and exponential worst-case learning curves can be recovered in the distribution-free theory. In particular, our results support the experimental findings of Cohn and Tesauro: for finite concept classes any consistent learner achieves exponential convergence, even in the worst case, whereas for continuous concept classes no learner can exhibit sub-rational convergence for every target concept and domain distribution. We also draw a precise boundary between rational and exponential convergence for simple concept chains, showing that somewhere-dense chains always force rational convergence in the worst case, while exponential convergence can always be achieved for nowhere-dense chains.
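    A minimal simulation (not from the paper, purely illustrative of the dichotomy described above): a consistent learner for threshold concepts over a finite domain versus over the continuous interval [0,1]. Under this toy setup, the finite-class error drops roughly like e^{-Θ(t)}, while the continuous-class error decays roughly like Θ(t^{-1}).

```python
# Illustrative sketch, assuming threshold concepts and a consistent learner;
# not taken from the paper itself.
import numpy as np

rng = np.random.default_rng(0)

def finite_error(t, domain_size=10, trials=2000):
    """Thresholds over {0, ..., domain_size-1}, uniform distribution.
    Learner outputs the smallest threshold consistent with the sample."""
    target = domain_size // 2
    errs = []
    for _ in range(trials):
        x = rng.integers(0, domain_size, size=t)
        neg = x[x < target]                      # negative examples seen
        h = (neg.max() + 1) if neg.size else 0   # smallest consistent threshold
        errs.append((target - h) / domain_size)  # mass of the disagreement region
    return np.mean(errs)

def continuous_error(t, trials=2000):
    """Thresholds over [0, 1], uniform distribution; analogous consistent learner."""
    target = 0.5
    errs = []
    for _ in range(trials):
        x = rng.random(t)
        pos = x[x >= target]                     # positive examples seen
        h = pos.min() if pos.size else 1.0       # smallest consistent threshold
        errs.append(h - target)
    return np.mean(errs)

for t in (5, 10, 20, 40, 80):
    print(f"t={t:3d}  finite ~ {finite_error(t):.4f}  continuous ~ {continuous_error(t):.4f}")
# The finite-class column shrinks roughly exponentially in t, while the
# continuous-class column shrinks roughly like 1/t.
```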

    The Shape of Learning Curves: a Review

    Learning curves provide insight into the dependence of a learner's generalization performance on the training set size. This important tool can be used for model selection, to predict the effect of more training data, and to reduce the computational complexity of model training and hyperparameter tuning. This review recounts the origins of the term, provides a formal definition of the learning curve, and briefly covers basics such as its estimation. Our main contribution is a comprehensive overview of the literature regarding the shape of learning curves. We discuss empirical and theoretical evidence that supports well-behaved curves that often have the shape of a power law or an exponential. We consider the learning curves of Gaussian processes, the complex shapes they can display, and the factors influencing them. We draw specific attention to examples of learning curves that are ill-behaved, showing worse learning performance with more training data. To wrap up, we point out various open problems that warrant deeper empirical and theoretical investigation. All in all, our review underscores that learning curves are surprisingly diverse and no universal model can be identified.
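    As an illustration of one use mentioned above, predicting the effect of more training data, the following sketch fits the common power-law form err(n) = a·n^{-b} + c to a few measured (sample size, error) pairs and extrapolates. The data points and the choice of parametric form are assumptions for the example, not taken from the review.

```python
# Hypothetical sketch: extrapolating a learning curve with a power-law fit.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    """err(n) = a * n^(-b) + c, a common parametric form for learning curves."""
    return a * n ** (-b) + c

# Hypothetical measurements: (training set size, validation error)
n = np.array([100, 200, 400, 800, 1600], dtype=float)
err = np.array([0.31, 0.24, 0.19, 0.155, 0.13])

params, _ = curve_fit(power_law, n, err, p0=(1.0, 0.5, 0.05), maxfev=10000)
a, b, c = params
print(f"fit: err(n) ~ {a:.2f} * n^(-{b:.2f}) + {c:.3f}")
print(f"predicted error at n=10000: {power_law(10000.0, *params):.3f}")
```

    The same fit can be compared against an exponential form to judge which family describes a given curve better, in the spirit of the shape comparisons surveyed in the review.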

    UNIVERSALITY OF SCALING: PERSPECTIVES IN ARTIFICIAL INTELLIGENCE AND PHYSICS

    The presence of universal phenomena both hints towards deep underlying principles and can also serve as a tool to uncover them. Often, the scaling behavior of systems shows such universality. An example of this is artificial neural networks (ANNs), which are ubiquitously employed in artificial intelligence (AI) technology today. The performance of an ANN, measured by the loss L, scales with the size of the network N and with the quantity of training data D as simple power laws in N or D. We explain these laws theoretically. Additionally, our theory also explains the persistence over many orders of magnitude of the scaling with model size. When both the amount of data D and the model size N are finite, the loss scales as L ∝ D^{-1} and L ∝ N^{-1/2}. The scaling in the regime where either N or D is effectively infinite is more non-trivial, being tied to the intrinsic dimension d of the training dataset by the simple relations L ∝ N^{-4/d} and L ∝ D^{-4/d}. We test our theoretical predictions in a teacher/student framework, and on several datasets and with GPT-type language models. These measurements yield intrinsic dimensions for several image datasets and set bounds on the dimension of the English language that these were trained on. Scaling behaviors also act as a tool to probe fundamental phenomena in nature, in this case the theory of quantum gravity. We use holography to probe spacetime by using the physics on its boundary. Specifically, previous work has employed the scaling properties of operators on the boundary to construct a scalar field in the bulk. Our construction extends this procedure to allow for arbitrary choice of gravitational dressing of the field. Apart from yielding a more comprehensive understanding of the quantum properties of gravity, our construction is suitable to test the non-locality of quantum gravity.
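    A small sketch of how the quoted relation L ∝ D^{-4/d} could be used in practice: fit the log-log slope of loss versus dataset size and invert it to estimate the intrinsic dimension d. The loss values below are synthetic (generated with d = 16 plus noise), purely to illustrate the arithmetic, not measurements from the work above.

```python
# Illustrative sketch, assuming the data-limited scaling L ∝ D^{-4/d}:
# the slope of log(loss) vs log(D) is -4/d, so d ≈ -4 / slope.
import numpy as np

rng = np.random.default_rng(1)
d_true = 16
D = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
loss = 2.0 * D ** (-4 / d_true) * np.exp(rng.normal(0, 0.02, size=D.size))  # synthetic losses

slope, _ = np.polyfit(np.log(D), np.log(loss), 1)
d_hat = -4.0 / slope
print(f"fitted exponent: {slope:.3f}  ->  estimated intrinsic dimension d ~ {d_hat:.1f}")
```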