    Reducing the Number of Training Cases in Genetic Programming

    Zoppi, G., Vanneschi, L., & Giacobini, M. (2022). Reducing the Number of Training Cases in Genetic Programming. In 2022 IEEE Congress on Evolutionary Computation (CEC) (pp. 1-8). IEEE. https://doi.org/10.1109/CEC55065.2022.9870327

    In the field of Machine Learning, one of the most common and most discussed questions is how to choose an adequate number of data observations to train our models satisfactorily. In other words, the goal is to find the amount of data needed to create a model that is neither underfitted nor overfitted, but instead achieves a reasonable generalization ability. The problem grows in importance when we consider Genetic Programming, where fitness evaluation is often rather slow. Therefore, finding the minimum amount of data that enables us to discover the solution to a given problem could bring significant benefits. Using the notion of entropy in a dataset, we seek to understand the information gain obtainable from each additional data point. We then look for the smallest percentage of data that corresponds to enough information to yield satisfactory results. We present, as a first step, an example derived from the state of the art. Then, we question a relevant part of our procedure and introduce two case studies to experimentally validate our theoretical hypothesis.
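
    The abstract does not spell out the authors' procedure, so the following is only a minimal sketch of the general idea it describes: measure the entropy of growing portions of a dataset and report the smallest percentage that carries most of the information. The function names, the 95% threshold, the 5% step, and the use of Shannon entropy over discrete labels are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def smallest_sufficient_percent(labels, threshold=0.95, step_percent=5):
    """Smallest prefix size (as a percentage of the data) whose entropy
    reaches `threshold` times the entropy of the full dataset.

    Hypothetical helper: the threshold and step are illustrative
    assumptions. The prefix is taken in the given order, so the caller
    is expected to shuffle the data beforehand.
    """
    n = len(labels)
    full = shannon_entropy(labels)
    for percent in range(step_percent, 101, step_percent):
        # Integer ceiling of percent% of n, with at least one sample.
        size = max(1, (percent * n + 99) // 100)
        if shannon_entropy(labels[:size]) >= threshold * full:
            return percent
    return 100


if __name__ == "__main__":
    data = [0, 1] * 50  # toy, already well-mixed labels
    print(smallest_sufficient_percent(data))
```

    On well-mixed data a small prefix already approaches the full-dataset entropy, whereas on sorted data the same function returns a much larger percentage, which is why the order of the samples matters in this sketch.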