4 research outputs found

    Prediction of expected performance for a genetic programming classifier

    Get PDF
    The estimation of problem difficulty is an open issue in genetic programming (GP). The goal of this work is to generate models that predict the expected performance of a GP-based classifier when it is applied to an unseen task. Classification problems are described using domain-specific features, some of which are proposed in this work, and these features are given as input to the predictive models. These models are referred to as predictors of expected performance. We extend this approach by using an ensemble of specialized predictors (SPEP), dividing classification problems into groups and choosing the corresponding SPEP. The proposed predictors are trained using 2D synthetic classification problems with balanced datasets. The models are then used to predict the performance of the GP classifier on unseen real-world datasets that are multidimensional and imbalanced. This work is the first to provide a performance prediction of a GP system on test data, while previous works focused on predicting training performance. Accurate predictive models are generated by posing a symbolic regression task and solving it with GP. These results are achieved by using highly descriptive features and including a dimensionality reduction stage that simplifies the learning and testing process. The proposed approach could be extended to other classification algorithms and used as the basis of an expert system for algorithm selection

    Artificial evolution, fractal analysis and applications

    Get PDF
    This document contains a selection of research works to which I have contributed. It is structured around two themes, artificial evolution and signal regularity analysis and consists of three main parts: Part I: Artificial evolution, Part II: Estimation of signal regularity and Part III: Applications, combination of signal processing, fractal analysis and artificial evolution. In order to set the context and explain the coherence of the rest of the document, this manuscript begins with an introduction, Chapter 1, providing a list of collaborators and of the research projects carried out. Theoretical contributions focus on two areas: evolutionary algorithms and the measurement of signal regularity and are presented in Part I and Part II respectively. These two themes are then exploited and applied to real problems in Part III. Part I, Artificial Evolution, consists of 8 chapters. Chapter 2 contains a brief presentation of various types of evolutionary algorithms (ge-netic algorithms, evolutionary strategies and genetic programming) and presents some contributions in this area, which will be detailed later in the document. Chapter 3, entitled Prediction of Expected Performance for a Genetic Programming Classifier proposes a method to predict the expected performance for a genetic programming (GP) classifier without having to run the program or sample potential solutions in the research space. For a given classification problem, a pre-processing step to simplify the feature extraction process is proposed. Then the step of extracting the characteristics of the problem is performed. Finally, a PEP (prediction of expected performance) model is used, which takes the characteristics of the problem as input and produces the predicted classification error on the test set as output. To build the PEP model, a supervised learning method with a GP is used. Then, to refine this work, an approach using several PEP models is developed, each now becoming a specialized predictors of expected performance (SPEP) specialized for a particular group of problems. It appears that the PEP and SPEP models were able to accurately predict the performance of a GP-classifier and that the SPEP approach gave the best results. Chapter 4, entitled A comparison of fitness-case sampling methods for genetic programming presents an extensive comparative study of four fitness-case sampling methods, namely: Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection and the proposed Keep-Worst Interleaved Sampling. The algorithms are compared on 11 symbolic regression problems and 11 supervised classification problems, using 10 synthetic benchmarks and 12 real-world datasets. They are evaluated based on test performance, overfitting and average program size, comparing them with a standard GP search. The experimental results suggest that fitness-case sampling methods are particularly useful for difficult real-world symbolic regression problems, improving performance, reducing overfitting and limiting code growth. On the other hand, it seems that fitness-case sampling cannot improve upon GP performance when considering supervised binary classification. Chapter 5, entitled Evolving Genetic Programming Classifiers with Novelty Search, deals with a new and unique approach towards search and optimization, the Novelty Search (NS), where an explicit objective function is replaced by a measure of solution novelty. This chapter proposes a NS-based GP algorithm for supervised classification. Results show that NS can solve real-world classification tasks, the algorithm is validated on real-world benchmarks for binary and multiclass problems. Moreover, two new versions of the NS algorithm are proposed, Probabilistic NS (PNS) and a variant of Minimal Criteria NS (MCNS). The former models the behavior of each solution as a random vector and eliminates all of the original NS parameters while reducing the computational overhead of the NS algorithm. The latter uses a standard objective function to constrain and bias the search towards high performance solutions. This chapter also discusses the effects of NS on GP search dynam..

    Artificial evolution, fractal analysis and applications

    No full text
    This document contains a selection of research works to which I have contributed. It is structured around two themes, artificial evolution and signal regularity analysis and consists of three main parts: Part I: Artificial evolution, Part II: Estimation of signal regularity and Part III: Applications, combination of signal processing, fractal analysis and artificial evolution. In order to set the context and explain the coherence of the rest of the document, this manuscript begins with an introduction, Chapter 1, providing a list of collaborators and of the research projects carried out. Theoretical contributions focus on two areas: evolutionary algorithms and the measurement of signal regularity and are presented in Part I and Part II respectively. These two themes are then exploited and applied to real problems in Part III. Part I, Artificial Evolution, consists of 8 chapters. Chapter 2 contains a brief presentation of various types of evolutionary algorithms (ge-netic algorithms, evolutionary strategies and genetic programming) and presents some contributions in this area, which will be detailed later in the document. Chapter 3, entitled Prediction of Expected Performance for a Genetic Programming Classifier proposes a method to predict the expected performance for a genetic programming (GP) classifier without having to run the program or sample potential solutions in the research space. For a given classification problem, a pre-processing step to simplify the feature extraction process is proposed. Then the step of extracting the characteristics of the problem is performed. Finally, a PEP (prediction of expected performance) model is used, which takes the characteristics of the problem as input and produces the predicted classification error on the test set as output. To build the PEP model, a supervised learning method with a GP is used. Then, to refine this work, an approach using several PEP models is developed, each now becoming a specialized predictors of expected performance (SPEP) specialized for a particular group of problems. It appears that the PEP and SPEP models were able to accurately predict the performance of a GP-classifier and that the SPEP approach gave the best results. Chapter 4, entitled A comparison of fitness-case sampling methods for genetic programming presents an extensive comparative study of four fitness-case sampling methods, namely: Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection and the proposed Keep-Worst Interleaved Sampling. The algorithms are compared on 11 symbolic regression problems and 11 supervised classification problems, using 10 synthetic benchmarks and 12 real-world datasets. They are evaluated based on test performance, overfitting and average program size, comparing them with a standard GP search. The experimental results suggest that fitness-case sampling methods are particularly useful for difficult real-world symbolic regression problems, improving performance, reducing overfitting and limiting code growth. On the other hand, it seems that fitness-case sampling cannot improve upon GP performance when considering supervised binary classification. Chapter 5, entitled Evolving Genetic Programming Classifiers with Novelty Search, deals with a new and unique approach towards search and optimization, the Novelty Search (NS), where an explicit objective function is replaced by a measure of solution novelty. This chapter proposes a NS-based GP algorithm for supervised classification. Results show that NS can solve real-world classification tasks, the algorithm is validated on real-world benchmarks for binary and multiclass problems. Moreover, two new versions of the NS algorithm are proposed, Probabilistic NS (PNS) and a variant of Minimal Criteria NS (MCNS). The former models the behavior of each solution as a random vector and eliminates all of the original NS parameters while reducing the computational overhead of the NS algorithm. The latter uses a standard objective function to constrain and bias the search towards high performance solutions. This chapter also discusses the effects of NS on GP search dynam..
    corecore