204 research outputs found

    Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

    Full text link
    As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input nor prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design.Comment: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet made from reviewer comment

    Lexicase selection in Learning Classifier Systems

    Full text link
    The lexicase parent selection method selects parents by considering performance on individual data points in random order instead of using a fitness function based on an aggregated data accuracy. While the method has demonstrated promise in genetic programming and more recently in genetic algorithms, its applications in other forms of evolutionary machine learning have not been explored. In this paper, we investigate the use of lexicase parent selection in Learning Classifier Systems (LCS) and study its effect on classification problems in a supervised setting. We further introduce a new variant of lexicase selection, called batch-lexicase selection, which allows for the tuning of selection pressure. We compare the two lexicase selection methods with tournament and fitness proportionate selection methods on binary classification problems. We show that batch-lexicase selection results in the creation of more generic rules which is favorable for generalization on future data. We further show that batch-lexicase selection results in better generalization in situations of partial or missing data.Comment: Genetic and Evolutionary Computation Conference, 201

    Visualising the Search Landscape of the Triangle Program

    Get PDF
    High order mutation analysis of a software engineering benchmark, including schema and local optima networks, suggests program improvements may not be as hard to find as is often assumed. 1) Bit-wise genetic building blocks are not deceptive and can lead to all global optima. 2) There are many neutral networks, plateaux and local optima, nevertheless in most cases near the human written C source code there are hill climbing routes including neutral moves to solutions

    Multi-objective optimisation with a sequence-based selection hyper-heuristic

    Get PDF
    Hyper-heuristics have been used widely to solve optimisation problems, often single-objective and discrete in nature. Herein, we extend a recently-proposed selection hyper-heuristic to the multiobjective domain and with it optimise continuous problems. The MOSSHH algorithm operates as a hidden Markov model, using transition probabilities to determine which low-level heuristic or sequence of heuristics should be applied next. By incorporating dominance into the transition probability update rule, and an elite archive of solutions, MOSSHH generates solutions to multi-objective problems that are competitive with bespoke multi-objective algorithms. When applied to test problems, it is able to find good approximations to the true Pareto front, and yields information about the type of low-level heuristics that it uses to solve the problem

    CMA-ES with Learning Rate Adaptation: Can CMA-ES with Default Population Size Solve Multimodal and Noisy Problems?

    Full text link
    The covariance matrix adaptation evolution strategy (CMA-ES) is one of the most successful methods for solving black-box continuous optimization problems. One practically useful aspect of the CMA-ES is that it can be used without hyperparameter tuning. However, the hyperparameter settings still have a considerable impact, especially for difficult tasks such as solving multimodal or noisy problems. In this study, we investigate whether the CMA-ES with default population size can solve multimodal and noisy problems. To perform this investigation, we develop a novel learning rate adaptation mechanism for the CMA-ES, such that the learning rate is adapted so as to maintain a constant signal-to-noise ratio. We investigate the behavior of the CMA-ES with the proposed learning rate adaptation mechanism through numerical experiments, and compare the results with those obtained for the CMA-ES with a fixed learning rate. The results demonstrate that, when the proposed learning rate adaptation is used, the CMA-ES with default population size works well on multimodal and/or noisy problems, without the need for extremely expensive learning rate tuning.Comment: Nominated for the best paper of GECCO'23 ENUM Track. We have corrected the error of Eq.(7

    Communities of Local Optima as Funnels in Fitness Landscapes

    Get PDF
    We conduct an analysis of local optima networks extracted from fitness landscapes of the Kauffman NK model under iterated local search. Applying the Markov Cluster Algorithm for community detection to the local optima networks, we find that the landscapes consist of multiple clusters. This result complements recent findings in the literature that landscapes often decompose into multiple funnels, which increases their difficulty for iterated local search. Our results suggest that the number of clusters as well as the size of the cluster in which the global optimum is located are correlated to the search difficulty of landscapes. We conclude that clusters found by community detection in local optima networks offer a new way to characterize the multi-funnel structure of fitness landscapes
    • …
    corecore