204 research outputs found

Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

Author: Banzhaf W.
Bergstra J.
Feurer M.
Hastie T. J.
Snoek J.
Urbanowicz R. J.
Publication date: 19/03/2016

As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design.

Comment: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet made from reviewer comment
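
The Pareto optimization named in the abstract above can be illustrated with a small Python sketch. This is not TPOT's code: `Candidate`, `dominates`, and `pareto_front` are hypothetical names, the accuracy values are made up for illustration, and measuring pipeline complexity as an operator count is an assumption, since the abstract does not say how compactness is scored.

```python
from typing import List, Tuple

# Hypothetical representation: (accuracy, number_of_operators).
# Accuracy is maximized, operator count is minimized.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    acc_a, size_a = a
    acc_b, size_b = b
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative values only, not results from the paper.
candidates = [(0.90, 5), (0.90, 2), (0.85, 1), (0.80, 3), (0.92, 7)]
front = pareto_front(candidates)
# front == [(0.90, 2), (0.85, 1), (0.92, 7)]
```

In this toy example the five-operator pipeline is dropped because a two-operator pipeline reaches the same accuracy, which is the kind of trade-off the abstract describes.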

arXiv.org e-Print Archive

Crossref

Scipedia

Lexicase selection in Learning Classifier Systems

Author: Christine Zarges
Thomas Jansen
Gustafson Steven
Krawiec Krzysztof
McKay R. I.
Sastry K.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/07/2019

The lexicase parent selection method selects parents by considering performance on individual data points in random order instead of using a fitness function based on an aggregated data accuracy. While the method has demonstrated promise in genetic programming and more recently in genetic algorithms, its applications in other forms of evolutionary machine learning have not been explored. In this paper, we investigate the use of lexicase parent selection in Learning Classifier Systems (LCS) and study its effect on classification problems in a supervised setting. We further introduce a new variant of lexicase selection, called batch-lexicase selection, which allows for the tuning of selection pressure. We compare the two lexicase selection methods with tournament and fitness proportionate selection methods on binary classification problems. We show that batch-lexicase selection results in the creation of more generic rules, which is favorable for generalization on future data. We further show that batch-lexicase selection results in better generalization in situations of partial or missing data.

Comment: Genetic and Evolutionary Computation Conference, 201
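
The first sentence of the abstract above describes standard lexicase selection, which can be sketched in a few lines of Python. This is a generic illustration, not the paper's Learning Classifier System implementation; the function name `lexicase_select` and the error-matrix layout are assumptions, and the batch-lexicase variant is not shown because the abstract does not describe its mechanics.

```python
import random
from typing import Sequence


def lexicase_select(errors: Sequence[Sequence[float]], rng: random.Random) -> int:
    """Pick one parent index.

    errors[i][j] is the error of individual i on data point j (lower is better).
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # consider the data points in random order

    for case in cases:
        # Keep only the candidates that are best on this single data point.
        best = min(errors[i][case] for i in candidates)
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break

    # Any ties that survive all cases are broken at random.
    return rng.choice(candidates)


# Illustrative 0/1 errors for four individuals on three data points.
example_errors = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 1],  # wrong everywhere, so it is filtered out on the first case
]
parent = lexicase_select(example_errors, random.Random(0))
```

Because no aggregate score is computed, an individual that is uniquely correct on a few data points can still be selected when those points come first in the shuffled order.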

arXiv.org e-Print Archive

Crossref

Visualising the Search Landscape of the Triangle Program

Author: CM Reidys
DE Goldberg
F Daolio
G Ochoa
JH Holland
M Harman
Y Jia
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

High order mutation analysis of a software engineering benchmark, including schema and local optima networks, suggests program improvements may not be as hard to find as is often assumed. 1) Bit-wise genetic building blocks are not deceptive and can lead to all global optima. 2) There are many neutral networks, plateaux and local optima; nevertheless, in most cases near the human written C source code there are hill climbing routes including neutral moves to solutions.

Crossref

Stirling Online Research Repository (RIOXX)

UCL Discovery

Stirling Online Research Repository

Multi-objective optimisation with a sequence-based selection hyper-heuristic

Author: Keedwell EK
Walker DJ
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/06/2016

Hyper-heuristics have been used widely to solve optimisation problems, often single-objective and discrete in nature. Herein, we extend a recently-proposed selection hyper-heuristic to the multi-objective domain and with it optimise continuous problems. The MOSSHH algorithm operates as a hidden Markov model, using transition probabilities to determine which low-level heuristic or sequence of heuristics should be applied next. By incorporating dominance into the transition probability update rule, and an elite archive of solutions, MOSSHH generates solutions to multi-objective problems that are competitive with bespoke multi-objective algorithms. When applied to test problems, it is able to find good approximations to the true Pareto front, and yields information about the type of low-level heuristics that it uses to solve the problem.

Crossref

Open Research Exeter

CMA-ES with Learning Rate Adaptation: Can CMA-ES with Default Population Size Solve Multimodal and Noisy Problems?

Author: Akimoto Youhei
Nomura Masahiro
Ono Isao
Publication date: 14/09/2023

The covariance matrix adaptation evolution strategy (CMA-ES) is one of the most successful methods for solving black-box continuous optimization problems. One practically useful aspect of the CMA-ES is that it can be used without hyperparameter tuning. However, the hyperparameter settings still have a considerable impact, especially for difficult tasks such as solving multimodal or noisy problems. In this study, we investigate whether the CMA-ES with default population size can solve multimodal and noisy problems. To perform this investigation, we develop a novel learning rate adaptation mechanism for the CMA-ES, such that the learning rate is adapted so as to maintain a constant signal-to-noise ratio. We investigate the behavior of the CMA-ES with the proposed learning rate adaptation mechanism through numerical experiments, and compare the results with those obtained for the CMA-ES with a fixed learning rate. The results demonstrate that, when the proposed learning rate adaptation is used, the CMA-ES with default population size works well on multimodal and/or noisy problems, without the need for extremely expensive learning rate tuning.

Comment: Nominated for the best paper of GECCO'23 ENUM Track. We have corrected the error of Eq.(7

arXiv.org e-Print Archive

Communities of Local Optima as Funnels in Fitness Landscapes

Author: Ochoa G.
Rothlauf F.
van Dongen S.
Wright S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

We conduct an analysis of local optima networks extracted from fitness landscapes of the Kauffman NK model under iterated local search. Applying the Markov Cluster Algorithm for community detection to the local optima networks, we find that the landscapes consist of multiple clusters. This result complements recent findings in the literature that landscapes often decompose into multiple funnels, which increases their difficulty for iterated local search. Our results suggest that the number of clusters as well as the size of the cluster in which the global optimum is located are correlated to the search difficulty of landscapes. We conclude that clusters found by community detection in local optima networks offer a new way to characterize the multi-funnel structure of fitness landscapes.
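
As a rough illustration of the Markov Cluster Algorithm mentioned in the abstract above, the following pure-Python sketch alternates the algorithm's two steps, expansion (squaring the column-stochastic matrix) and inflation (element-wise power followed by column renormalization). It is not the authors' pipeline: the toy graph, the helper names, the inflation value of 2.0, and the fixed iteration count are all assumptions, and a real local optima network would normally carry weighted, directed edges.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

Matrix = List[List[float]]


def adjacency(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Build a symmetric 0/1 adjacency matrix for an undirected toy graph."""
    m = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        m[a][b] = m[b][a] = 1.0
    return m


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(m: Matrix) -> Matrix:
    """Expansion step: matrix square, i.e. two-step random walk probabilities."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m: Matrix, power: float) -> Matrix:
    """Inflation step: element-wise power, then renormalize the columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def mcl(adj: Matrix, inflation: float = 2.0, iterations: int = 50) -> Set[FrozenSet[int]]:
    n = len(adj)
    # Add self-loops before normalizing, as is customary for MCL.
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each row with surviving mass is an attractor; its nonzero columns form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Toy graph (hypothetical): two triangles joined by a single bridge edge 2-3.
toy = adjacency(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
toy_clusters = mcl(toy)
```

Higher inflation values generally produce more, smaller clusters, so the number of clusters reported for a landscape depends on this setting.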

Crossref

Stirling Online Research Repository (RIOXX)

Stirling Online Research Repository