
    Fitness Landscape Analysis of Automated Machine Learning Search Spaces

    The field of Automated Machine Learning (AutoML) aims to automate the process of creating complete Machine Learning (ML) pipelines for any dataset without requiring deep user expertise in ML. Several AutoML methods have been proposed so far, but no single one really stands out. Furthermore, there is a lack of studies on the characteristics of the fitness landscape of AutoML search spaces. Such analysis may help to understand the performance of different optimization methods for AutoML and how to improve them. This paper adapts classic fitness landscape analysis measures to the context of AutoML. This is a challenging task, as AutoML search spaces include discrete, continuous, categorical and conditional hyperparameters. We propose an ML pipeline representation, a neighborhood definition and a distance metric between pipelines, and use them to evaluate the fitness distance correlation (FDC) and the neutrality ratio for a given AutoML search space. Results for FDC are counter-intuitive and call for a more in-depth analysis across a range of search spaces. Results for neutrality, in turn, show a strong positive correlation between the mean neutrality ratio and the fitness value.
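As a sketch of the FDC measure the abstract refers to: FDC is the Pearson correlation between sampled fitness values and each sample's distance to the best-known solution (under the paper's pipeline distance metric). The function and toy data below are illustrative, not the paper's implementation.

```python
import math

def fdc(fitnesses, distances):
    """Fitness distance correlation: Pearson correlation between each
    sample's fitness and its distance to the best-known solution."""
    n = len(fitnesses)
    mf = sum(fitnesses) / n
    md = sum(distances) / n
    cov = sum((f - mf) * (d - md) for f, d in zip(fitnesses, distances)) / n
    sf = math.sqrt(sum((f - mf) ** 2 for f in fitnesses) / n)
    sd = math.sqrt(sum((d - md) ** 2 for d in distances) / n)
    return cov / (sf * sd)

# Toy data: fitness degrades smoothly with distance to the optimum,
# so FDC is close to +1 (an "easy" landscape for minimization).
dists = [0, 1, 2, 3, 4]
fits  = [0.0, 0.1, 0.25, 0.3, 0.45]
print(round(fdc(fits, dists), 3))   # → 0.992
```

For pipelines, `distances` would come from the paper's pipeline distance metric rather than a simple numeric gap.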

    Exploratory Landscape Analysis for Mixed-Variable Problems

    Exploratory landscape analysis, and fitness landscape analysis in general, have been pivotal in facilitating problem understanding, algorithm design and endeavors such as automated algorithm selection and configuration. These techniques have largely been limited to search spaces of a single domain. In this work, we provide the means to compute exploratory landscape features for mixed-variable problems where the decision space is a mixture of continuous, binary, integer, and categorical variables. This is achieved by utilizing existing encoding techniques originating from machine learning. We provide a comprehensive juxtaposition of the results based on these different techniques. To further highlight their merit for practical applications, we design and conduct an automated algorithm selection study based on a hyperparameter optimization benchmark suite. We derive a meaningful compartmentalization of these benchmark problems by clustering based on the used landscape features. The identified clusters mirror the behavior of the algorithms used: different clusters have different best-performing algorithms. Finally, our trained algorithm selector is able to close the gap between the single best and the virtual best solver by 57.5% over all benchmark problems.
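To make the encoding step concrete: categorical variables can be one-hot encoded so that a mixed-variable point becomes a purely numeric vector on which continuous landscape features are computable. The helper below is a minimal sketch of that idea; the variable layout and kernel example are hypothetical.

```python
def encode_mixed(x, categories):
    """One-hot-encode the categorical entries of a mixed-variable point
    so continuous landscape features can be computed on the result.
    `categories` maps variable index -> list of admissible levels;
    all other entries are treated as numeric and passed through."""
    out = []
    for i, v in enumerate(x):
        if i in categories:
            out.extend(1.0 if v == level else 0.0 for level in categories[i])
        else:
            out.append(float(v))
    return out

# Mixed point: (continuous learning rate, integer depth, categorical kernel)
point = (0.73, 4, "rbf")
cats = {2: ["linear", "poly", "rbf"]}
print(encode_mixed(point, cats))   # → [0.73, 4.0, 0.0, 0.0, 1.0]
```

The paper compares several such encodings; one-hot is only the simplest representative.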

    HPO × ELA: Investigating Hyperparameter Optimization Landscapes by Means of Exploratory Landscape Analysis

    Hyperparameter optimization (HPO) is a key component of machine learning models for achieving peak predictive performance. While numerous methods and algorithms for HPO have been proposed in recent years, little progress has been made in illuminating and examining the actual structure of these black-box optimization problems. Exploratory landscape analysis (ELA) subsumes a set of techniques that can be used to gain knowledge about properties of unknown optimization problems. In this paper, we evaluate the performance of five different black-box optimizers on 30 HPO problems, which consist of two-, three- and five-dimensional continuous search spaces of the XGBoost learner trained on 10 different data sets. This is contrasted with the performance of the same optimizers evaluated on 360 problem instances from the black-box optimization benchmark (BBOB). We then compute ELA features on the HPO and BBOB problems and examine similarities and differences. A cluster analysis of the HPO and BBOB problems in ELA feature space allows us to identify how the HPO problems compare to the BBOB problems on a structural meta-level. We identify a subset of BBOB problems that are close to the HPO problems in ELA feature space and show that optimizer performance is comparably similar on these two sets of benchmark problems. We highlight open challenges of ELA for HPO and discuss potential directions of future research and applications.
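For illustration, one simple family of ELA features summarizes the distribution of objective values over a random sample (the "y-distribution" features: skewness and kurtosis). The sketch below computes them for a 2-D sphere function; it is a toy stand-in for a full ELA toolbox such as flacco, not the paper's feature pipeline.

```python
import random

def ela_y_distribution(sample_x, f):
    """Two classic ELA 'y-distribution' features computed from a random
    sample of objective values: skewness and excess kurtosis."""
    ys = [f(x) for x in sample_x]
    n = len(ys)
    mu = sum(ys) / n
    m2 = sum((y - mu) ** 2 for y in ys) / n
    m3 = sum((y - mu) ** 3 for y in ys) / n
    m4 = sum((y - mu) ** 4 for y in ys) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

random.seed(0)
sample = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(200)]
sphere = lambda x: x[0] ** 2 + x[1] ** 2
skew, kurt = ela_y_distribution(sample, sphere)
# The sphere's objective values on a uniform sample are right-skewed,
# so skew comes out positive.
print(skew, kurt)
```

In the paper's setting, `f` would be the cross-validated loss of an XGBoost configuration rather than an analytic test function.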

    Constructing Search Spaces for Search-Based Software Testing Using Neural Networks

    A central requirement for any Search-Based Software Testing (SBST) technique is a convenient and meaningful fitness landscape. Whether one follows a targeted or a diversification-driven strategy, a search landscape needs to be large, continuous, easy to construct and representative of the underlying property of interest. Constructing such a landscape is not a trivial task, often requiring significant manual effort by an expert. We present an approach for constructing meaningful and convenient fitness landscapes using neural networks (NN) – for targeted and diversification strategies alike. We suggest that the output of an NN predictor can be interpreted as a fitness for a targeted strategy. The NN is trained on a corpus of execution traces and various properties of interest, prior to searching. During search, the trained NN is queried to predict an estimate of a property given an execution trace. The outputs of the NN form a convenient search space which is strongly representative of a number of properties. We believe that such a search space can be readily used for driving a search towards specific properties of interest. For a diversification strategy, we propose the use of an autoencoder: a mechanism for compacting data into an n-dimensional “latent” space. In it, datapoints are arranged according to the similarity of their salient features. We show that a latent space of execution traces possesses characteristics of a convenient search landscape: it is continuous, large and, crucially, it defines a notion of similarity to arbitrary observations.
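The targeted strategy can be sketched as a search loop that treats a predictor's output as the fitness to optimize. Here the "predictor" is a hypothetical linear stub standing in for the trained NN, and the hill climber and its parameters are illustrative only.

```python
import random

def surrogate_fitness(trace):
    """Stand-in for the trained NN predictor: maps an execution-trace
    feature vector to an estimated property value. (Hypothetical stub --
    the approach above uses a network trained on real traces.)"""
    return sum(v * w for v, w in zip(trace, (0.4, -0.2, 0.7)))

def hill_climb(start, steps=200, seed=1):
    """Search over trace features, using the predictor's output as the
    fitness to maximise -- the 'targeted strategy' described above."""
    rng = random.Random(seed)
    best, best_f = list(start), surrogate_fitness(start)
    for _ in range(steps):
        cand = [min(1.0, max(0.0, v + rng.gauss(0, 0.1))) for v in best]
        f = surrogate_fitness(cand)
        if f > best_f:
            best, best_f = cand, f
    return best, best_f

best, best_f = hill_climb([0.5, 0.5, 0.5])
print(round(best_f, 2))
```

In practice the search would move through program inputs whose traces are then featurized, rather than mutating feature vectors directly.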

    Quantifying the Impact of Parameter Tuning on Nature-Inspired Algorithms

    The problem of parameterization is often central to the effective deployment of nature-inspired algorithms. However, finding the optimal set of parameter values for a combination of problem instance and solution method is highly challenging, and few concrete guidelines exist on how and when such tuning may be performed. Previous work tends to either focus on a specific algorithm or use benchmark problems, and both of these restrictions limit the applicability of any findings. Here, we examine a number of different algorithms, and study them in a "problem agnostic" fashion (i.e., one that is not tied to specific instances) by considering their performance on fitness landscapes with varying characteristics. Using this approach, we make a number of observations on which algorithms may (or may not) benefit from tuning, and in which specific circumstances. Comment: 8 pages, 7 figures. Accepted at the European Conference on Artificial Life (ECAL) 2013, Taormina, Italy.
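The idea of "fitness landscapes with varying characteristics" can be illustrated with Kauffman's NK model, where the parameter k tunes ruggedness, paired with a (1+1) EA whose per-bit mutation rate is exactly the kind of parameter such a tuning study investigates. This is a generic sketch under those assumptions, not the paper's experimental setup.

```python
import random

def nk_landscape(n, k, seed=0):
    """Kauffman NK landscape over n-bit strings; ruggedness grows with k
    (the number of epistatic neighbors per locus). Contribution tables
    are filled lazily with fixed pseudo-random values."""
    rng = random.Random(seed)
    tables = [{} for _ in range(n)]
    def fitness(bits):
        total = 0.0
        for i in range(n):
            key = tuple(bits[(i + j) % n] for j in range(k + 1))
            if key not in tables[i]:
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / n
    return fitness

def one_plus_one_ea(f, n, mut_rate, evals=500, seed=1):
    """(1+1) EA whose single parameter -- the per-bit mutation rate --
    is the kind of knob the tuning study above examines."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x)
    for _ in range(evals):
        y = [b ^ (rng.random() < mut_rate) for b in x]
        fy = f(y)
        if fy >= fx:
            x, fx = y, fy
    return fx

f = nk_landscape(n=20, k=4)
for rate in (0.01, 0.05, 0.25):
    print(rate, round(one_plus_one_ea(f, 20, rate), 3))
```

Sweeping k alongside the mutation rate gives a small "problem agnostic" picture of when tuning that rate pays off.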