EXPObench: Benchmarking Surrogate-based Optimisation Algorithms on Expensive Black-box Functions
Surrogate algorithms such as Bayesian optimisation are especially designed
for black-box optimisation problems with expensive objectives, such as
hyperparameter tuning or simulation-based optimisation. In the literature,
these algorithms are usually evaluated with synthetic benchmarks which are well
established but have no expensive objective, and only on one or two real-life
applications which vary wildly between papers. There is a clear lack of
standardisation when it comes to benchmarking surrogate algorithms on
real-life, expensive, black-box objective functions. This makes it very
difficult to draw conclusions on the effect of algorithmic contributions. A new
benchmark library, EXPObench, provides first steps towards such a
standardisation. The library is used to provide an extensive comparison of six
different surrogate algorithms on four expensive optimisation problems from
different real-life applications. This has led to new insights regarding the
relative importance of exploration, the evaluation time of the objective, and
the model used. A further contribution is that we make the algorithms and
benchmark problem instances publicly available, contributing to more uniform
analysis of surrogate algorithms. Most importantly, we include the performance
of the six algorithms on all evaluated problem instances. This results in a
unique new dataset that lowers the bar for researching new methods as the
number of expensive evaluations required for comparison is significantly
reduced.
Comment: 13 pages
Framework and Benchmarks for Combinatorial and Mixed-variable Bayesian Optimization
This paper introduces a modular framework for Mixed-variable and
Combinatorial Bayesian Optimization (MCBO) to address the lack of systematic
benchmarking and standardized evaluation in the field. Current MCBO papers
often introduce non-diverse or non-standard benchmarks to evaluate their
methods, impeding the proper assessment of different MCBO primitives and their
combinations. Additionally, papers introducing a solution for a single MCBO
primitive often omit benchmarking against baselines that utilize the same
methods for the remaining primitives. This omission is primarily due to the
significant implementation overhead involved, resulting in a lack of controlled
assessments and an inability to showcase the merits of a contribution
effectively. To overcome these challenges, our proposed framework enables an
effortless combination of Bayesian Optimization components, and provides a
diverse set of synthetic and real-world benchmarking tasks. Leveraging this
flexibility, we implement 47 novel MCBO algorithms and benchmark them against
seven existing MCBO solvers and five standard black-box optimization algorithms
on ten tasks, conducting over 4000 experiments. Our findings reveal a superior
combination of MCBO primitives outperforming existing approaches and illustrate
the significance of model fit and the use of a trust region. We make our MCBO
library available under the MIT license at
https://github.com/huawei-noah/HEBO/tree/master/MCBO
Machine Learning for Fluid Mechanics
The field of fluid mechanics is rapidly advancing, driven by unprecedented
volumes of data from field measurements, experiments and large-scale
simulations at multiple spatiotemporal scales. Machine learning offers a wealth
of techniques to extract information from data that could be translated into
knowledge about the underlying fluid mechanics. Moreover, machine learning
algorithms can augment domain knowledge and automate tasks related to flow
control and optimization. This article presents an overview of the history,
current developments, and emerging opportunities of machine learning for fluid
mechanics. It outlines fundamental machine learning methodologies and discusses
their uses for understanding, modeling, optimizing, and controlling fluid
flows. The strengths and limitations of these methods are addressed from the
perspective of scientific inquiry that considers data as an inherent part of
modeling, experimentation, and simulation. Machine learning provides a powerful
information processing framework that can enrich, and possibly even transform,
current lines of fluid mechanics research and industrial applications.
Comment: To appear in the Annual Reviews of Fluid Mechanics, 202
A new Taxonomy of Continuous Global Optimization Algorithms
Surrogate-based optimization, nature-inspired metaheuristics, and hybrid
combinations have become state of the art in algorithm design for solving
real-world optimization problems. Still, it is difficult for practitioners to
get an overview that explains their advantages in comparison to a large number
of available methods in the scope of optimization. Available taxonomies lack
the embedding of current approaches in the larger context of this broad field.
This article presents a taxonomy of the field, which explores and matches
algorithm strategies by extracting similarities and differences in their search
strategies. A particular focus lies on algorithms using surrogates,
nature-inspired designs, and those created by design optimization. The
extracted features of components or operators allow us to create a set of
classification indicators to distinguish between a small number of classes. The
features allow a deeper understanding of components of the search strategies
and further indicate the close connections between the different algorithm
designs. We present intuitive analogies to explain the basic principles of the
search algorithms, particularly useful for novices in this research field.
Furthermore, this taxonomy allows recommendations for the applicability of the
corresponding algorithms.
Comment: 35 pages total, 28 written pages, 4 figures, 2019 Reworked Version
Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges
Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time-consuming and irreproducible manual process of trial-and-error to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods can be employed, for example those based on resampling error estimation for supervised machine learning. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization. This article is categorized under: Algorithmic Development > Statistics; Technologies > Machine Learning; Technologies > Prediction
Landscape Analysis for Surrogate Models in the Evolutionary Black-Box Context
Surrogate modeling has become a valuable technique for black-box optimization
tasks with expensive evaluation of the objective function. In this paper, we
investigate the relationship between the predictive accuracy of surrogate
models and features of the black-box function landscape. We also study
properties of features for landscape analysis in the context of different
transformations and ways of selecting the input data. We perform the landscape
analysis of a large set of data generated using runs of a surrogate-assisted
version of the Covariance Matrix Adaptation Evolution Strategy on the noiseless
part of the Comparing Continuous Optimisers benchmark function testbed.
Comment: 25 pages main article, 28 pages supplementary material, 3 figures,
currently under review at Evolutionary Computation journal
Gradient boosting in automatic machine learning: feature selection and hyperparameter optimization
The goal of automatic machine learning (AutoML) is to automate all aspects of model selection in (supervised) predictive modeling. This thesis deals with gradient boosting techniques in the context of AutoML, with a focus on gradient tree boosting and component-wise gradient boosting. Both techniques share a common methodology, but their goals are quite different. While gradient tree boosting is widely used in machine learning as a powerful prediction algorithm, the strength of component-wise gradient boosting lies in feature selection and the modeling of high-dimensional data. Extensions of component-wise gradient boosting to multidimensional prediction functions are considered as well. The challenge of hyperparameter optimization for these algorithms is discussed with a focus on Bayesian optimization and efficient early stopping strategies. A large-scale random search over the hyperparameters of machine learning algorithms demonstrates the difficulty of this optimization; the resulting data can serve as a foundation for new AutoML and meta-learning approaches. Furthermore, advanced feature selection strategies are summarized and a new method based on shadow features is introduced. Finally, an AutoML approach based on the results and best practices for feature selection and hyperparameter optimization is proposed, with the goal of simplifying and stabilizing AutoML while maintaining high prediction accuracy. This approach is compared to AutoML approaches that use much more complex search spaces and ensembling techniques.
Four software packages for the statistical programming language R have been newly developed or extended as part of this thesis: mlrMBO: A general framework for Bayesian optimization; autoxgboost: An automatic machine learning framework that heavily utilizes gradient tree boosting; compboost: A modular framework for component-wise boosting written in C++; gamboostLSS: A framework for component-wise boosting for generalized additive models for location, scale, and shape.