
    Scalable Meta-Learning for Bayesian Optimization

    Bayesian optimization has become a standard technique for hyperparameter optimization, including data-intensive models such as deep neural networks that may take days or weeks to train. We consider the setting where previous optimization runs are available, and we wish to use their results to warm-start a new optimization run. We develop an ensemble model that can incorporate the results of past optimization runs, while avoiding the poor scaling that comes with putting all results into a single Gaussian process model. The ensemble combines models from past runs according to estimates of their generalization performance on the current optimization. Results from a large collection of hyperparameter optimization benchmark problems and from optimization of a production computer vision platform at Facebook show that the ensemble can substantially reduce the time it takes to obtain near-optimal configurations, and is useful for warm-starting expensive searches or running quick re-optimizations.
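    The ensemble idea admits a short sketch: each past run contributes an already fitted Gaussian process, each model is weighted by how well its predictions rank the current run's observations, and predictions are combined as a weighted average. The pairwise-ranking weighting, the helper names, and the use of scikit-learn below are assumptions for illustration, not the paper's exact method.

```python
# A minimal sketch of a ranking-weighted GP ensemble for warm-starting BO.
# Assumptions (not the authors' implementation): scikit-learn GPs, a simple
# pairwise-ranking weighting, and hypothetical helper names.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def ranking_weights(past_gps, X_obs, y_obs):
    """Weight each past-run GP by how well its mean ranks the current observations."""
    weights = []
    for gp in past_gps:
        mu = gp.predict(X_obs)
        # Count discordant pairs: fewer ranking errors -> higher weight.
        errors = sum((mu[i] < mu[j]) != (y_obs[i] < y_obs[j])
                     for i in range(len(y_obs))
                     for j in range(i + 1, len(y_obs)))
        weights.append(1.0 / (1.0 + errors))
    w = np.asarray(weights, dtype=float)
    return w / w.sum()

def ensemble_mean(past_gps, weights, X):
    """Weighted average of the per-run GP means at candidate points X."""
    means = np.vstack([gp.predict(X) for gp in past_gps])  # (n_models, n_points)
    return weights @ means
```

    A warm-started BO loop would then maximize its acquisition function over this combined mean (and an analogously combined variance) rather than refit a single large GP on all past observations, which is where the scaling advantage comes from.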

    Interpretable Neural Architecture Search via Bayesian Optimisation with Weisfeiler-Lehman Kernels

    Current neural architecture search (NAS) strategies focus only on finding a single good architecture. They offer little insight into why a specific network is performing well, or how we should modify the architecture if we want further improvements. We propose a Bayesian optimisation (BO) approach for NAS that combines the Weisfeiler-Lehman graph kernel with a Gaussian process surrogate. Our method optimises the architecture in a highly data-efficient manner: it is capable of capturing the topological structures of the architectures and is scalable to large graphs, thus making the high-dimensional and graph-like search spaces amenable to BO. More importantly, our method affords interpretability by discovering useful network features and their corresponding impact on the network performance. Indeed, we demonstrate empirically that our surrogate model is capable of identifying useful motifs which can guide the generation of new architectures. We finally show that our method outperforms existing NAS approaches to achieve the state of the art on both closed- and open-domain search spaces.
    Comment: ICLR 2021. 9 pages, 5 figures, 1 table (23 pages, 14 figures, and 3 tables including references and appendices).
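    The kernel at the heart of the method can be sketched compactly: iterate Weisfeiler-Lehman relabelling over an architecture graph, collect a histogram of the labels, and take inner products of histograms as kernel values for the GP surrogate. Representing architectures as networkx graphs with an 'op' node attribute (the layer type) is an illustrative assumption.

```python
# A minimal sketch of the Weisfeiler-Lehman subtree feature map over
# architecture graphs; the inner product of two histograms is the kernel
# value a GP surrogate would consume.
from collections import Counter
import networkx as nx

def wl_features(graph, iterations=3):
    """Histogram of WL labels after iterative neighbourhood relabelling."""
    labels = {n: str(graph.nodes[n].get("op", "")) for n in graph.nodes}
    hist = Counter(labels.values())
    for _ in range(iterations):
        labels = {n: labels[n] + "|" + ",".join(sorted(labels[m] for m in graph.neighbors(n)))
                  for n in graph.nodes}
        hist.update(labels.values())
    return hist

def wl_kernel(g1, g2, iterations=3):
    """WL kernel value: inner product of the two label histograms."""
    h1, h2 = wl_features(g1, iterations), wl_features(g2, iterations)
    return sum(h1[k] * h2[k] for k in h1.keys() & h2.keys())

# Toy usage: two three-node architectures with labelled operations.
g1, g2 = nx.DiGraph([(0, 1), (1, 2)]), nx.DiGraph([(0, 1), (0, 2)])
for g in (g1, g2):
    nx.set_node_attributes(g, {0: "conv3x3", 1: "relu", 2: "conv3x3"}, "op")
print(wl_kernel(g1, g2))
```

    A kernel matrix built this way can then serve as the covariance of the GP surrogate, giving a posterior over architecture performance while the WL labels themselves provide the interpretable motifs the abstract mentions.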

    Transfer Learning for Multi-surrogate-model Optimization

    Surrogate-model-based optimization is widely used to solve black-box optimization problems when evaluating the target system is expensive. However, when the optimization budget is limited to one or a few evaluations, surrogate-model-based optimization may not perform well due to the lack of knowledge about the search space. In this case, transfer learning helps to obtain good optimization results by reusing experience from previous optimization runs. Even when the budget is not strictly limited, transfer learning can improve the final results of black-box optimization. Recent work in surrogate-model-based optimization has shown that using multiple surrogates (i.e., applying multi-surrogate-model optimization) can be extremely efficient in complex search spaces. The main assumption of this thesis is that transfer learning can further improve the quality of multi-surrogate-model optimization; however, to the best of our knowledge, no approaches to transfer learning exist in the multi-surrogate-model context yet. In this thesis, we propose an approach to transfer learning for multi-surrogate-model optimization. It encompasses an improved method of determining the expediency of knowledge transfer, adapted multi-surrogate-model recommendation, multi-task-learning parameter tuning, and few-shot learning techniques. We evaluated the proposed approach on a set of algorithm selection and parameter setting problems, comprising mathematical function optimization and the traveling salesman problem, as well as random forest hyperparameter tuning over OpenML datasets. The evaluation shows that the proposed approach improves the quality delivered by multi-surrogate-model optimization and yields good optimization results even under a strictly limited budget.
    Thesis outline:
    1 Introduction
      1.1 Motivation
      1.2 Research objective
      1.3 Solution overview
      1.4 Thesis structure
    2 Background
      2.1 Optimization problems
      2.2 From single- to multi-surrogate-model optimization
        2.2.1 Classical surrogate-model-based optimization
        2.2.2 The purpose of multi-surrogate-model optimization
        2.2.3 BRISE 2.5.0: Multi-surrogate-model-based software product line for parameter tuning
      2.3 Transfer learning
        2.3.1 Definition and purpose of transfer learning
      2.4 Summary of the Background
    3 Related work
      3.1 Questions to transfer learning
      3.2 When to transfer: Existing approaches to determining the expediency of knowledge transfer
        3.2.1 Meta-features-based approaches
        3.2.2 Surrogate-model-based similarity
        3.2.3 Relative landmarks-based approaches
        3.2.4 Sampling landmarks-based approaches
        3.2.5 Similarity threshold problem
      3.3 What to transfer: Existing approaches to knowledge transfer
        3.3.1 Ensemble learning
        3.3.2 Search space pruning
        3.3.3 Multi-task learning
        3.3.4 Surrogate model recommendation
        3.3.5 Few-shot learning
        3.3.6 Other approaches to transferring knowledge
      3.4 How to transfer (discussion): Peculiarities and required design decisions for the TL implementation in multi-surrogate-model setup
        3.4.1 Peculiarities of model recommendation in multi-surrogate-model setup
        3.4.2 Required design decisions in multi-task learning
        3.4.3 Few-shot learning problem
      3.5 Summary of the related work analysis
    4 Transfer learning for multi-surrogate-model optimization
      4.1 Expediency of knowledge transfer
        4.1.1 Experiments' similarity definition as a variability point
        4.1.2 Clustering to filter the most suitable experiments
      4.2 Dynamic model recommendation in multi-surrogate-model setup
        4.2.1 Variable recommendation granularity
        4.2.2 Model recommendation by time and performance criteria
      4.3 Multi-task learning
      4.4 Implementation of the proposed concept
      4.5 Conclusion of the proposed concept
    5 Evaluation
      5.1 Benchmark suite
        5.1.1 APSP for the meta-heuristics
        5.1.2 Hyperparameter optimization of the Random Forest algorithm
      5.2 Environment setup
      5.3 Evaluation plan
      5.4 Baseline evaluation
      5.5 Meta-tuning for a multi-task learning approach
        5.5.1 Revealing the dependencies between the parameters of multi-task learning and its performance
        5.5.2 Multi-task learning performance with the best found parameters
      5.6 Expediency determination approach
        5.6.1 Expediency determination as a variability point
        5.6.2 Flexible number of the most similar experiments with the help of clustering
        5.6.3 Influence of the number of initial samples on the quality of expediency determination
      5.7 Multi-surrogate-model recommendation
      5.8 Few-shot learning
        5.8.1 Transfer of the built surrogate models' combination
        5.8.2 Transfer of the best configuration
        5.8.3 Transfer from different experiment instances
      5.9 Summary of the evaluation results
    6 Conclusion and Future work
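    As a concrete illustration of the "when to transfer" question surveyed in the outline's related-work chapter, one of the expediency tests (sampling landmarks) can be sketched in a few lines: evaluate the same few configurations in the previous and the current experiment, and transfer only if their outcome rankings correlate strongly enough. The function name and the threshold value are illustrative assumptions, not the thesis implementation.

```python
# A minimal sketch of a sampling-landmarks expediency test: rank-correlate
# the objective values of identical landmark configurations measured in the
# old and the new experiment; transfer only above a (hypothetical) threshold.
from scipy.stats import spearmanr

def transfer_is_expedient(y_old, y_new, threshold=0.6):
    """y_old, y_new: objective values of the same landmark configurations
    measured in the previous and in the current experiment."""
    rho, _ = spearmanr(y_old, y_new)
    return rho >= threshold
```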

    Automated Machine Learning - Bayesian Optimization, Meta-Learning & Applications

    Automating machine learning by providing techniques that autonomously find the best algorithm, hyperparameter configuration, and preprocessing is helpful for both researchers and practitioners. It is therefore not surprising that automated machine learning has become a very active field of research. Bayesian optimization has proven to be a very successful tool for automated machine learning. In the first part of the thesis we present different approaches to improving Bayesian optimization by means of transfer learning. We present three ways of incorporating meta-knowledge into Bayesian optimization: search space pruning, initialization, and transfer surrogate models. Finally, we present a general framework for Bayesian optimization combined with meta-learning and conduct a comparison among existing work on two meta-data sets. One conclusion is that the meta-target-driven approaches in particular provide better results: choosing algorithm configurations based on their improvement on the meta-knowledge, combined with the expected improvement, yields the best results. The second part of this thesis is more application-oriented: Bayesian optimization is applied to large data sets and used as a tool to participate in machine learning challenges. We compare its autonomous performance and its performance in combination with a human expert. At two ECML-PKDD Discovery Challenges, we show that automated machine learning outperforms human machine learning experts. Finally, we present an approach that automates the process of creating an ensemble of several layers, different algorithms, and hyperparameter configurations. These kinds of ensembles are jokingly called Frankenstein ensembles and have proven their benefit on diverse data sets in many machine learning challenges. We compare our approach, Automatic Frankensteining, with the current state of the art for automated machine learning on 80 different data sets and show that it outperforms them on the majority within the same training time. Furthermore, we compare Automatic Frankensteining on a large-scale data set to more than 3,500 machine learning expert teams and outperform more than 3,000 of them within 12 CPU hours.
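    Of the three transfer mechanisms named in the first part, initialization is the simplest to sketch: seed a new Bayesian optimization run with the best configurations from the most similar past data sets, where similarity is a distance over meta-features. The names and the Euclidean distance choice below are illustrative assumptions, not the thesis implementation.

```python
# A minimal sketch of meta-learning initialization for Bayesian optimization:
# pick the best-known configurations of the k nearest past data sets, with
# nearness measured as (hypothetically) Euclidean distance over meta-features.
import numpy as np

def warm_start_configs(meta_features_new, past_runs, k=3):
    """past_runs: list of (meta_features, best_config) pairs from earlier data sets."""
    dists = [np.linalg.norm(np.asarray(meta_features_new) - np.asarray(mf))
             for mf, _ in past_runs]
    nearest = np.argsort(dists)[:k]
    return [past_runs[i][1] for i in nearest]
```

    The returned configurations would be evaluated first, giving the surrogate model an informed starting design before the usual acquisition-driven loop takes over.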