Hyperparameters, tuning and meta-learning for random forest and other machine learning algorithms

Author: Probst Philipp
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 25/07/2019

In this cumulative dissertation, I examine the influence of hyperparameters on machine learning algorithms, with a special focus on random forest. It mainly consists of three papers written over the last three years. The first paper (Probst and Boulesteix, 2018) examines the influence of the number of trees on the performance of a random forest. It is generally believed that a higher number of trees yields better performance. However, we show real data examples in which the expected value of measures such as accuracy and AUC (partially) decreases as the number of trees grows. We prove theoretically why this can happen and argue that it only happens in very special data situations. For other measures, such as the Brier score, the logarithmic loss or the mean squared error, we show that this cannot happen. In a benchmark study based on 306 classification and regression datasets, we illustrate the extent of this unexpected behaviour. We observe that, on average, most of the performance improvement is achieved while growing the first 100 trees. For the analysis we use our new R package OOBCurve (Probst, 2017a), which computes the performance of a random forest for a growing number of trees based on the out-of-bag observations.

The second paper (Probst et al., 2019b) is more general. First, we review the literature on the influence of hyperparameters on random forest. The hyperparameters considered are the number of variables drawn at each split, the sampling scheme for drawing observations for each tree, the minimum number of observations in a node, the number of trees and the splitting rule. Their influence is examined with regard to performance, runtime and variable importance. In the second part of the paper, different tuning strategies for obtaining optimal hyperparameters are presented, and a new R package, tuneRanger, is introduced. It performs sequential model-based optimization using the out-of-bag observations and chooses the hyperparameters to tune and their ranges automatically. In a benchmark study, this implementation is compared with other implementations that tune random forest.

The third paper (Probst et al., 2019a) is even more general and presents a framework for examining the tunability of hyperparameters of machine learning algorithms. It first properly defines the concept of defaults and proposes definitions for measuring the tunability of the whole algorithm, of single hyperparameters and of combinations of hyperparameters. To apply these definitions to a collection of 38 binary classification datasets, a random bot was created, which generated around 5 million experiment runs of 6 algorithms with different hyperparameters in total. The details of this bot are described in a separate paper (Kühn et al., 2018), co-authored by me, which is also included in this dissertation. The results of this bot are used to estimate the tunability of these 6 algorithms and their specific hyperparameters. Furthermore, ranges for tuning the hyperparameters of these algorithms are proposed.
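Why expected accuracy can fall with more trees while the Brier score cannot can be illustrated with a small calculation. This is a sketch, not the paper's proof. It assumes binary classification with majority voting over an odd number of trees and assumes that, given the training data, each tree independently votes for an observation's true class with probability p. Under that assumption, the expected accuracy for one observation is a binomial tail probability, which shrinks as trees are added when p < 0.5. The expected Brier score, (1 - p)^2 + p(1 - p)/T, decreases for every p in (0, 1). The function names are illustrative.

```python
from math import comb


def expected_accuracy(p: float, n_trees: int) -> float:
    """Expected majority-vote accuracy for one observation (binary classification).

    Assumes each tree independently votes for the true class with probability p.
    Use an odd n_trees so that ties cannot occur.
    """
    k_min = n_trees // 2 + 1  # smallest strict majority of correct votes
    return sum(
        comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        for k in range(k_min, n_trees + 1)
    )


def expected_brier(p: float, n_trees: int) -> float:
    """Expected Brier score (1 - p_hat)^2, p_hat = share of trees voting correctly.

    Squared bias (1 - p)^2 plus the variance p(1 - p) / n_trees of the mean of
    n_trees independent Bernoulli(p) votes.
    """
    return (1 - p) ** 2 + p * (1 - p) / n_trees


if __name__ == "__main__":
    for p in (0.4, 0.6):
        for t in (1, 11, 101):
            print(f"p={p} trees={t:3d} "
                  f"accuracy={expected_accuracy(p, t):.3f} "
                  f"brier={expected_brier(p, t):.3f}")
```

With p = 0.4, the accuracy column drops as trees are added, while the Brier column still falls toward (1 - 0.4)^2. With p = 0.6, both improve. A dataset with enough observations where p < 0.5 can therefore show a decreasing accuracy curve.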
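The out-of-bag idea behind an OOB performance curve can be sketched without any forest library. Each observation is scored only by trees whose bootstrap sample did not contain it, and accuracy is recomputed after each added tree. This is an illustration in Python, not the OOBCurve R package itself. The function name `oob_accuracy_curve` and its input layout (per-tree predictions plus in-bag indicators) are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def oob_accuracy_curve(
    tree_preds: Sequence[Sequence[Hashable]],
    in_bag: Sequence[Sequence[bool]],
    y: Sequence[Hashable],
) -> List[float]:
    """OOB accuracy of the forest formed by the first t trees, for t = 1..T.

    tree_preds[t][i] is the class tree t predicts for observation i, and
    in_bag[t][i] is True if observation i was in tree t's bootstrap sample.
    Observations without any out-of-bag vote yet are skipped; vote ties go to
    the class that was seen first (Counter.most_common behaviour).
    """
    n = len(y)
    votes = [Counter() for _ in range(n)]
    curve: List[float] = []
    for preds, bag in zip(tree_preds, in_bag):
        for i in range(n):
            if not bag[i]:  # only trees that never saw observation i vote on it
                votes[i][preds[i]] += 1
        scored = [i for i in range(n) if votes[i]]
        correct = sum(votes[i].most_common(1)[0][0] == y[i] for i in scored)
        curve.append(correct / len(scored) if scored else float("nan"))
    return curve


if __name__ == "__main__":
    # Hypothetical toy forest: 3 trees, 3 observations.
    y = [0, 1, 1]
    tree_preds = [[0, 1, 0], [1, 1, 1], [0, 0, 1]]
    in_bag = [[False, True, False], [False, False, True], [True, False, False]]
    print(oob_accuracy_curve(tree_preds, in_bag, y))
```

Plotting such a curve against the number of trees shows where additional trees stop paying off. No extra data is needed because the out-of-bag observations act as a built-in validation set.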
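The tunability definitions can be illustrated on a table of precomputed risks. This sketch assumes that every dataset was evaluated on the same grid of configurations (lower risk is better) and follows the idea in the abstract. The default is the configuration with the best average risk across datasets. Tunability on a dataset is the gap between the default and the best configuration there. The tunability of a single hyperparameter is the average gap obtained when only that hyperparameter is tuned and the others stay at their defaults. The function name and grid layout are hypothetical, and the paper's exact definitions and estimation procedure may differ.

```python
from statistics import mean
from typing import Dict, Tuple

Config = Tuple[int, ...]  # one value per hyperparameter, in a fixed order


def tunability(risk: Dict[str, Dict[Config, float]]):
    """Return (default, per-dataset algorithm tunability, per-hyperparameter tunability).

    risk[dataset][config] is an estimated risk (lower is better); every dataset
    must contain the same grid of configurations.
    """
    datasets = list(risk)
    configs = list(risk[datasets[0]])
    # Default: lowest average risk across all datasets.
    default = min(configs, key=lambda c: mean(risk[d][c] for d in datasets))
    # Algorithm tunability on dataset d: default risk minus best achievable risk.
    per_dataset = {d: risk[d][default] - min(risk[d].values()) for d in datasets}
    # Hyperparameter i: tune only position i, keep all others at their defaults,
    # then average the gain over datasets.
    per_hp = {}
    for i in range(len(default)):
        allowed = [c for c in configs
                   if all(c[k] == default[k] for k in range(len(c)) if k != i)]
        per_hp[i] = mean(risk[d][default] - min(risk[d][c] for c in allowed)
                         for d in datasets)
    return default, per_dataset, per_hp


if __name__ == "__main__":
    # Hypothetical risks for two hyperparameters (a, b) on two datasets.
    risk = {
        "ds1": {(1, 10): 0.30, (1, 20): 0.25, (2, 10): 0.20, (2, 20): 0.22},
        "ds2": {(1, 10): 0.30, (1, 20): 0.35, (2, 10): 0.28, (2, 20): 0.20},
    }
    print(tunability(risk))
```

Tunability of a combination of hyperparameters follows the same pattern: free several positions in `allowed` instead of one.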

Digitale Hochschulschriften der LMU

Learning Multiple Defaults for Machine Learning Algorithms

Authors: Bischl Bernd, Müller Andreas, Pfisterer Florian, Probst Philipp, van Rijn Jan N.
Publication date: 01/01/2021

The performance of modern machine learning methods depends heavily on their hyperparameter configurations. One simple way of selecting a configuration is to use default settings, which are often proposed along with the publication and implementation of a new algorithm. These default values are usually chosen in an ad hoc manner to work well enough on a wide variety of datasets. To address this problem, various automatic hyperparameter configuration algorithms have been proposed that select an optimal configuration per dataset. This principled approach usually improves performance but adds algorithmic complexity and computational cost to the training procedure. As an alternative, we propose learning a set of complementary default values from a large database of prior empirical results. Selecting an appropriate configuration on a new dataset then requires only a simple, efficient and embarrassingly parallel search over this set. We demonstrate the effectiveness and efficiency of the proposed approach in comparison to random search and Bayesian optimization.
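One simple way to learn a set of complementary defaults from prior results is greedy forward selection. At each step, add the configuration that most reduces the summed per-dataset minimum loss of the current set. This sketch assumes a loss matrix (lower is better) that covers the same prior datasets for every candidate configuration. The function names are hypothetical, and the paper's exact optimization procedure may differ. On a new dataset, each learned default is evaluated independently, which is why that final search is embarrassingly parallel.

```python
from typing import Callable, Dict, List, Sequence


def set_loss(best: Sequence[float], losses: Sequence[float]) -> float:
    """Summed per-dataset loss if a configuration with `losses` joins the set."""
    return sum(min(b, x) for b, x in zip(best, losses))


def greedy_defaults(loss: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Greedily choose up to k configurations that complement each other.

    loss[config][j] is the loss of configuration `config` on prior dataset j.
    """
    n_datasets = len(next(iter(loss.values())))
    best = [float("inf")] * n_datasets  # best loss per dataset within the set
    chosen: List[str] = []
    for _ in range(min(k, len(loss))):
        remaining = [c for c in loss if c not in chosen]
        pick = min(remaining, key=lambda c: set_loss(best, loss[c]))
        chosen.append(pick)
        best = [min(b, x) for b, x in zip(best, loss[pick])]
    return chosen


def select_on_new_dataset(defaults: List[str], evaluate: Callable[[str], float]) -> str:
    """Evaluate every learned default on the new dataset and keep the best one.

    The evaluations are independent, so they could run in parallel.
    """
    return min(defaults, key=evaluate)


if __name__ == "__main__":
    # Hypothetical losses of three configurations on three prior datasets.
    loss = {"A": [0.10, 0.50, 0.50], "B": [0.50, 0.15, 0.50], "C": [0.20, 0.20, 0.20]}
    print(greedy_defaults(loss, k=2))
```

In this toy example, the robust all-rounder "C" is chosen first. "A" is added next because it covers the dataset where "C" is weakest. A single best default would never make that second choice.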

arXiv.org e-Print Archive

Leiden University Scholarly Publications

SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization

Authors: Kinnison Jeff, Kremer-Herman Nathaniel, Scheirer Walter, Thain Douglas
Publication date: 22/01/2018

Computer vision is experiencing an AI renaissance, in which machine learning models are expediting important breakthroughs in academic research and commercial applications. Effectively training these models, however, is not trivial, due in part to hyperparameters: user-configured values that control a model's ability to learn from data. Existing hyperparameter optimization methods are highly parallel but make no effort to balance the search across heterogeneous hardware or to prioritize searching high-impact spaces. In this paper, we introduce a framework for massively Scalable Hardware-Aware Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the relative complexity of each search space and monitors performance on the learning task over all trials. These metrics are then used as heuristics to assign hyperparameters to distributed workers based on their hardware. We first demonstrate that our framework achieves double the throughput of a standard distributed hyperparameter optimization framework by optimizing SVM for MNIST using 150 distributed workers. We then conduct model search with SHADHO over the course of one week using 74 GPUs across two compute clusters to optimize U-Net for a cell segmentation task, discovering 515 models that achieve a lower validation loss than standard U-Net.
Comment: 10 pages, 6 figures
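The abstract describes ranking search spaces by complexity and observed performance, then matching them to hardware. This sketch illustrates only that general idea. The scoring rule (complexity multiplied by a performance-based priority) and the round-robin assignment over hardware classes sorted by speed are hypothetical choices, not SHADHO's actual heuristics or API. The complexity, priority and speed values are made up.

```python
from typing import Dict, List


def assign_spaces(
    complexity: Dict[str, float],
    priority: Dict[str, float],
    hardware_speed: Dict[str, float],
) -> Dict[str, List[str]]:
    """Assign search spaces to hardware classes (hypothetical heuristic).

    Spaces are ranked by complexity * priority, hardware classes by speed, and
    spaces are dealt round-robin so the top-ranked space lands on the fastest class.
    """
    spaces = sorted(complexity, key=lambda s: complexity[s] * priority[s], reverse=True)
    hardware = sorted(hardware_speed, key=hardware_speed.get, reverse=True)
    plan: Dict[str, List[str]] = {h: [] for h in hardware}
    for rank, space in enumerate(spaces):
        plan[hardware[rank % len(hardware)]].append(space)
    return plan


if __name__ == "__main__":
    # Hypothetical values: complexity of each space and a performance-based priority
    # (e.g. raised for spaces whose recent trials improved the validation loss).
    complexity = {"svm": 2.0, "unet": 10.0, "knn": 4.0}
    priority = {"svm": 1.0, "unet": 1.0, "knn": 0.25}
    print(assign_spaces(complexity, priority, {"cpu": 1.0, "gpu": 8.0}))
```

A real scheduler would recompute the priorities as trials finish, so a space that stops improving can drift from the GPUs toward slower workers.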

arXiv.org e-Print Archive

Crossref