Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms
Many different machine learning algorithms exist; taking into account each
algorithm's hyperparameters, there is a staggeringly large number of possible
alternatives overall. We consider the problem of simultaneously selecting a
learning algorithm and setting its hyperparameters, going beyond previous work
that addresses these issues in isolation. We show that this problem can be
addressed by a fully automated approach, leveraging recent innovations in
Bayesian optimization. Specifically, we consider a wide range of feature
selection techniques (combining 3 search and 8 evaluator methods) and all
classification approaches implemented in WEKA, spanning 2 ensemble methods, 10
meta-methods, 27 base classifiers, and hyperparameter settings for each
classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup
09, variants of the MNIST dataset and CIFAR-10, we show classification
performance often much better than using standard selection/hyperparameter
optimization methods. We hope that our approach will help non-expert users to
more effectively identify machine learning algorithms and hyperparameter
settings appropriate to their applications, and hence to achieve improved
performance.
Comment: 9 pages, 3 figures
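The problem described here is now commonly called CASH (combined algorithm selection and hyperparameter optimization): the choice of learner becomes one more categorical variable in a single joint search space. The sketch below illustrates that joint space using scikit-learn rather than WEKA, with plain random sampling standing in for Auto-WEKA's SMAC-based Bayesian optimizer; the estimators and parameter ranges are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of the CASH idea: treat the choice of learning algorithm as a
# top-level categorical hyperparameter and search the joint space. Uses
# scikit-learn instead of WEKA; spaces below are illustrative assumptions.
import random

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Joint space: algorithm name -> (constructor, hyperparameter sampler)
SEARCH_SPACE = {
    "svm": (SVC, lambda rng: {"C": 10 ** rng.uniform(-3, 3),
                              "gamma": 10 ** rng.uniform(-4, 1)}),
    "random_forest": (RandomForestClassifier,
                      lambda rng: {"n_estimators": rng.randint(10, 200),
                                   "max_depth": rng.randint(2, 20)}),
    "knn": (KNeighborsClassifier,
            lambda rng: {"n_neighbors": rng.randint(1, 30)}),
}

def cash_random_search(X, y, n_trials=50, seed=0):
    """Jointly select an algorithm and its hyperparameters by random search."""
    rng = random.Random(seed)
    best = (None, None, -float("inf"))
    for _ in range(n_trials):
        name = rng.choice(list(SEARCH_SPACE))   # algorithm selection
        ctor, sampler = SEARCH_SPACE[name]
        params = sampler(rng)                   # hyperparameter setting
        score = cross_val_score(ctor(**params), X, y, cv=3).mean()
        if score > best[2]:
            best = (name, params, score)
    return best

X, y = load_digits(return_X_y=True)
print(cash_random_search(X, y, n_trials=20))
```

Auto-WEKA itself replaces the uniform sampler above with a model-based (Bayesian) optimizer over the same kind of joint space.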
SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization
Computer vision is experiencing an AI renaissance, in which machine learning
models are expediting important breakthroughs in academic research and
commercial applications. Effectively training these models, however, is not
trivial due in part to hyperparameters: user-configured values that control a
model's ability to learn from data. Existing hyperparameter optimization
methods are highly parallel but make no effort to balance the search across
heterogeneous hardware or to prioritize searching high-impact spaces. In this
paper, we introduce a framework for massively Scalable Hardware-Aware
Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the
relative complexity of each search space and monitors performance on the
learning task over all trials. These metrics are then used as heuristics to
assign hyperparameters to distributed workers based on their hardware. We first
demonstrate that our framework achieves double the throughput of a standard
distributed hyperparameter optimization framework by optimizing SVM for MNIST
using 150 distributed workers. We then conduct model search with SHADHO over
the course of one week using 74 GPUs across two compute clusters to optimize
U-Net for a cell segmentation task, discovering 515 models that achieve a lower
validation loss than standard U-Net.
Comment: 10 pages, 6 figures
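The scheduling idea in this abstract can be made concrete with a toy heuristic: score each search space by its complexity and recent improvement, then greedily route the highest-priority spaces to the fastest workers. The complexity measure, priority formula, and worker model below are illustrative assumptions, not SHADHO's actual implementation.

```python
# Toy sketch of a hardware-aware scheduling heuristic in the spirit of SHADHO:
# score each search space, then route the highest-priority spaces to the
# fastest workers. All formulas here are illustrative assumptions.
import math
from dataclasses import dataclass, field

@dataclass
class SearchSpace:
    name: str
    dimensions: int            # number of hyperparameters in this space
    trials: int = 0
    losses: list = field(default_factory=list)

    def complexity(self):
        # Assumption: larger spaces are "more complex" and deserve more compute.
        return self.dimensions

    def priority(self):
        # Assumption: spaces whose recent trials still improve get priority.
        if self.trials < 2:
            return self.complexity()
        improvement = max(0.0, self.losses[-2] - self.losses[-1])
        return self.complexity() * (1.0 + improvement)

@dataclass
class Worker:
    name: str
    relative_speed: float      # e.g., a GPU worker outruns a CPU worker

def assign(spaces, workers):
    """Pair high-priority spaces with fast workers (greedy matching)."""
    ranked_spaces = sorted(spaces, key=lambda s: s.priority(), reverse=True)
    ranked_workers = sorted(workers, key=lambda w: w.relative_speed,
                            reverse=True)
    return list(zip(ranked_spaces, ranked_workers))

spaces = [SearchSpace("svm_rbf", dimensions=2),
          SearchSpace("unet", dimensions=8)]
workers = [Worker("cpu-node", 1.0), Worker("gpu-node", 8.0)]
for space, worker in assign(spaces, workers):
    print(f"{space.name} -> {worker.name}")
```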
Random Search Plus: A more effective random search for machine learning hyperparameters optimization
Hyperparameter optimization has always been key to improving machine learning model performance, and many methods exist for it: grid search, random search, manual search, Bayesian optimization, population-based optimization, and others. Random search requires far less computation than grid search, but pays a penalty in accuracy. This paper proposes a more effective random search method, built on traditional random search together with hyperparameter space separation, and named random search plus. The thesis empirically demonstrates that random search plus is more effective than random search, through case studies comparing the two on four machine learning algorithms (K-NN, K-means, neural networks, and support vector machines) as optimization objects, across three datasets of different sizes (Iris flower, Pima Indians diabetes, and the MNIST handwritten digits). Compared to traditional random search, random search plus finds better hyperparameters, or reaches an equivalent optimum in less time, in most cases: with a suitable space separation strategy, it needs only 10% of random search's runtime for an equivalent result, or, in the same runtime, it improves the accuracy of the supervised learners and the silhouette coefficient of the unsupervised learner (K-means) by 5%-30%. The distribution of the best hyperparameters found by the two methods shows that random search plus searches the space more globally than random search. The thesis also discusses future work, such as using a genetic algorithm to improve the local optimization ability of random search plus and space division for non-integer hyperparameters.
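The core mechanism, as the abstract describes it, is hyperparameter space separation: partition the space into subspaces and run an independent random search inside each, so the sample budget covers every region. A minimal sketch, assuming a simple equal-width split along one axis and a toy objective in place of cross-validated model error:

```python
# Minimal sketch of "random search plus": split the hyperparameter space into
# subspaces and spend part of the trial budget inside each, instead of
# sampling the whole space uniformly. Partitioning rule is an assumption.
import random

def random_search(objective, bounds, n_trials, rng):
    """Plain random search over a dict of (low, high) bounds; minimizes."""
    best = (None, float("inf"))
    for _ in range(n_trials):
        point = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        loss = objective(point)
        if loss < best[1]:
            best = (point, loss)
    return best

def random_search_plus(objective, bounds, n_trials, n_splits, split_key, rng):
    """Partition one axis into n_splits segments and search each subspace."""
    lo, hi = bounds[split_key]
    step = (hi - lo) / n_splits
    best = (None, float("inf"))
    for i in range(n_splits):
        sub = dict(bounds)
        sub[split_key] = (lo + i * step, lo + (i + 1) * step)
        point, loss = random_search(objective, sub, n_trials // n_splits, rng)
        if loss < best[1]:
            best = (point, loss)
    return best

# Toy objective standing in for cross-validated model error.
def objective(p):
    return (p["C"] - 7.3) ** 2 + (p["gamma"] - 0.2) ** 2

rng = random.Random(0)
bounds = {"C": (0.0, 10.0), "gamma": (0.0, 1.0)}
print(random_search(objective, bounds, 100, rng))
print(random_search_plus(objective, bounds, 100, 4, "C", rng))
```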
Learning Multiple Defaults for Machine Learning Algorithms
The performance of modern machine learning methods highly depends on their
hyperparameter configurations. One simple way of selecting a configuration is
to use default settings, often proposed along with the publication and
implementation of a new algorithm. Those default values are usually chosen in
an ad hoc manner to work well enough on a wide variety of datasets. To address
this problem, different automatic hyperparameter configuration algorithms have
been proposed, which select an optimal configuration per dataset. This
principled approach usually improves performance, but it adds
algorithmic complexity and computational costs to the training procedure. As an
alternative to this, we propose learning a set of complementary default values
from a large database of prior empirical results. Selecting an appropriate
configuration on a new dataset then requires only a simple, efficient and
embarrassingly parallel search over this set. We demonstrate the effectiveness
and efficiency of the approach we propose in comparison to random search and
Bayesian optimization.
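The construction suggested by this abstract can be sketched as a greedy selection over a matrix of prior results (configurations x datasets): repeatedly add the configuration that most raises the per-dataset best score of the set. The greedy average-best criterion below is a plausible reading of the abstract, not necessarily the paper's exact algorithm.

```python
# Sketch of learning complementary defaults from prior results: greedily pick
# configurations so that, on each dataset, the best member of the set is
# strong. The greedy average-best criterion is an illustrative assumption.
import numpy as np

def learn_defaults(scores, k):
    """scores: (n_configs, n_datasets) accuracy matrix from prior experiments.
    Returns indices of k complementary default configurations."""
    n_configs, n_datasets = scores.shape
    chosen = []
    best_so_far = np.full(n_datasets, -np.inf)
    for _ in range(k):
        # Gain of adding each config: new per-dataset maximum, averaged.
        gains = np.maximum(scores, best_so_far).mean(axis=1)
        gains[chosen] = -np.inf          # never re-pick a chosen config
        nxt = int(np.argmax(gains))
        chosen.append(nxt)
        best_so_far = np.maximum(best_so_far, scores[nxt])
    return chosen

rng = np.random.default_rng(0)
prior_scores = rng.uniform(0.5, 1.0, size=(100, 20))   # fake prior results
defaults = learn_defaults(prior_scores, k=5)
print("complementary defaults:", defaults)
# On a new dataset, evaluate just these k configurations (embarrassingly
# parallel) and keep the best-performing one.
```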