
    Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning

    Automated Machine Learning (AutoML) supports practitioners and researchers with the tedious task of designing machine learning pipelines and has recently achieved substantial success. In this paper, we introduce new AutoML approaches motivated by our winning submission to the second ChaLearn AutoML challenge. We develop PoSH Auto-sklearn, which enables AutoML systems to work well on large datasets under rigid time limits by using a new, simple, and meta-feature-free meta-learning technique and by employing a successful bandit strategy for budget allocation. However, PoSH Auto-sklearn introduces even more ways of running AutoML and might make it harder for users to set it up correctly. Therefore, we go one step further and study the design space of AutoML itself, proposing a solution towards truly hands-free AutoML. Together, these changes give rise to the next generation of our AutoML system, Auto-sklearn 2.0. We verify the improvements brought by these additions in an extensive experimental study on 39 AutoML benchmark datasets. We conclude the paper by comparing to other popular AutoML frameworks and to Auto-sklearn 1.0: Auto-sklearn 2.0 reduces the relative error by up to a factor of 4.5 and yields a performance in 10 minutes that is substantially better than what Auto-sklearn 1.0 achieves within an hour.
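The bandit strategy for budget allocation mentioned in this abstract is successive halving (the "SH" in PoSH). The following is a minimal, generic sketch of successive halving, not Auto-sklearn's implementation: the `evaluate` callback, the candidate names, and the toy loss are hypothetical stand-ins for training a pipeline under a given budget (for example, a number of iterations or a data subset size).

```python
from typing import Callable, Dict, List, Sequence, Tuple


def successive_halving(
    candidates: Sequence[str],
    evaluate: Callable[[str, float], float],
    min_budget: float,
    max_budget: float,
    eta: int = 3,
) -> Tuple[str, List[Tuple[float, List[str]]]]:
    """Generic successive halving (lower loss is better).

    Each round evaluates all survivors on the current budget, keeps the
    best 1/eta of them, and multiplies the budget by eta.
    """
    survivors = list(candidates)
    budget = min_budget
    history: List[Tuple[float, List[str]]] = []
    while True:
        losses: Dict[str, float] = {c: evaluate(c, budget) for c in survivors}
        history.append((budget, list(survivors)))
        if len(survivors) == 1 or budget >= max_budget:
            break
        keep = max(1, len(survivors) // eta)
        # Keep the lowest-loss candidates, sorted best first.
        survivors = sorted(survivors, key=losses.__getitem__)[:keep]
        budget = min(budget * eta, max_budget)
    best = min(survivors, key=losses.__getitem__)
    return best, history


# Hypothetical toy problem: each candidate has a fixed "true" loss, and
# small budgets add the same extra error 1/budget to every candidate.
true_loss = {f"pipeline_{i}": (i * 7 % 9) / 10 for i in range(9)}


def toy_evaluate(name: str, budget: float) -> float:
    return true_loss[name] + 1.0 / budget


best, history = successive_halving(list(true_loss), toy_evaluate, min_budget=1, max_budget=9)
print(best)                               # pipeline_0
print([(b, len(s)) for b, s in history])  # [(1, 9), (3, 3), (9, 1)]
```

With `eta=3`, nine candidates shrink to three and then to one, so the largest budget is spent only on the most promising candidate.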

    Automated Machine Learning Systems: Evaluation, Ease of Use, Data Transformation

    Machine Learning (ML) has progressed rapidly over the years thanks to its versatility in solving different problems. As a result, many machine learning frameworks, each a collection of different algorithms, have been created. However, with so many options, data scientists often struggle to decide which one to use for their ML solutions. For this reason, many Automated Machine Learning (AutoML) systems have been created to help users build ML solutions easily: the user defines the problem to solve, provides the data, and sets a budget, such as the time allowed to search for a solution. Over the years, many AutoML systems have been developed, and their applications range from simple tasks such as tabular classification to more complex ones such as object detection. With so many AutoML systems available, it becomes challenging for users to determine which one suits them best, because most systems focus on specific tasks and data, and their capabilities sometimes overlap. Users also need to be aware that, although most of the search process is already automated, most systems still require the user to take part in data preprocessing. Such preprocessing ranges from trivial steps, such as selecting image files, to more challenging tasks, such as merging multiple database tables. Another aspect of AutoML systems is the research involved in their development. AutoML research can focus on the search strategy or on deeper processes such as optimizing how models are run. This research leads to the creation of many AutoML systems every year. Creating an AutoML system is challenging, since there are many things to consider, from design to implementation, and using an existing system to test a new hypothesis is also difficult. The challenge of reusing an existing AutoML system is that most systems were designed to demonstrate a research improvement rather than for usability.
Another problem is that these systems are often not maintained, and even when they are, they are difficult to use due to a lack of documentation. Advancing AutoML therefore faces several challenges. First, components need to be standardized so that different frameworks can be compared more efficiently. Many AutoML systems with new search strategies appear every year, but it is hard to compare them objectively, since they could be improving in areas other than the ones claimed. Second, creating an AutoML system should not be so difficult that it stops researchers from contributing to the field; building a new AutoML system with the sole purpose of testing a new component should be straightforward. Third, we observe that state-of-the-art AutoML systems focus only on model selection and hyperparameter tuning, leaving room for improvement in data preprocessing. To tackle these challenges, several contributions are made in the preliminary work, and future work is proposed to conclude the dissertation:
• The first contribution of this dissertation is the development of standards for AutoML, which generalize components that have long been in use and give them proper definitions.
• Second, we propose a better methodology for evaluating AutoML systems that provides a better understanding of the capabilities of different systems.
• Third, to reduce the human effort needed to create single-use AutoML systems, we propose an extendible framework for AutoML. The proposed framework enables customizable AutoML solutions without designing and implementing every aspect of an AutoML system.
• Finally, considering the potential benefit of searching over data preprocessing for AutoML, the last piece of work focuses on preprocessing search, with an end-to-end framework that takes advantage of contextual feature similarities.

    Automated Machine Learning with Monte-Carlo Tree Search

    The AutoML task consists of selecting the proper algorithm from a machine learning portfolio, together with its hyperparameter values, in order to deliver the best performance on the dataset at hand. MOSAIC, a Monte-Carlo tree search (MCTS) based approach, is presented to handle the AutoML problem, an expensive black-box optimization problem over a hybrid structural and parametric search space. Extensive empirical studies are conducted to independently assess and compare: i) the optimization processes based on Bayesian optimization or MCTS; ii) the warm-start initialization; iii) the ensembling of the solutions gathered along the search. MOSAIC is assessed on the OpenML 100 benchmark and the Scikit-learn portfolio, with statistically significant gains over AUTO-SKLEARN, winner of former international AutoML challenges.
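The selection rule at the core of MCTS can be illustrated on the structural part of the problem alone. The sketch below is a depth-one tree (UCB1 over a portfolio of algorithm names) with one random rollout per visit; it is not MOSAIC, which also searches hyperparameters and uses warm-starting and ensembling. The algorithm names, the `rollout` function, and its mean scores are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple


def ucb1_select(counts: Dict[str, int], totals: Dict[str, float], c: float = math.sqrt(2)) -> str:
    """Return the child with the highest UCB1 score (mean reward + exploration bonus)."""
    for arm, n in counts.items():
        if n == 0:  # visit every child once before using the formula
            return arm
    total_visits = sum(counts.values())
    return max(
        counts,
        key=lambda a: totals[a] / counts[a] + c * math.sqrt(math.log(total_visits) / counts[a]),
    )


def depth_one_mcts(
    algorithms: List[str],
    rollout: Callable[[str, random.Random], float],
    iterations: int,
    seed: int = 0,
) -> Tuple[Optional[str], float, Dict[str, int]]:
    """Repeatedly select an algorithm with UCB1, score one rollout, back up the reward."""
    rng = random.Random(seed)
    counts = {a: 0 for a in algorithms}
    totals = {a: 0.0 for a in algorithms}
    best_arm, best_score = None, -math.inf
    for _ in range(iterations):
        arm = ucb1_select(counts, totals)  # selection
        score = rollout(arm, rng)          # simulation (e.g. validation accuracy)
        counts[arm] += 1                   # backpropagation of the reward
        totals[arm] += score
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm, best_score, counts


# Hypothetical rollout: each algorithm has a mean accuracy plus noise that
# stands in for a randomly sampled hyperparameter configuration.
MEAN_ACCURACY = {"random_forest": 0.9, "svm": 0.5, "knn": 0.2}


def toy_rollout(algorithm: str, rng: random.Random) -> float:
    return MEAN_ACCURACY[algorithm] + rng.uniform(-0.05, 0.05)


best_arm, best_score, counts = depth_one_mcts(list(MEAN_ACCURACY), toy_rollout, iterations=300)
print(best_arm, counts)
```

In a full MCTS, the tree would continue below each algorithm node (for example, preprocessing and hyperparameter choices), and each backed-up reward would update every node on the selected path rather than a single level.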