Automated Synthesis of Metal-Organic Frameworks using Graph Grammars and Monte Carlo Tree Search
In this work, we present a fully automated approach to synthesizing metal-organic frameworks (MOFs). We represent the structure of a MOF as a graph and use graph grammars, consisting of backbone rules and functional-group rules, to generate candidate frameworks. Given a user-defined target parameter value, Monte Carlo Tree Search (MCTS) is used to search the design space for the candidate framework whose parameter value is closest to the target.
To test the effectiveness of MCTS, we compare it against a random-search baseline. Results obtained with three different evaluation functions demonstrate the superior performance of MCTS.
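The search setup described here can be sketched generically. In the sketch below, an invented rule alphabet, sequence depth, and evaluation function stand in for the paper's graph-grammar rules and MOF parameter evaluation; it runs UCB-based MCTS over fixed-length rule sequences and includes the same random-search baseline for comparison:

```python
import math
import random

RULES = range(4)    # hypothetical rule alphabet (stand-in for grammar rules)
DEPTH = 5           # a design is a sequence of DEPTH rule applications
TARGET = 7.0        # user-defined target parameter value

def evaluate(design):
    # Illustrative evaluation: reward closeness of a design's score to TARGET.
    score = sum(r * (i + 1) for i, r in enumerate(design)) / DEPTH
    return 1.0 / (1.0 + abs(score - TARGET))

class Node:
    def __init__(self, prefix):
        self.prefix = prefix      # partial design (tuple of rule choices)
        self.children = {}        # rule -> Node
        self.visits = 0
        self.value = 0.0          # sum of rollout rewards

    def ucb_child(self, c=1.4):
        # UCB1: balance mean reward (exploitation) against exploration.
        return max(self.children.values(),
                   key=lambda n: n.value / n.visits
                   + c * math.sqrt(math.log(self.visits) / n.visits))

def rollout(prefix):
    # Complete a partial design with random rule choices, then score it.
    design = list(prefix)
    while len(design) < DEPTH:
        design.append(random.choice(RULES))
    return tuple(design), evaluate(design)

def mcts(iterations=2000, seed=0):
    random.seed(seed)
    root = Node(())
    best_design, best_reward = None, -1.0
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded.
        while len(node.prefix) < DEPTH and len(node.children) == len(RULES):
            node = node.ucb_child()
            path.append(node)
        # Expansion: add one untried child.
        if len(node.prefix) < DEPTH:
            untried = [r for r in RULES if r not in node.children]
            child = Node(node.prefix + (random.choice(untried),))
            node.children[child.prefix[-1]] = child
            path.append(child)
            node = child
        # Simulation and backpropagation.
        design, reward = rollout(node.prefix)
        if reward > best_reward:
            best_design, best_reward = design, reward
        for n in path:
            n.visits += 1
            n.value += reward
    return best_design, best_reward

def random_search(iterations=2000, seed=0):
    # Baseline: sample complete designs uniformly at random.
    random.seed(seed)
    return max((rollout(()) for _ in range(iterations)), key=lambda t: t[1])
```

This is only a minimal sketch of the search machinery; the actual work evaluates chemically meaningful MOF structures, not arithmetic scores.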
Informative Hyper-parameter Optimization and Selection
Hyper-parameter optimization methods allow efficient and robust hyper-parameter searching without the need to hand-select each value and combination. Although hyper-parameter tuners such as BOHB, Hyperopt, and SMAC have been investigated by researchers in terms of performance, there has not yet been an in-depth analysis of the values each tuner selects over all iterations. We propose a thorough aggregation of data on the efficiency of the search values selected by each tuner over 59 datasets and ten popular ML algorithms from Scikit-learn. From this extensive data, we observe and advise which tuners show better results for particular datasets, based on their meta-data, and algorithms. Through this research, we have also developed a simple plug-in for BOHB, Hyperopt, and SMAC for DARPA's Data-Driven Discovery of Models (D3M) AutoML systems, allowing smooth integration of various tuners. This is advantageous because the desired hyper-parameter tuner may change depending on the pipeline search method in an AutoML system, particularly when compared with AutoML systems that use only one search method. Our results show that, for AutoML systems, the Hyperopt tuner gives more desirable results in fewer iterations due to its significant exploration component, while BOHB performs best overall across a large number of datasets and algorithms owing to strategic budgeting.
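The kind of aggregation described — running several tuners across many datasets and tallying which finds better values within a fixed evaluation budget — can be sketched with stdlib-only stand-ins. The two toy strategies below are illustrative placeholders, not BOHB, Hyperopt, or SMAC themselves:

```python
import random

def make_objective(seed):
    # Synthetic "dataset": a 1-D objective with a hidden optimum; higher is better.
    rng = random.Random(seed)
    opt = rng.uniform(0, 1)
    return lambda x: -abs(x - opt)

def random_tuner(obj, budget, rng):
    # Pure exploration: sample hyper-parameter values uniformly.
    return max(obj(rng.uniform(0, 1)) for _ in range(budget))

def shrink_tuner(obj, budget, rng):
    # Coarse-to-fine: repeatedly shrink the interval around the best point,
    # loosely mimicking a budget-aware, exploitation-heavy tuner.
    lo, hi = 0.0, 1.0
    best_x, best_y = 0.5, obj(0.5)
    for _ in range(budget - 1):
        x = rng.uniform(lo, hi)
        y = obj(x)
        if y > best_y:
            best_x, best_y = x, y
        width = (hi - lo) * 0.9
        lo = max(0.0, best_x - width / 2)
        hi = min(1.0, best_x + width / 2)
    return best_y

def aggregate(tuners, n_datasets=20, budget=30):
    # For each dataset, record which tuner reached the better value.
    wins = {name: 0 for name in tuners}
    for seed in range(n_datasets):
        obj = make_objective(seed)
        scores = {name: fn(obj, budget, random.Random(seed))
                  for name, fn in tuners.items()}
        wins[max(scores, key=scores.get)] += 1
    return wins
```

The real study aggregates the actual values each tuner selected per iteration over 59 datasets; this sketch only shows the win-counting harness.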
Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning
Automated Machine Learning (AutoML) supports practitioners and researchers with the tedious task of designing machine learning pipelines and has recently achieved substantial success. In this paper, we introduce new AutoML approaches motivated by our winning submission to the second ChaLearn AutoML challenge. We develop PoSH Auto-sklearn, which enables AutoML systems to work well on large datasets under rigid time limits by using a new, simple, meta-feature-free meta-learning technique and by employing a successful bandit strategy for budget allocation. However, PoSH Auto-sklearn introduces even more ways of running AutoML and might make it harder for users to set it up correctly. Therefore, we also go one step further and study the design space of AutoML itself, proposing a solution towards truly hands-free AutoML. Together, these changes give rise to the next generation of our AutoML system, Auto-sklearn 2.0. We verify the improvements from these additions in an extensive experimental study on 39 AutoML benchmark datasets. We conclude the paper by comparing against other popular AutoML frameworks and Auto-sklearn 1.0, reducing the relative error by up to a factor of 4.5 and yielding performance in 10 minutes that is substantially better than what Auto-sklearn 1.0 achieves within an hour.
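The bandit-style budget allocation mentioned in the abstract can be illustrated with a minimal successive-halving sketch: start many candidates on a small budget, keep the better fraction each round, and multiply the survivors' budget. The candidates and noisy evaluations below are synthetic stand-ins, not Auto-sklearn's actual pipelines:

```python
import random

def successive_halving(candidates, evaluate, min_budget=1, eta=2):
    # Each round: score everyone at the current budget, keep the top 1/eta
    # fraction, and give the survivors eta times more budget.
    budget = min_budget
    alive = list(candidates)
    while len(alive) > 1:
        scores = {c: evaluate(c, budget) for c in alive}
        alive.sort(key=lambda c: scores[c], reverse=True)
        alive = alive[:max(1, len(alive) // eta)]
        budget *= eta
    return alive[0]

# Synthetic stand-in: a candidate's observed score becomes less noisy as
# its budget grows, which is what makes cheap early rounds informative.
def noisy_eval(quality, budget, rng):
    return quality + rng.gauss(0, 1.0 / budget)

rng = random.Random(0)
true_quality = {f"pipeline_{i}": i / 10 for i in range(8)}
winner = successive_halving(
    list(true_quality),
    lambda c, b: noisy_eval(true_quality[c], b, rng))
```

Because cheap low-budget rounds are noisy, the winner is not guaranteed to be the truly best candidate; the bet is that weak candidates can be discarded cheaply on average.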
Automated Machine Learning Systems: Evaluation, Ease of Use, Data Transformation
Machine Learning (ML) has progressed rapidly through the years due to its versatility in solving different problems. As a result, many machine learning frameworks have been created, each a collection of different algorithms. However, data scientists often struggle to determine which one to use to develop their ML solutions because of the many options. For this reason, many Automated Machine Learning (AutoML) systems have been created to help users build ML solutions easily by defining the problem they want to solve, providing the data, and setting a budget, such as the time allotted to search for a solution.
Through the years, many AutoML systems have been developed, and their applications range from simple tasks, such as tabular classification, to more complex ones, such as object detection. With so many AutoML systems available, it becomes challenging for users to determine which one suits them best, because most systems focus on specific tasks and data, and they sometimes overlap in the tasks they can solve. Another issue users need to be aware of is that, although most of the search process is automated, most systems still require the user to get involved in data preprocessing. Such preprocessing can range from trivial steps, such as selecting image files, to more challenging tasks, such as merging multiple database tables.
Another aspect of AutoML systems is the research involved in their development. AutoML research ranges from search strategies to deeper processes such as optimizing how models are run, and it leads to the creation of many new AutoML systems every year. Creating an AutoML system is challenging because there are many things to consider, from design to implementation, and attempting to use an existing system to test a new hypothesis is difficult as well. The challenge of reusing an existing AutoML system is that most systems were designed to demonstrate a research improvement rather than for usability. Another problem is that many of these systems are not maintained, and even when they are, a lack of documentation makes them difficult to use.
Enabling advances in AutoML is challenging. First, it is necessary to standardize components so that there is a more efficient way to compare different frameworks. Many AutoML systems with new search strategies appear every year, but it is difficult to compare them objectively since they may be improving in areas other than the ones claimed. Second, creating an AutoML system should not be so difficult that it stops researchers from contributing to the field; building a new system with the sole purpose of testing a new component should be straightforward. Third, we identify that state-of-the-art AutoML systems focus only on model selection and hyperparameter tuning, leaving room for improvement in data preprocessing. To tackle the challenges above, several contributions are made in the preliminary work, and future work is proposed to conclude the dissertation:
• The first contribution of this research dissertation is the development of standards for AutoML, which generalize components that have long been in use and give them proper definitions.
• Second, we propose a better methodology for evaluating AutoML systems, one that provides a better understanding of the capabilities of different systems.
• To alleviate the burden of human effort in creating single-use AutoML systems, we propose an extensible framework to enable AutoML. The proposed framework enables customizable AutoML solutions without designing and implementing every aspect of an AutoML system.
• Considering the potential benefit of data-preprocessing search for AutoML, in the last piece of work we focus on preprocessing search by creating an end-to-end framework that takes advantage of contextual feature similarities.
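One way to picture the contextual-feature-similarity idea from the last contribution: when an AutoML system must merge multiple tables before modeling, candidate join keys can be ranked by how similar two columns look, both by name and by overlapping values. The scoring below is a hypothetical illustration, not the dissertation's actual method:

```python
import difflib

def column_similarity(name_a, values_a, name_b, values_b):
    # Combine column-name similarity with value-overlap (Jaccard) similarity.
    name_score = difflib.SequenceMatcher(
        None, name_a.lower(), name_b.lower()).ratio()
    set_a, set_b = set(values_a), set(values_b)
    value_score = len(set_a & set_b) / max(1, len(set_a | set_b))
    return 0.5 * name_score + 0.5 * value_score

def rank_join_candidates(table_a, table_b):
    # Tables are modeled as dicts of column name -> list of values.
    pairs = [(a, b, column_similarity(a, va, b, vb))
             for a, va in table_a.items()
             for b, vb in table_b.items()]
    return sorted(pairs, key=lambda p: p[2], reverse=True)

# Toy example: the (user_id, uid) pair ranks highest, suggesting a join key.
users = {"user_id": [1, 2, 3], "name": ["ann", "bob", "cat"]}
orders = {"uid": [1, 2, 9], "amount": [10, 20, 30]}
ranked = rank_join_candidates(users, orders)
```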
Towards Automatic Machine Learning Pipeline Design
The rapid increase in the amount of data collected is quickly shifting the bottleneck of making informed decisions from a lack of data to a lack of data scientists to help analyze that data. Moreover, the publishing rate of new potential solutions and approaches for data analysis has surpassed what a human data scientist can follow. At the same time, we observe that many tasks a data scientist performs during analysis could be automated. Automatic machine learning (AutoML) research and solutions attempt to automate portions of, or even the entire, data analysis process. We address two challenges in AutoML research: first, how to represent ML programs suitably for metalearning; and second, how to improve evaluations of AutoML systems to be able to compare approaches, not just predictions. To this end, we have designed and implemented a framework for ML programs that provides all the components needed to describe ML programs in a standard way. The framework is extensible, and its components are decoupled from each other; e.g., the framework can be used to describe ML programs which use neural networks. We provide reference tooling for the execution of programs described in the framework. We have also designed and implemented a service, a metalearning database, that stores information about executed ML programs generated by different AutoML systems. We evaluate our framework by measuring the computational overhead of using it compared to executing ML programs which directly call the underlying libraries. We observe that the framework's ML program execution time is an order of magnitude slower and its memory usage is twice that of ML programs which do not use the framework. We demonstrate our framework's ability to evaluate AutoML systems by comparing 10 different AutoML systems that use it.
The results show that the framework can be used both to describe a diverse set of ML programs and to determine unambiguously which AutoML system produced the best ML programs. In many cases, the produced ML programs outperformed those made by human experts.
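A framework-neutral ML program description of the kind this abstract refers to might look like the sketch below: each step names a primitive and wires its inputs to the outputs of earlier steps, so that different AutoML systems can emit, and a metalearning database can store, the same representation. The schema and primitive names here are illustrative, not the framework's actual format:

```python
import json

def make_step(primitive, inputs):
    # One pipeline step: a named primitive plus references to its inputs.
    return {"primitive": primitive, "inputs": inputs}

pipeline = {
    "id": "example-pipeline-0001",
    "inputs": ["dataset"],
    "steps": [
        make_step("preprocessing.Imputer", ["inputs.0"]),
        make_step("preprocessing.OneHotEncoder", ["steps.0.output"]),
        make_step("classification.RandomForest", ["steps.1.output"]),
    ],
    "outputs": ["steps.2.output"],
}

def validate(p):
    # Every referenced step output must come from a strictly earlier step,
    # so the description forms a well-ordered dataflow.
    for i, step in enumerate(p["steps"]):
        for ref in step["inputs"]:
            if ref.startswith("steps."):
                assert int(ref.split(".")[1]) < i, f"forward reference in step {i}"
    return True

# Serialization is what lets executed programs be stored and compared.
serialized = json.dumps(pipeline, indent=2)
```

Decoupling the description from execution is what allows the same record to serve both a reference runtime and a metalearning database.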
Automated Machine Learning with Monte-Carlo Tree Search
The AutoML task consists of selecting the proper algorithm from a machine learning portfolio, along with its hyperparameter values, in order to deliver the best performance on the dataset at hand. MOSAIC, a Monte-Carlo tree search (MCTS) based approach, is presented to handle the hybrid structural and parametric expensive black-box optimization problem of AutoML. Extensive empirical studies are conducted to independently assess and compare: i) the optimization processes based on Bayesian optimization or MCTS; ii) the warm-start initialization; iii) the ensembling of the solutions gathered along the search. MOSAIC is assessed on the OpenML 100 benchmark and the Scikit-learn portfolio, with statistically significant gains over AUTO-SKLEARN, winner of former international AutoML challenges.