Evolution of Scikit-Learn Pipelines with Dynamic Structured Grammatical Evolution
The deployment of Machine Learning (ML) models is a difficult and
time-consuming job comprising a series of sequential and correlated tasks,
from data pre-processing and the design and extraction of features to the
choice of the ML algorithm and its parameterisation. The task is even more
challenging considering that the design of features is in many cases
problem-specific, and thus requires domain expertise. To overcome these
limitations, Automated Machine Learning (AutoML) methods seek to automate,
with little or no human intervention, the design of pipelines, i.e., to
automate the selection of the sequence of methods applied to the raw data.
These methods have the potential to enable non-expert users to use ML, and
to provide expert users with solutions they would be unlikely to consider. In
particular, this paper describes AutoML-DSGE - a novel grammar-based framework
that adapts Dynamic Structured Grammatical Evolution (DSGE) to the evolution of
Scikit-Learn classification pipelines. The experimental results include a
comparison of AutoML-DSGE with another grammar-based AutoML framework,
Resilient Classification Pipeline Evolution (RECIPE), and show that the
average performance of the classification pipelines generated by AutoML-DSGE
is always superior to the average performance of RECIPE; the differences are
statistically significant on 3 out of the 10 datasets used. Comment: EvoApps 202
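The abstract does not reproduce AutoML-DSGE's grammar or evolutionary operators, so the following is only a minimal, hypothetical sketch of the underlying idea: deriving scikit-learn classification pipelines from a small grammar-like search space and scoring each candidate by cross-validation. The two-rule "grammar", the dataset, and the trivial search loop are all invented for illustration.

```python
# Hedged sketch: evolving scikit-learn pipelines from a grammar-like
# search space. NOT the AutoML-DSGE implementation; the search space
# and loop below are invented for illustration.
import random

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# A toy "grammar": each rule offers the alternatives for one pipeline stage.
GRAMMAR = {
    "scaler": [StandardScaler, MinMaxScaler],
    "classifier": [DecisionTreeClassifier, KNeighborsClassifier],
}

def sample_pipeline(rng):
    """Derive one pipeline by picking one production per grammar rule."""
    return Pipeline([
        ("scaler", rng.choice(GRAMMAR["scaler"])()),
        ("classifier", rng.choice(GRAMMAR["classifier"])()),
    ])

def fitness(pipeline, X, y):
    """Fitness = mean cross-validated accuracy of the candidate pipeline."""
    return cross_val_score(pipeline, X, y, cv=3).mean()

X, y = load_iris(return_X_y=True)
rng = random.Random(0)

# A trivial random-search loop standing in for the full evolutionary search
# (selection, mutation of derivations, etc. are omitted).
best = sample_pipeline(rng)
best_fit = fitness(best, X, y)
for _ in range(5):
    cand = sample_pipeline(rng)
    f = fitness(cand, X, y)
    if f > best_fit:
        best, best_fit = cand, f
```

In the real framework the derivation of each pipeline is governed by the DSGE genotype rather than uniform random sampling, but the evaluate-by-cross-validation fitness shown here matches how such grammar-based AutoML systems typically score candidates.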
The Technological Emergence of AutoML: A Survey of Performant Software and Applications in the Context of Industry
As with most technical fields, there exists a delay between fundamental academic
research and practical industrial uptake. Whilst some sciences have robust and
well-established processes for commercialisation, such as the pharmaceutical
practice of regimented drug trials, other fields face transitory periods in
which fundamental academic advancements diffuse gradually into the space of
commerce and industry. For the still relatively young field of
Automated/Autonomous Machine Learning (AutoML/AutonoML), that transitory period
is under way, spurred on by a burgeoning interest from broader society. Yet, to
date, little research has been undertaken to assess the current state of this
dissemination and its uptake. Thus, this review makes two primary contributions
to knowledge around this topic. Firstly, it provides the most up-to-date and
comprehensive survey of existing AutoML tools, both open-source and commercial.
Secondly, it motivates and outlines a framework for assessing whether an AutoML
solution designed for real-world application is 'performant'; this framework
extends beyond the limitations of typical academic criteria, considering a
variety of stakeholder needs and the human-computer interactions required to
service them. Thus, additionally supported by an extensive assessment and
comparison of academic and commercial case-studies, this review evaluates
mainstream engagement with AutoML in the early 2020s, identifying obstacles and
opportunities for accelerating future uptake.
Automatic Machine Learning: Methods, Systems, Challenges
This open access book presents the first comprehensive overview of general methods in Automatic Machine Learning (AutoML), collects descriptions of existing systems based on these methods, and discusses the first international challenge of AutoML systems. The book serves as a point of entry into this quickly developing field for researchers and advanced students alike, as well as providing a reference for practitioners aiming to use AutoML in their work. The recent success of commercial ML applications and the rapid growth of the field have created a high demand for off-the-shelf ML methods that can be used easily and without expert knowledge. Many of the recent machine learning successes crucially rely on human experts, who select appropriate ML architectures (deep learning architectures or more traditional ML workflows) and their hyperparameters; the field of AutoML, however, targets a progressive automation of machine learning, based on principles from optimization and machine learning itself.
An Evolutionary Optimization Algorithm for Automated Classical Machine Learning
Machine learning is an evolving branch of computational algorithms that allow computers to learn from experience, make predictions, and solve different problems without being explicitly programmed. However, building a useful machine learning model is a challenging process, requiring human expertise to perform various tasks properly and to ensure that the machine learning's primary objective -- determining the best and most predictive model -- is achieved. These tasks include pre-processing, feature selection, and model selection. Many machine learning models developed by experts are designed manually, by trial and error; in other words, even experts need time and resources to create good predictive machine learning models. The idea of automated machine learning (AutoML) is to automate the machine learning pipeline, relieving the burden of substantial development costs and manual processes. The algorithms leveraged in these systems have different hyper-parameters; likewise, different input datasets have various features. In both cases, the final performance of the model is closely tied to the final selected configuration of features and hyper-parameters, which is why these are considered crucial tasks in AutoML. The computationally expensive nature of tuning hyper-parameters and optimally selecting features creates significant opportunities for filling research gaps in the AutoML field. This dissertation explores how to select the features and tune the hyper-parameters of conventional machine learning algorithms efficiently and automatically. To address these challenges, novel algorithms for hyper-parameter tuning and feature selection are proposed. The hyper-parameter tuning algorithm aims to provide the optimal set of hyper-parameters for three conventional machine learning models (Random Forest, XGBoost and Support Vector Machine) to obtain the best performance scores.
The feature selection algorithm, in turn, looks for the optimal subset of features to achieve the highest performance. Afterward, a hybrid framework is designed for both hyper-parameter tuning and feature selection. The proposed framework can discover a close-to-optimal configuration of features and hyper-parameters, and includes the following components: (1) an automatic feature selection component based on artificial bee colony algorithms and machine learning training, and (2) an automatic hyper-parameter tuning component based on artificial bee colony algorithms and machine learning training, for faster training and convergence of the learning models. The whole framework has been evaluated using four real-world datasets in different applications. This framework is an attempt to alleviate the challenges of hyper-parameter tuning and feature selection by using efficient algorithms. However, distributed processing, distributed learning, parallel computing, and other big data solutions are not taken into consideration in this framework.
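The abstract names artificial bee colony (ABC) algorithms but gives no implementation details, so the following is a minimal, hypothetical ABC-style loop for hyper-parameter search. The toy objective function (peaking at an invented "learning rate" of 0.1 and "depth" of 6) stands in for the validation accuracy that the real framework would obtain by training Random Forest, XGBoost or SVM models; the bounds, limit, and colony size are all assumptions.

```python
# Hedged sketch of artificial-bee-colony (ABC) style hyper-parameter search.
# The objective is a toy surrogate for validation accuracy; the real
# framework evaluates trained models instead. All constants are invented.
import random

BOUNDS = [(0.001, 1.0), (1, 12)]  # assumed (learning_rate, max_depth) ranges

def objective(params):
    """Toy stand-in for validation accuracy: peaks at lr=0.1, depth=6."""
    lr, depth = params
    return 1.0 - (lr - 0.1) ** 2 - 0.01 * (depth - 6) ** 2

def random_source(rng):
    """Scout-bee move: a fresh random configuration within the bounds."""
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]

def neighbour(source, sources, rng):
    """Employed-bee move: perturb one dimension toward another food source."""
    other = rng.choice(sources)
    i = rng.randrange(len(source))
    cand = list(source)
    cand[i] += rng.uniform(-1.0, 1.0) * (source[i] - other[i])
    lo, hi = BOUNDS[i]
    cand[i] = min(max(cand[i], lo), hi)  # clamp back into the search range
    return cand

rng = random.Random(0)
sources = [random_source(rng) for _ in range(5)]  # food sources = candidates
trials = [0] * len(sources)                       # stagnation counters
LIMIT = 10                                        # scout-phase threshold

for _ in range(50):
    for k in range(len(sources)):
        cand = neighbour(sources[k], sources, rng)
        if objective(cand) > objective(sources[k]):   # greedy acceptance
            sources[k], trials[k] = cand, 0
        else:
            trials[k] += 1
        if trials[k] > LIMIT:                         # abandon stale source
            sources[k], trials[k] = random_source(rng), 0

best = max(sources, key=objective)
```

A full ABC implementation adds an onlooker-bee phase that reselects sources with probability proportional to fitness; it is omitted here for brevity. The same loop structure applies to feature selection by making each "source" a binary mask over the feature set.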