Adapting Multicomponent Predictive Systems using Hybrid Adaptation Strategies with Auto-WEKA in Process Industry
Automating the composition and optimisation of multicomponent predictive systems (MCPSs), made up of a number of preprocessing steps and predictive models, is a challenging problem that has been addressed in recent work. However, one open challenge is how to adapt these systems in dynamic environments where the data change over time. In this work, we propose a hybrid approach that combines different adaptation strategies with Bayesian optimisation techniques for the parametric, structural and hyperparameter optimisation of entire MCPSs. Experiments comparing different adaptation strategies were performed on 7 datasets from real chemical production processes. The experimental analysis shows that optimising entire MCPSs is a feasible way to adapt to changing environments and that hybrid strategies perform better in most of the analysed cases.
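The abstract does not spell out the adaptation loop. As a rough illustration only, the Python sketch below shows what a hybrid strategy can look like on a data stream: each incoming batch first tests the current model; if the error stays under a threshold, the model is cheaply refitted (parametric adaptation), otherwise a small search re-selects a hyperparameter before refitting. Everything here is an assumption: a one-feature linear model stands in for a full MCPS, the training-window length stands in for the hyperparameters being re-optimised, exhaustive search over a short candidate list stands in for Bayesian optimisation, and `adapt_stream`, `fit_linear` and the threshold are hypothetical. It is not the authors' Auto-WEKA-based implementation.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical stream format: each batch is a list of (x, y) pairs.
Batch = list[tuple[float, float]]


@dataclass
class LinearModel:
    """One-feature linear model standing in for a full MCPS."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def fit_linear(data: Batch) -> LinearModel:
    """Ordinary least squares on one feature (the cheap parametric update)."""
    mx = mean(x for x, _ in data)
    my = mean(y for _, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    slope = sxy / sxx if sxx else 0.0
    return LinearModel(slope, my - slope * mx)


def mae(model: LinearModel, data: Batch) -> float:
    """Mean absolute error of the model on one batch."""
    return mean(abs(model.predict(x) - y) for x, y in data)


def flatten(batches: list[Batch]) -> Batch:
    return [pair for batch in batches for pair in batch]


def adapt_stream(batches: list[Batch], windows: list[int],
                 threshold: float) -> list[tuple[float, str]]:
    """Test-then-train loop with a hypothetical hybrid adaptation strategy.

    For every batch after the first, returns the error of the model in use
    when the batch arrived and the adaptation action that followed.
    """
    history: list[Batch] = []
    window = windows[0]            # hypothetical initial hyperparameter
    model = None
    log: list[tuple[float, str]] = []
    for batch in batches:
        if model is not None:
            err = mae(model, batch)
            if err > threshold:
                # Re-optimisation: pick the window whose model, trained on
                # past batches only, best predicts the newest batch.
                window = min(windows, key=lambda w: mae(
                    fit_linear(flatten(history[-w:])), batch))
                action = "reoptimise"
            else:
                action = "refit"   # keep the hyperparameter, just refit
            log.append((round(err, 3), action))
        history.append(batch)
        model = fit_linear(flatten(history[-window:]))
    return log
```

With three batches following y = 2x and then three following y = 10 - x, and candidate windows `[3, 1]`, the loop only refits while the concept is stable, triggers re-optimisation twice after the drift, and ends up on the one-batch window.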
S3Mining: A model-driven engineering approach for supporting novice data miners in selecting suitable classifiers
Data mining has proven very useful for extracting information from data in many different contexts. However, because data mining techniques are complex, selecting and using them requires the know-how of an expert in the field. In practice, applying data mining adequately is out of reach for novice users who have expertise in their own area of work but lack the skills to employ these techniques. In this paper, we use model-driven engineering together with scientific workflow standards and tools to develop a framework named S3Mining, which supports novice users in selecting the data mining classification algorithm that best fits their data and goal. To this end, the selection process uses the past experiences of expert data miners who have applied classification techniques to their own datasets. The contributions of the S3Mining framework are as follows: (i) an approach to create a knowledge base that stores the past experiences of expert users, (ii) a process that provides expert users with utilities for building classifier recommenders based on the existing knowledge base, (iii) a system that allows novice data miners to use these recommenders to discover the classifiers that best fit the problem at hand, and (iv) a public implementation of the framework's workflows. Finally, an experimental evaluation was conducted to show the feasibility of the framework.
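The abstract does not say which technique the S3Mining recommenders use. A common metalearning baseline for recommending classifiers from past experiments is nearest-neighbour ranking over dataset meta-features, sketched below in Python purely as an illustration. The `Experience` record, the meta-features, the classifier names and `recommend` are all hypothetical, and the meta-features are assumed to be pre-scaled to comparable ranges.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Experience:
    """One past expert experiment in the knowledge base (hypothetical schema)."""
    dataset: str
    meta_features: tuple[float, ...]  # assumed pre-scaled, e.g. to [0, 1]
    classifier: str
    accuracy: float


def recommend(kb: list[Experience], query: tuple[float, ...], k: int = 2) -> list[str]:
    """Rank classifiers by mean accuracy on the k past datasets nearest to `query`."""
    # Meta-feature vector of each distinct past dataset
    datasets = {e.dataset: e.meta_features for e in kb}
    nearest = set(sorted(datasets, key=lambda d: dist(datasets[d], query))[:k])
    # Collect the accuracies each classifier obtained on those neighbours
    scores: dict[str, list[float]] = defaultdict(list)
    for e in kb:
        if e.dataset in nearest:
            scores[e.classifier].append(e.accuracy)
    # Highest mean accuracy first
    return sorted(scores, key=lambda c: sum(scores[c]) / len(scores[c]), reverse=True)
```

A novice user would only supply the meta-features of a new dataset; the expert-built knowledge base does the rest.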
Grammar-based evolutionary approach for automated workflow composition with domain-specific operators and ensemble diversity
The process of extracting valuable and novel insights from raw data involves
a series of complex steps. In the realm of Automated Machine Learning (AutoML),
a significant research focus is on automating aspects of this process,
specifically tasks like selecting algorithms and optimising their
hyper-parameters. A particularly challenging task in AutoML is automatic
workflow composition (AWC). AWC aims to identify the most effective sequence of
data preprocessing and ML algorithms, coupled with their best hyper-parameters,
for a specific dataset. However, existing AWC methods are limited in how many
and in what ways they can combine algorithms within a workflow.
Addressing this gap, this paper introduces EvoFlow, a grammar-based
evolutionary approach for AWC. EvoFlow enhances the flexibility in designing
workflow structures, empowering practitioners to select algorithms that best
fit their specific requirements. EvoFlow stands out by integrating two
innovative features. First, it employs a suite of genetic operators, designed
specifically for AWC, to optimise both the structure of workflows and their
hyper-parameters. Second, it implements a novel updating mechanism that
enriches the variety of predictions made by different workflows. Promoting this
diversity helps prevent the algorithm from overfitting. With this aim, EvoFlow
builds an ensemble whose workflows differ in their misclassified instances.
To evaluate EvoFlow's effectiveness, we carried out an empirical validation
on a set of classification benchmarks. We begin with an ablation study to
demonstrate the enhanced performance attributable to EvoFlow's unique
components. Then, we compare EvoFlow with other AWC approaches, encompassing
both evolutionary and non-evolutionary techniques. Our findings show that
EvoFlow, with its specialised genetic operators and updating mechanism,
substantially outperforms current leading methods [..]
Comment: 32 pages, 7 figures, 6 tables, journal paper
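The abstract gives too little detail to reproduce EvoFlow's updating mechanism. The Python sketch below only illustrates the general idea it names, an ensemble whose workflows differ in their misclassified instances, using a swapped-in greedy heuristic: seed with the workflow that makes the fewest errors, then repeatedly add the workflow whose set of misclassified instance indices is, on average, farthest in Jaccard distance from those already chosen. The function names and workflow identifiers are hypothetical.

```python
def jaccard_distance(a: set[int], b: set[int]) -> float:
    """1 - |a & b| / |a | b|; two empty error sets count as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def select_diverse(errors: dict[str, set[int]], size: int) -> list[str]:
    """Greedily pick workflows whose misclassified instances differ.

    `errors` maps a (hypothetical) workflow id to the indices of the
    validation instances it misclassifies.
    """
    # Seed with the most accurate workflow (fewest misclassifications)
    chosen = [min(errors, key=lambda w: len(errors[w]))]
    while len(chosen) < min(size, len(errors)):
        candidates = [w for w in errors if w not in chosen]
        # Add the candidate with the largest mean distance to current members
        best = max(candidates, key=lambda w: sum(
            jaccard_distance(errors[w], errors[c]) for c in chosen) / len(chosen))
        chosen.append(best)
    return chosen
```

Selecting purely on disagreement can admit weak workflows; a practical variant would also weigh each candidate's own accuracy.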
Metalearning
As one of the fastest-growing areas of research in machine learning, metalearning, the subject of this open access book, studies principled methods to obtain efficient models and solutions by adapting machine learning and data mining processes. This adaptation usually exploits information from past experience on other tasks, and the adaptive processes can themselves involve machine learning approaches. Automated machine learning (AutoML), a closely related and currently very active area, is concerned with automating machine learning processes. Metalearning and AutoML can help AI learn to control the application of different learning methods and acquire new solutions faster without unnecessary user intervention. This book offers a comprehensive and thorough introduction to almost all aspects of metalearning and AutoML, covering the basic concepts and architecture, evaluation, datasets, hyperparameter optimization, ensembles and workflows, and how this knowledge can be used to select, combine, compose, adapt and configure both algorithms and models to yield faster and better solutions to data mining and data science problems. It can thus help developers build systems that improve themselves through experience. This book is a substantial update of the first edition, published in 2009. It includes 18 chapters, more than twice as many as the previous edition, which enabled the authors to cover the most relevant topics in more depth and to incorporate an overview of recent research in each area. The book will be of interest to researchers and graduate students in the areas of machine learning, data mining, data science and artificial intelligence.
Metalearning is the study of principled methods that exploit metaknowledge to obtain efficient models and solutions by adapting machine learning and data mining processes. While the variety of machine learning and data mining techniques now available can, in principle, provide good model solutions, a methodology is still needed to guide an efficient search for the most appropriate model. Metalearning provides one such methodology, allowing systems to become more effective through experience. This book discusses several approaches to obtaining knowledge about the performance of machine learning and data mining algorithms. It shows how this knowledge can be reused to select, combine, compose and adapt both algorithms and models to yield faster, more effective solutions to data mining problems. It can thus help developers improve their algorithms and also develop learning systems that can improve themselves. The book will be of interest to researchers and graduate students in the areas of machine learning, data mining and artificial intelligence.
Automated Machine Learning for Multi-Label Classification
Automated machine learning (AutoML) aims to select and configure machine
learning algorithms and combine them into machine learning pipelines tailored
to the dataset at hand. For supervised learning tasks, most notably binary and
multinomial classification, also known as single-label classification (SLC), such AutoML
approaches have shown promising results. However, the task of multi-label
classification (MLC), where data points are associated with a set of class
labels instead of a single class label, has received much less attention so
far. In the context of multi-label classification, the data-specific selection
and configuration of multi-label classifiers are challenging even for experts
in the field, as it is a high-dimensional optimization problem with multi-level
hierarchical dependencies. While the space of machine learning pipelines is
already huge for SLC, the MLC search space exceeds it by several orders of
magnitude.
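To see why hierarchical dependencies inflate the MLC space, the Python sketch below counts configurations under deliberately simplified, hypothetical assumptions: an SLC pipeline is an optional preprocessor followed by one classifier configuration, and an MLC configuration is either a problem-transformation method wrapping an SLC pipeline or a meta method wrapping another MLC configuration, nested up to a given depth. The counts and function names are illustrative, not taken from the thesis, and real search spaces also contain continuous hyperparameters.

```python
def slc_space(n_preprocessors: int, n_classifier_configs: int) -> int:
    """Optional preprocessor (or none) followed by one classifier configuration."""
    return (n_preprocessors + 1) * n_classifier_configs


def mlc_space(n_transform: int, n_meta: int, slc: int, depth: int) -> int:
    """MLC configurations with meta methods nested at most `depth` levels deep.

    Either a problem-transformation method wraps an SLC pipeline, or a meta
    method wraps another, shallower MLC configuration.
    """
    base = n_transform * slc
    if depth == 0:
        return base
    return base + n_meta * mlc_space(n_transform, n_meta, slc, depth - 1)
```

With the toy counts `slc_space(4, 50) == 250`, ten transformation methods and three meta methods, the number of MLC configurations grows to 2,500, 10,000 and 32,500 for nesting depths 0, 1 and 2: every added level multiplies the space rather than adding to it.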
In the first part of this thesis, we devise a novel AutoML approach for
single-label classification tasks that optimizes pipelines of at most two
machine learning algorithms. This approach is then extended, first to optimize
pipelines of unlimited length and eventually to configure the complex
hierarchical structures of multi-label classification
methods. Furthermore, we investigate how well AutoML approaches that form the
state of the art for single-label classification tasks scale with the increased
problem complexity of AutoML for multi-label classification.
In the second part, we explore how methods for SLC and MLC could be
configured more flexibly to achieve better generalization performance and how
to increase the efficiency of execution-based AutoML systems.