Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar
Automatic machine learning is an important problem at the forefront of
machine learning. The strongest AutoML systems are based on neural networks,
evolutionary algorithms, and Bayesian optimization. Recently AlphaD3M reached
state-of-the-art results with an order of magnitude speedup using reinforcement
learning with self-play. In this work we extend AlphaD3M by using a pipeline
grammar and a pre-trained model which generalizes from many different datasets
and similar tasks. Our results demonstrate improved performance compared with
our earlier work and existing methods on AutoML benchmark datasets for
classification and regression tasks. In the spirit of reproducible research we
make our data, models, and code publicly available.
Comment: ICML Workshop on Automated Machine Learning
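The grammar-guided pipeline synthesis described above can be illustrated with a toy context-free grammar; the production rules and primitive names below are invented for illustration and are not AlphaD3M's actual grammar:

```python
import random

# Toy pipeline grammar: non-terminals expand to sequences of symbols;
# any symbol not in the table is a terminal (a pipeline primitive).
GRAMMAR = {
    "PIPELINE":   [["PREPROCESS", "ESTIMATOR"], ["ESTIMATOR"]],
    "PREPROCESS": [["impute"], ["scale"], ["impute", "scale"]],
    "ESTIMATOR":  [["random_forest"], ["logistic_regression"], ["gradient_boosting"]],
}

def sample_pipeline(rng, symbol="PIPELINE"):
    """Sample one pipeline by expanding the grammar top-down."""
    if symbol not in GRAMMAR:
        return [symbol]                    # terminal: a concrete primitive
    steps = []
    for s in rng.choice(GRAMMAR[symbol]):  # pick one production at random
        steps.extend(sample_pipeline(rng, s))
    return steps

print(sample_pipeline(random.Random(0)))
```

Constraining the search this way guarantees that every candidate the agent proposes is a syntactically valid pipeline, which shrinks the action space the reinforcement learner has to explore.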
A Comparison of AutoML Tools for Machine Learning, Deep Learning and XGBoost
This paper presents a benchmark of supervised Automated Machine Learning (AutoML) tools. Firstly, we analyze the characteristics of eight recent open-source AutoML tools (Auto-Keras, Auto-PyTorch, Auto-Sklearn, AutoGluon, H2O AutoML, rminer, TPOT and TransmogrifAI) and describe twelve popular OpenML datasets that were used in the benchmark (divided into regression, binary and multi-class classification tasks). Then, we perform a comparison study with hundreds of computational experiments based on three scenarios: General Machine Learning (GML), Deep Learning (DL) and XGBoost (XGB). To select the best tool, we used a lexicographic approach, considering first the average prediction score for each task and then the computational effort. The best predictive results were achieved for GML, which were further compared with the best OpenML public results. Overall, the best GML AutoML tools obtained competitive results, outperforming the best OpenML models in five datasets. These results confirm the potential of general-purpose AutoML tools to fully automate the Machine Learning (ML) algorithm selection and tuning.
Opti-Edge: 5G
Digital Services Optimization at the Edge, Individual Project,
NUP: POCI-01-0247-FEDER-045220, co-funded by the Incentive System for Research and Technological Development,
from the Thematic Operational Program Competitiveness of
the national framework program - Portugal2020.
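The lexicographic selection rule described above (average prediction score first, computational effort as the tie-breaker) can be sketched as follows; the tool names and numbers are invented, and this strict version ignores any score rounding or tolerance the study may apply:

```python
def select_best_tool(results):
    """results: dict mapping tool name -> (avg_score, effort_seconds).
    Lexicographic rule: higher score wins; on a score tie, lower effort wins."""
    return max(results, key=lambda t: (results[t][0], -results[t][1]))

# Invented example values, for illustration only.
tools = {
    "AutoGluon":  (0.92, 3600),
    "H2O AutoML": (0.92, 1800),   # same score, less effort -> preferred
    "TPOT":       (0.90, 7200),
}
print(select_best_tool(tools))  # H2O AutoML
```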
Machine Learning in SME: An Empirical Study on Enablers and Success Factors
Machine learning (ML) techniques are rapidly evolving, both in academia and in practice. However, enterprises show different maturity levels in successfully implementing ML techniques. Thus, we review the state of adoption of ML in enterprises. We find that ML technologies are being increasingly adopted in enterprises, but that small and medium-sized enterprises (SME) struggle with their introduction compared to larger enterprises. In order to identify enablers and success factors, we conduct a qualitative empirical study with 18 companies in different industries. The results show that SME in particular fail to apply ML technologies due to insufficient ML know-how. However, partners and appropriate tools can compensate for this lack of resources. We discuss approaches to bridge this gap for SME.
ReConTab: Regularized Contrastive Representation Learning for Tabular Data
Representation learning stands as one of the critical machine learning
techniques across various domains. Through the acquisition of high-quality
features, pre-trained embeddings significantly reduce input space redundancy,
benefiting downstream pattern recognition tasks such as classification,
regression, or detection. Nonetheless, in the domain of tabular data, feature
engineering and selection still heavily rely on manual intervention, leading to
time-consuming processes and necessitating domain expertise. In response to
this challenge, we introduce ReConTab, a deep automatic representation learning
framework with regularized contrastive learning. Agnostic to any type of
modeling task, ReConTab constructs an asymmetric autoencoder based on the same
raw features from model inputs, producing low-dimensional representative
embeddings. Specifically, regularization techniques are applied for raw feature
selection. Meanwhile, ReConTab leverages contrastive learning to distill the
most pertinent information for downstream tasks. Experiments conducted on
extensive real-world datasets substantiate the framework's capacity to yield
substantial and robust performance improvements. Furthermore, we empirically
demonstrate that pre-trained embeddings can be seamlessly integrated as easily
adaptable features, enhancing the performance of various traditional methods
such as XGBoost and Random Forest.
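As a rough illustration of the contrastive component (not ReConTab's actual objective), an InfoNCE-style loss over embeddings of two corrupted views of the same table rows can be written in NumPy:

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE-style contrastive loss: embeddings of two views of the same
    row (matched pairs on the diagonal) should score higher similarity
    than embeddings of different rows."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                      # (n, n) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_prob)))     # cross-entropy on matched pairs

rng = np.random.default_rng(0)
z = rng.normal(size=(16, 8))
noisy = z + 0.05 * rng.normal(size=z.shape)       # lightly corrupted view
print(info_nce(z, noisy))                         # low: the views agree
print(info_nce(z, rng.normal(size=(16, 8))))      # higher: unrelated rows
```

Minimizing such a loss pulls embeddings of the same row together and pushes different rows apart, which is the mechanism by which contrastive pre-training distills task-relevant structure from raw features.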
Judging competitions and benchmarks: a candidate election approach
Machine learning progress relies on algorithm benchmarks. We study the problem of declaring a winner, or ranking "candidate" algorithms, based on results obtained by "judges" (scores on various tasks). Inspired by social science and game theory on fair elections, we compare various ranking functions, ranging from simple score averaging to Condorcet methods. We devise novel empirical criteria to assess the quality of ranking functions, including the generalization to new tasks and the stability under judge or candidate perturbation. We conduct an empirical comparison on the results of 5 competitions and benchmarks (one artificially generated). While prior theoretical analyses indicate that no single ranking function satisfies all desired properties, our empirical study reveals that the classical "average rank" method fares well. However, some pairwise comparison methods can achieve better empirical results.
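The classical "average rank" aggregation the study highlights is simple to state: each task ranks the candidates by score, and candidates are ordered by their mean rank. A minimal sketch, with invented scores:

```python
from statistics import mean

def average_rank(task_scores):
    """task_scores: dict task -> {candidate: score}, higher score is better.
    Returns candidates ordered by mean rank across tasks (rank 1 = best)."""
    candidates = list(next(iter(task_scores.values())))
    ranks = {c: [] for c in candidates}
    for scores in task_scores.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for r, c in enumerate(ordered, start=1):  # rank within this task
            ranks[c].append(r)
    return sorted(candidates, key=lambda c: mean(ranks[c]))

results = {
    "task1": {"A": 0.90, "B": 0.80, "C": 0.70},
    "task2": {"A": 0.60, "B": 0.90, "C": 0.50},
    "task3": {"A": 0.80, "B": 0.70, "C": 0.60},
}
print(average_rank(results))  # ['A', 'B', 'C']
```

Condorcet-style alternatives instead compare candidates pairwise across tasks; the paper's empirical finding is that the simple mean-rank rule above is already a strong baseline.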
Improving generalisation of AutoML systems with dynamic fitness evaluations
A common problem machine learning developers are faced with is overfitting,
that is, fitting a pipeline so closely to the training data that its
performance degrades on unseen data. Automated machine learning aims to free
(or at least ease) the developer from the burden of pipeline creation, but this
overfitting problem can persist. In fact, this can become more of a problem as
we look to iteratively optimise the performance of an internal cross-validation
(most often \textit{k}-fold). While this internal cross-validation hopes to
reduce this overfitting, we show we can still risk overfitting to the
particular folds used. In this work, we aim to remedy this problem by
introducing dynamic fitness evaluations which approximate repeated
\textit{k}-fold cross-validation, at little extra cost over single
\textit{k}-fold, and far lower cost than typical repeated \textit{k}-fold. The
results show that when time equated, the proposed fitness function results in
significant improvement over the current state-of-the-art baseline method which
uses an internal single \textit{k}-fold. Furthermore, the proposed extension is
very simple to implement on top of existing evolutionary computation methods,
and can provide essentially a free boost in generalisation/testing performance.
Comment: 19 pages, 4 figures
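The core idea, re-drawing the k-fold partition at each generation so that successive fitness evaluations together approximate repeated k-fold cross-validation, can be sketched as follows; the scorer is a placeholder supplied by the caller, not the paper's actual evaluation code:

```python
import random

def kfold_indices(n, k, seed):
    """One shuffled k-fold partition of range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def dynamic_fitness(evaluate, n, k, generation):
    """Score a pipeline with a k-fold split that changes every generation.
    Each call costs a single k-fold CV, but across generations the
    population sees many different splits, approximating repeated k-fold
    CV and reducing overfitting to any one set of folds."""
    folds = kfold_indices(n, k, seed=generation)  # fresh split per generation
    scores = []
    for i, test in enumerate(folds):
        train = [j for fi, f in enumerate(folds) if fi != i for j in f]
        scores.append(evaluate(train, test))
    return sum(scores) / k

# Dummy scorer that just reports the train fraction, for illustration.
fitness = dynamic_fitness(lambda tr, te: len(tr) / (len(tr) + len(te)),
                          n=100, k=5, generation=0)
print(fitness)  # 0.8: each fold holds out 20 of 100 points
```

Because only the seed changes per generation, the extra cost over a fixed single k-fold is negligible, which matches the paper's claim of a near-free generalisation boost.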