115 research outputs found
Learning from the past with experiment databases
Thousands of machine learning research papers contain experimental comparisons that were usually conducted with a single focus of interest, and the detailed results are typically lost after publication. Once past experiments are collected in experiment databases, they allow for additional and possibly much broader investigation. In this paper, we show how to use such a repository to answer various interesting research questions about learning algorithms and to verify a number of recent studies. Alongside performing elaborate comparisons and rankings of algorithms, we also investigate the effects of algorithm parameters and data properties, and study the learning curves and bias-variance profiles of algorithms to gain deeper insights into their behavior.
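To make the kind of broad comparison described above concrete, here is a minimal Python sketch (purely illustrative; the table, algorithm names, and scores are hypothetical, not data from the paper) that ranks algorithms by their average rank across datasets, as one might do over results exported from an experiment database.

import pandas as pd

# Hypothetical results exported from an experiment database:
# one row per (algorithm, dataset) pair with a predictive-accuracy score.
results = pd.DataFrame({
    "algorithm": ["J48", "J48", "NaiveBayes", "NaiveBayes", "SVM", "SVM"],
    "dataset":   ["iris", "letter", "iris", "letter", "iris", "letter"],
    "accuracy":  [0.94, 0.88, 0.95, 0.74, 0.96, 0.92],
})

# Rank algorithms within each dataset (1 = best), then average the ranks over datasets.
results["rank"] = results.groupby("dataset")["accuracy"].rank(ascending=False)
print(results.groupby("algorithm")["rank"].mean().sort_values())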
Automated data pre-processing via meta-learning
A data mining algorithm may perform differently on datasets with different characteristics; for example, it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. As a matter of fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there is a staggeringly large number of alternatives, and non-experienced users quickly become overwhelmed. We show that this problem can be addressed by an automated approach that leverages ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach helps non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.
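As an illustration of the meta-learning idea sketched in this abstract (not the paper's actual pipeline), the following Python snippet describes each dataset by a few simple meta-features and trains a meta-model to predict whether a given pre-processing transformation will improve a downstream algorithm; the meta-features, labels, and data here are assumptions made purely for demonstration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

np.random.seed(0)

def meta_features(X):
    # A deliberately tiny set of dataset meta-features (assumed for illustration).
    n, d = X.shape
    return np.array([n, d, np.mean(np.std(X, axis=0)),
                     np.mean(np.abs(np.corrcoef(X, rowvar=False)))])

# Hypothetical meta-dataset: one row per dataset; the label is 1 if the
# transformation improved the downstream algorithm's score on that dataset.
meta_X = np.array([meta_features(np.random.rand(100, d)) for d in (5, 10, 20, 40)])
meta_y = np.array([0, 1, 1, 0])

meta_model = RandomForestClassifier(random_state=0).fit(meta_X, meta_y)

# For a new dataset, predict whether applying the transformation is worthwhile.
new_dataset = np.random.rand(200, 15)
print(meta_model.predict([meta_features(new_dataset)]))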
A Community-Based Platform for Machine Learning Experimentation
We demonstrate the practical uses of a community-based platform for the sharing and in-depth investigation of the thousands of machine learning experiments executed every day. It is aimed at researchers and practitioners of data mining techniques, and is publicly available at http://expdb.cs.kuleuven.be. The system offers standards and APIs for sharing experimental results, extensive querying capabilities over the gathered results, and easy integration into existing data mining toolboxes. We believe such a system may speed up scientific discovery and enhance the scientific rigor of machine learning research.
Master your Metrics with Calibration
Machine learning models deployed in real-world applications are often evaluated with precision-based metrics such as the F1-score or AUC-PR (Area Under the Precision-Recall Curve). Because such metrics depend heavily on the class prior, they make it difficult to interpret the variation of a model's performance over different subpopulations or subperiods of a dataset. In this paper, we propose a way to calibrate these metrics so that they become invariant to the prior. We conduct a large number of experiments on balanced and imbalanced data to assess the behavior of the calibrated metrics and show that they improve interpretability and provide better control over what is really measured. We describe specific real-world use cases where calibration is beneficial, such as model monitoring in production, reporting, or fairness evaluation.
Comment: Presented at IDA 2020.
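One simple way to see what a prior-invariant precision can look like (a minimal sketch of the general idea; the paper's exact formulation may differ) is to rewrite precision in terms of the true-positive and false-positive rates, which do not depend on the class prior, and evaluate it at a fixed reference prior pi0.

def calibrated_precision(tp, fp, fn, tn, pi0=0.5):
    # Re-express precision via prior-free rates, then evaluate it at a
    # reference prior pi0 instead of the prior observed in the data.
    tpr = tp / (tp + fn)   # true-positive rate (recall), prior-free
    fpr = fp / (fp + tn)   # false-positive rate, prior-free
    return pi0 * tpr / (pi0 * tpr + (1 - pi0) * fpr)

# Two subpopulations where the classifier behaves identically (TPR=0.8, FPR=0.1)
# but the positive prior differs (10% vs 50%): raw precision would be about
# 0.47 vs 0.89, while the calibrated value is the same for both.
print(calibrated_precision(tp=80,  fp=90, fn=20,  tn=810))   # ~0.889
print(calibrated_precision(tp=400, fp=50, fn=100, tn=450))   # ~0.889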
Case Study on Bagging Stable Classifiers for Data Streams
Algorithms and the Foundations of Software technology
The online performance estimation framework: heterogeneous ensemble learning for data streams
Algorithms and the Foundations of Software technology
Decoding machine learning benchmarks
Despite the availability of benchmark machine learning (ML) repositories (e.g., UCI, OpenML), there is not yet a standard evaluation strategy capable of pointing out which set of datasets should serve as the gold standard for testing different ML algorithms. In recent studies, Item Response Theory (IRT) has emerged as a new approach to elucidate what a good ML benchmark should be. This work applied IRT to explore the well-known OpenML-CC18 benchmark and to identify how suitable it is for the evaluation of classifiers. Several classifiers, ranging from classical to ensemble ones, were evaluated using IRT models, which can simultaneously estimate dataset difficulty and classifier ability. The Glicko-2 rating system was applied on top of IRT to summarize the innate ability and aptitude of the classifiers. It was observed that not all datasets from OpenML-CC18 are really useful for evaluating classifiers. Most datasets evaluated in this work (84%) contain generally easy instances (e.g., only around 10% difficult instances). Also, 80% of the instances in half of this benchmark are very discriminating ones, which can be of great use for pairwise algorithm comparison, but are not useful for pushing classifiers' abilities. This paper presents this new evaluation methodology based on IRT as well as the tool decodIRT, developed to guide IRT estimation over ML benchmarks.
Comment: Paper published at the BRACIS 2020 conference, 15 pages, 4 figures.
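For readers unfamiliar with IRT, the following minimal Python sketch (an illustration only; the specific IRT models fitted by decodIRT may differ) shows the two-parameter logistic (2PL) model, where the probability that a classifier of ability theta succeeds on an item of difficulty b and discrimination a follows a logistic curve.

import numpy as np

def p_correct(theta, a, b):
    # 2PL IRT: P(correct | ability theta, discrimination a, difficulty b).
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# A strong and a weak classifier on an easy item and on a hard, highly
# discriminating item.
for theta in (1.5, -0.5):                       # classifier abilities
    for a, b in ((1.0, -1.0), (2.5, 1.0)):      # (discrimination, difficulty)
        print(f"theta={theta:+.1f}  a={a:.1f}  b={b:+.1f}  "
              f"P(correct)={p_correct(theta, a, b):.2f}")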
Meta-learning for symbolic hyperparameter defaults
Computer Systems, Imagery and Media
Advances in MetaDL: AAAI 2021 Challenge and Workshop
Algorithms and the Foundations of Software technology