
    PCTBagging: From inner ensembles to ensembles. A trade-off between discriminating capacity and interpretability

    Using decision trees as the base classifiers of an ensemble considerably improves discriminating capacity. However, the resulting ensemble is no longer interpretable, even though comprehensibility is a desired trait of decision trees. Consolidation (the consolidated tree construction algorithm, CTC) was introduced to improve the discriminating capacity of a single decision tree: a set of samples is used to build one consolidated tree, so transparency is not sacrificed. In this work, PCTBagging is presented as a hybrid of bagging and the consolidated tree that maintains part of the consolidated tree's comprehensibility while also improving discriminating capacity. The consolidated tree is first developed up to a certain point, after which typical bagging is performed on each sample. How far the consolidated tree is initially developed is configured by setting a consolidation percentage. In this work, 11 different consolidation percentages are considered for PCTBagging to effectively analyse the trade-off between comprehensibility and discriminating capacity. The results of PCTBagging are compared to those of bagging, CTC and C4.5, which serves as the base for all the other algorithms. PCTBagging with a low consolidation percentage achieves a discriminating capacity similar to that of bagging while maintaining part of the interpretable structure of the consolidated tree. PCTBagging with a consolidation percentage of 100% offers the same comprehensibility as CTC but achieves a significantly greater discriminating capacity.

    This work was funded by the Department of Education, Universities and Research of the Basque Government (ADIAN, IT980-16) and by the Ministry of Economy and Competitiveness of the Spanish Government and the European Regional Development Fund, ERDF (PhysComp, TIN2017-85409-P). We would also like to thank our former undergraduate student Ander Otsoa de Alda, who participated in the implementation of the PCTBagging algorithm for the WEKA platform.
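
    To make the "shared top, bagged bottom" idea concrete, here is a loose conceptual sketch in Python, not the authors' WEKA implementation: the consolidated top of the tree is approximated by a single shallow tree fit on the pooled bootstrap samples (real CTC instead builds it by voting on the best split across samples), and the class name and parameters (`PCTBaggingSketch`, `consolidation_pct`, `n_samples`) are illustrative assumptions.

```python
# Loose conceptual sketch of PCTBagging-style training, assuming numpy arrays
# X (features) and y (labels); scikit-learn trees stand in for C4.5.
import numpy as np
from scipy import stats
from sklearn.tree import DecisionTreeClassifier


class PCTBaggingSketch:
    def __init__(self, n_samples=10, consolidation_pct=0.25, max_depth=12,
                 random_state=None):
        self.n_samples = n_samples                  # number of bootstrap samples
        self.consolidation_pct = consolidation_pct  # share of depth grown jointly
        self.max_depth = max_depth
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        n = len(X)
        boots = [rng.integers(0, n, n) for _ in range(self.n_samples)]

        # Shared ("consolidated") top: one shallow tree fit on all bootstrap
        # samples pooled together; its depth grows with the consolidation pct.
        top_depth = max(1, round(self.consolidation_pct * self.max_depth))
        self.top_ = DecisionTreeClassifier(max_depth=top_depth).fit(
            np.vstack([X[b] for b in boots]),
            np.concatenate([y[b] for b in boots]))

        # Below the shared top, fall back to plain bagging: each bootstrap
        # sample grows its own subtree inside every leaf of the top tree.
        self.subtrees_ = {}
        sub_depth = max(1, self.max_depth - top_depth)
        for b in boots:
            leaves = self.top_.apply(X[b])
            for leaf in np.unique(leaves):
                m = leaves == leaf
                sub = DecisionTreeClassifier(max_depth=sub_depth)
                self.subtrees_.setdefault(leaf, []).append(sub.fit(X[b][m], y[b][m]))
        return self

    def predict(self, X):
        leaves = self.top_.apply(X)
        out = np.empty(len(X), dtype=self.top_.classes_.dtype)
        for leaf in np.unique(leaves):
            m = leaves == leaf
            models = self.subtrees_.get(leaf, [])
            if not models:                      # unseen leaf: use the top tree
                out[m] = self.top_.predict(X[m])
                continue
            votes = np.stack([mdl.predict(X[m]) for mdl in models])
            out[m] = stats.mode(votes, axis=0, keepdims=False).mode  # majority vote
        return out
```

    With `consolidation_pct=0` this degenerates towards plain bagging, and with `consolidation_pct=1.0` towards a single (here pooled) tree, which mirrors the trade-off the paper sweeps over.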

    A Comparative Evaluation of Quantification Methods

    Quantification is the problem of predicting class distributions in a given target set. It is a growing research field in supervised machine learning, for which a large variety of algorithms has been proposed in recent years. However, a comprehensive empirical comparison of quantification methods that supports algorithm selection is not yet available. In this work, we close this research gap by conducting a thorough empirical performance comparison of 24 different quantification methods. To cover a broad range of scenarios in both binary and multiclass quantification settings, we carried out almost 3 million experimental runs on 40 data sets. We observe that no single algorithm generally outperforms all competitors, but we identify a group of methods, including the Median Sweep and the DyS framework, that perform significantly better in binary settings. For the multiclass setting, we observe that a different, broad group of algorithms yields good performance, including the Generalized Probabilistic Adjusted Count, the readme method, the energy distance minimization method, the EM algorithm for quantification, and Friedman's method. More generally, we find that performance on multiclass quantification is inferior to that obtained in the binary setting. Our results can guide practitioners who intend to apply quantification algorithms and help researchers identify opportunities for future research.
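
    To illustrate the task being benchmarked, here is a minimal sketch of two classic binary quantification baselines, Classify & Count (CC) and its Adjusted Count (ACC) correction. These are standard methods from the quantification literature rather than code from the paper, and the 0/1 label encoding and scikit-learn tooling are assumptions.

```python
# Minimal binary quantification baselines: CC and ACC (labels assumed 0/1).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


def classify_and_count(clf, X_target):
    """CC: estimate positive prevalence as the fraction of target items the
    classifier labels positive (biased whenever the classifier errs)."""
    return clf.predict(X_target).mean()


def adjusted_count(clf, X_train, y_train, X_target):
    """ACC corrects CC with the classifier's TPR/FPR, estimated on the
    training set by cross-validation: p = (cc - fpr) / (tpr - fpr)."""
    y_hat = cross_val_predict(clf, X_train, y_train, cv=5)
    tpr = y_hat[y_train == 1].mean()
    fpr = y_hat[y_train == 0].mean()
    cc = classify_and_count(clf.fit(X_train, y_train), X_target)
    if tpr == fpr:                   # degenerate classifier: fall back to CC
        return cc
    return float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))


# Usage: estimate the positive-class prevalence of an unlabelled target set.
# clf = LogisticRegression(max_iter=1000)
# prevalence = adjusted_count(clf, X_train, y_train, X_target)
```

    Methods evaluated in the paper, such as the Median Sweep, refine exactly this adjustment, e.g. by aggregating the ACC estimate over many decision thresholds.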

    Constant optimization and feature standardization in multiobjective genetic programming

    This paper extends the numerical tuning of tree constants in genetic programming (GP) to the multiobjective domain. Using ten real-world benchmark regression datasets and employing Bayesian comparison procedures, we first consider the effects of feature standardization (without constant tuning) and conclude that standardization generally produces lower test errors, but, contrary to other recently published work, we find a much less clear trend for tree sizes. In addition, we consider the effects of constant tuning, with and without feature standardization, and observe that i) constant tuning invariably improves test error, and ii) it usually decreases tree size. Combined with standardization, constant tuning produces the best test-error results; tree sizes, however, are increased. We also examine the effects of applying constant tuning only once, at the end of a conventional GP run, which turns out to be surprisingly promising. Finally, we consider the merits of using numerical procedures to tune tree constants and observe that for around half of the datasets evolutionary search alone is superior, whereas for the remaining half, parameter tuning is superior. We identify a number of open research questions that arise from this work.
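
    A minimal sketch of the two ingredients follows: tuning the constants of a frozen GP tree by nonlinear least squares, and standardizing features as (x - mean) / std. The expression tree below is a hypothetical example, not one evolved in the paper, and the use of scipy.optimize is an assumed tooling choice.

```python
# Post-run constant tuning on a frozen expression tree via least squares.
import numpy as np
from scipy.optimize import least_squares


def tree(c, X):
    """A frozen expression tree, here c0*x0 + sin(c1*x1) + c2: only the
    numeric constants c are tuned, never the tree's structure."""
    return c[0] * X[:, 0] + np.sin(c[1] * X[:, 1]) + c[2]


def tune_constants(c_init, X, y):
    """Fit the tree's constants by least squares on the residuals."""
    return least_squares(lambda c: tree(c, X) - y, c_init).x


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] + np.sin(0.7 * X[:, 1]) + 0.3 + rng.normal(scale=0.05, size=200)

# Feature standardization, fitted on the training data only; constant tuning
# can be run on the standardized features in exactly the same way.
mu, sd = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sd

print("tuned constants:", tune_constants(np.ones(3), X, y))
```

    Applied once after a conventional GP run, as the paper explores, this amounts to calling `tune_constants` on each final-front individual instead of inside the evolutionary loop.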