
    A survey of cost-sensitive decision tree induction algorithms

    The past decade has seen significant interest in the problem of inducing decision trees that take account of both the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that are direct adaptations of accuracy-based methods, use genetic algorithms, use anytime methods, and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a taxonomy and a historical timeline of how the field has developed, and should serve as a useful reference point for future research in this field.

    A cost-sensitive decision tree learning algorithm based on a multi-armed bandit framework

    This paper develops a new algorithm for inducing cost-sensitive decision trees that is inspired by the multi-armed bandit problem, in which a player in a casino has to decide which slot machine (bandit) from a selection of slot machines is likely to pay out the most. Game theory proposes a solution to this problem: a process of exploration and exploitation in which reward is maximized. This paper utilizes these concepts to develop a new algorithm, viewing rewards as reductions in cost and using exploration and exploitation so that a compromise can be found between decisions based on accuracy and decisions based on cost. The algorithm employs the notion of lever pulls in the multi-armed bandit game to select attributes during decision tree induction, using a look-ahead methodology to explore potential attributes and exploit the attribute that maximizes the reward. The new algorithm is evaluated on fifteen datasets and compared to six well-known algorithms: J48, EG2, MetaCost, AdaCostM1, ICET, and ACT. The results show that the new bandit-based algorithm can produce more cost-effective trees without compromising accuracy. The paper also includes a critical appraisal of the limitations of the new algorithm and proposes avenues for further research.
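    The abstract describes the mechanism only in prose. A minimal Python sketch of the idea follows: each candidate attribute is a bandit arm, a "lever pull" estimates the reward (reduction in expected misclassification cost) via a one-step look-ahead on a random subsample, and an epsilon-greedy policy balances exploration and exploitation. The pull policy, reward estimator, and all parameter names here are illustrative assumptions, not the paper's actual algorithm.

    ```python
    import random
    from collections import Counter

    def expected_cost(labels, cost_matrix):
        """Per-example cost of predicting the majority class for this subset.
        cost_matrix[predicted][actual] gives the cost of each kind of error."""
        if not labels:
            return 0.0
        counts = Counter(labels)
        majority = counts.most_common(1)[0][0]
        return sum(cost_matrix[majority][actual] * n
                   for actual, n in counts.items()) / len(labels)

    def split_cost(rows, labels, attr, cost_matrix):
        """One-step look-ahead: weighted expected cost after splitting on attr."""
        partitions = {}
        for row, label in zip(rows, labels):
            partitions.setdefault(row[attr], []).append(label)
        return sum(len(part) / len(labels) * expected_cost(part, cost_matrix)
                   for part in partitions.values())

    def choose_attribute(rows, labels, attrs, cost_matrix,
                         pulls=20, epsilon=0.2, sample_size=30):
        """Treat each attribute as a bandit arm; each 'lever pull' estimates the
        cost reduction of splitting on it, measured on a random subsample."""
        base = expected_cost(labels, cost_matrix)
        totals = {a: 0.0 for a in attrs}
        counts = {a: 0 for a in attrs}
        for _ in range(pulls):
            if random.random() < epsilon:
                arm = random.choice(attrs)                                     # explore
            else:
                arm = max(attrs, key=lambda a: totals[a] / max(counts[a], 1))  # exploit
            idx = random.sample(range(len(rows)), min(sample_size, len(rows)))
            reward = base - split_cost([rows[i] for i in idx],
                                       [labels[i] for i in idx], arm, cost_matrix)
            totals[arm] += reward
            counts[arm] += 1
        return max(attrs, key=lambda a: totals[a] / max(counts[a], 1))
    ```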

    Cost-sensitive ensemble learning: a unifying framework

    Over the years, a plethora of cost-sensitive methods have been proposed for learning on data where different types of misclassification errors incur different costs. Our contribution is a unifying framework that provides a comprehensive and insightful overview of cost-sensitive ensemble methods, pinpointing their differences and similarities via a fine-grained categorization. Our framework contains natural extensions and generalisations of ideas across methods, be it AdaBoost, Bagging, or Random Forest, and as a result yields not only all methods known to date but also some not previously considered.
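    Whatever the ensemble, methods of this kind typically share one final step: predict the class that minimizes expected cost, given class probabilities (for instance, averaged over ensemble members) and a cost matrix. The following is a minimal sketch of that shared decision rule, not of the paper's framework itself:

    ```python
    import numpy as np

    def min_expected_cost_class(proba, cost_matrix):
        """proba: (n_samples, n_classes) class probabilities, e.g. averaged over
        an ensemble's members. cost_matrix[i, j] is the cost of predicting class
        i when the true class is j. Returns the minimum-expected-cost class."""
        expected = proba @ cost_matrix.T   # expected[n, i] = sum_j p[n, j] * C[i, j]
        return expected.argmin(axis=1)

    # False negatives (predicting 0 when the truth is 1) cost five times as much
    # as false positives, so class 1 is chosen despite P(class 0) = 0.7:
    proba = np.array([[0.7, 0.3]])
    costs = np.array([[0.0, 5.0],
                      [1.0, 0.0]])
    print(min_expected_cost_class(proba, costs))   # -> [1]
    ```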

    EBNO: evolution of cost-sensitive Bayesian networks

    The last decade has seen an increase in the attention paid to the development of cost sensitive learning algorithms that aim to minimize misclassification costs while still maintaining accuracy. Most of this attention has been on cost sensitive decision tree learning, while relatively little attention has been paid to assess if it is possible to develop better cost sensitive classifiers based on Bayesian networks. Hence, this paper presents EBNO, an algorithm that utilizes Genetic Algorithms to learn cost sensitive Bayesian networks; where genes are utilized to represent the links between the nodes in Bayesian networks and the expected cost is used as a fitness function. An empirical comparison of the new algorithm has been carried out with respect to: (i) an algorithm that induces cost-insensitive Bayesian networks to provide a base line, (ii) ICET, a well-known algorithm that uses Genetic Algorithms to induce cost-sensitive decision trees, (iii) use of MetaCost to induce cost-sensitive Bayesian networks via bagging (iv) use of AdaBoost to induce cost-sensitive Bayesian networks and (v) use of XGBoost, a gradient boosting algorithm, to induce cost-sensitive decision trees. An empirical evaluation on 28 data sets reveals that EBNO performs well in comparison to the algorithms that produce single interpretable models and performs just as well as algorithms that use bagging and boosting methods
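    The abstract specifies only the encoding (genes represent links) and the fitness (expected cost). A skeletal genetic algorithm along those lines might look like the sketch below, where the fitness function is left as a user-supplied callable and the population size, mutation rate, and selection scheme are all illustrative assumptions:

    ```python
    import random

    def random_genome(n_nodes):
        """Bit matrix of link genes: g[i][j] == 1 means an edge i -> j.
        Fixing a node ordering and only allowing i < j keeps the graph acyclic."""
        return [[random.randint(0, 1) if j > i else 0 for j in range(n_nodes)]
                for i in range(n_nodes)]

    def crossover(a, b):
        """Uniform crossover over the link genes of two parents."""
        n = len(a)
        return [[random.choice((a[i][j], b[i][j])) if j > i else 0
                 for j in range(n)] for i in range(n)]

    def mutate(g, rate=0.02):
        """Flip each link gene with a small probability."""
        n = len(g)
        return [[(1 - g[i][j]) if (j > i and random.random() < rate) else g[i][j]
                 for j in range(n)] for i in range(n)]

    def evolve(n_nodes, fitness, pop_size=40, generations=50):
        """Minimize fitness(genome); in EBNO's setting the fitness would be the
        expected misclassification cost of the Bayesian network the genome
        encodes, estimated by training and scoring it on held-out data."""
        pop = [random_genome(n_nodes) for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness)
            elite = pop[: pop_size // 4]
            pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                           for _ in range(pop_size - len(elite))]
        return min(pop, key=fitness)
    ```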

    Development of new cost-sensitive Bayesian network learning algorithms

    Bayesian networks are becoming an increasingly important area for research and have been proposed for real-world applications such as medical diagnosis, image recognition, and fraud detection. In all of these applications, accuracy alone is not sufficient, as there are costs involved when errors occur. Hence, this thesis develops new algorithms, referred to as cost-sensitive Bayesian network algorithms, that aim to minimise the expected cost due to misclassifications. The study presents a review of existing research on cost-sensitive learning and identifies three common methods for developing cost-sensitive algorithms for decision tree learning. These methods are then utilised to develop three different algorithms for learning cost-sensitive Bayesian networks: (i) an indirect method, where costs are included by changing the data distribution without changing a cost-insensitive algorithm; (ii) a direct method, in which an existing cost-insensitive algorithm is altered to take account of cost; and (iii) the use of genetic algorithms to evolve cost-sensitive Bayesian networks. The new algorithms are evaluated on 36 benchmark datasets and compared to existing cost-sensitive algorithms such as MetaCost+J48 and MetaCost+BN, as well as an existing cost-insensitive Bayesian network algorithm. The results exhibit improvements over the other algorithms in terms of cost, whilst still maintaining accuracy. In the experimental methodology, all experiments are repeated over 10 random trials, and in each trial the data are divided into 75% for training and 25% for testing. The results show that: (i) all three new algorithms perform better than the cost-insensitive Bayesian learning algorithm on all 36 datasets in terms of cost; (ii) the new algorithms based on the indirect method, the direct method, and genetic algorithms work better than MetaCost+J48 on 29, 28, and 31 of the 36 datasets respectively in terms of cost; (iii) the algorithm that utilises the indirect method performs well on imbalanced data compared to the other two algorithms, on 8 of the 36 datasets, in terms of cost; (iv) the algorithm based on the direct method outperforms the other new algorithms on 13 of the 36 datasets in terms of cost; (v) the evolutionary version is better than the other algorithms, including those using the direct and indirect methods, on 24 of the 36 datasets in terms of both cost and accuracy; and (vi) all three new algorithms perform better than MetaCost+BN on all 36 datasets in terms of cost.
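    As a rough illustration of the stated protocol (10 random trials, 75%/25% train/test splits, cost and accuracy reported), a minimal evaluation harness might look like the sketch below; the model factory, the cost matrix, and the choice to report total rather than per-example cost are assumptions, not details from the thesis:

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split

    def total_cost(y_true, y_pred, cost_matrix):
        """Sum of cost_matrix[predicted, actual] over the test set."""
        return sum(cost_matrix[p, t] for p, t in zip(y_pred, y_true))

    def evaluate(make_model, X, y, cost_matrix, trials=10):
        """Repeat 10 random trials with a 75%/25% train/test split and
        report mean misclassification cost and mean accuracy."""
        costs, accs = [], []
        for seed in range(trials):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, test_size=0.25, random_state=seed)
            model = make_model().fit(X_tr, y_tr)
            y_hat = model.predict(X_te)
            costs.append(total_cost(y_te, y_hat, cost_matrix))
            accs.append(float(np.mean(y_hat == y_te)))
        return np.mean(costs), np.mean(accs)
    ```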