Search CORE

36,300 research outputs found

A survey of cost-sensitive decision tree induction algorithms

Author: Bradford J. P.
Elkan C.
Esmeir S.
Esmeir S.
Estruch V.
Fan W.
Ferri C.
Freund Y.
Hart A. E.
Knoll U.
Li J.
Lin F. Y.
Liu X.
Mease D.
Murthy S.
Ni A.
Norton S. W.
Pazzani M.
Quinlan J. R.
Quinlan J. R.
Schapire R. E.
Sunil Vadera
Susan Lomax
Swets J.
Tan M.
Ting K.
Ting K.
Ting K. M.
von Neumann J.
Zadrozny B.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/02/2013
Field of study

The past decade has seen a significant interest on the problem of inducing decision trees that take account of costs of misclassification and costs of acquiring the features used for decision making. This survey identifies over 50 algorithms including approaches that are direct adaptations of accuracy based methods, use genetic algorithms, use anytime methods and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy, a historical timeline of how the field has developed and should provide a useful reference point for future research in this field

University of Salford Institutional Repository

Crossref

A System for Induction of Oblique Decision Trees

Author: Kasif S.
Murthy S. K.
Salzberg S.
Publication venue
Publication date: 01/01/1994
Field of study

This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees.Comment: See http://www.jair.org/ for an online appendix and other files accompanying this articl

arXiv.org e-Print Archive

CiteSeerX

Rule-based Machine Learning Methods for Functional Prediction

Author: Indurkhya N.
Weiss S. M.
Publication venue
Publication date: 01/01/1995
Field of study

We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance.Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

Inducing safer oblique trees without costs

Author: Althoff K.
Bennett K.P.
Bennett K.P.
Berry M.
Blake C.
Bradford J.
Breiman L.
Cohen R.
Domingos P.
Elkan C.
Elomaa T.
Fan W.
Grefenstette J.
Knoll U.
Kolodner J.
Morrison D.
Norusis M.
Nunez M.
Pazzani M.
Provost F.J.
Provost F.J.
Quinlan J.R.
Quinlan J.R.
Sunil Vadera
Tan M.
Ting K.
Turney P.
Vadera S.
Publication venue: 'Wiley'
Publication date: 01/09/2005
Field of study

Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other class. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms or have suggested ways of changing the distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of costs of misclassification. Although this may be possible for some applications, obtaining reasonable estimates of costs of misclassification is not easy in the area of safety. This paper presents a new algorithm for applications where the cost of misclassifications cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in another class. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1) and an algorithm that utilizes robust linear programming

University of Salford Institutional Repository

Crossref

CSNL: A cost-sensitive non-linear decision tree algorithm

Author: Allwein E. L.
Bennett K. P.
Bradford J.
Breslow L.
Brown G.
Elkan C.
Fan W.
Kanani P.
Knoll U.
Martin A.
Masnadi-Shirazi H.
Pazzani M.
Provost F. J.
Sunil Vadera
Ting K.
Ting K.
Turney P.
Vadera S.
Vadera S.
Zadrozny B.
Zhu X.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010
Field of study

This article presents a new decision tree learning algorithm called CSNL that induces Cost-Sensitive Non-Linear decision trees. The algorithm is based on the hypothesis that nonlinear decision nodes provide a better basis than axis-parallel decision nodes and utilizes discriminant analysis to construct nonlinear decision trees that take account of costs of misclassification. The performance of the algorithm is evaluated by applying it to seventeen datasets and the results are compared with those obtained by two well known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date. The results show that CSNL performs at least as well, if not better than these algorithms, in more than twelve of the datasets and is considerably faster. The use of bagging with CSNL further enhances its performance showing the significant benefits of using nonlinear decision nodes. The performance of the algorithm is evaluated by applying it to seventeen data sets and the results are compared with those obtained by two well known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date. The results show that CSNL performs at least as well, if not better than these algorithms, in more than twelve of the data sets and is considerably faster. The use of bagging with CSNL further enhances its performance showing the significant benefits of using non-linear decision nodes

CiteSeerX

University of Salford Institutional Repository

Crossref

Local Rule-Based Explanations of Black Box Decision Systems

Author: Giannotti Fosca
Guidotti Riccardo
Monreale Anna
Pedreschi Dino
Ruggieri Salvatore
Turini Franco
Publication venue
Publication date: 01/01/2018
Field of study

The recent years have witnessed the rise of accurate but obscure decision systems which hide the logic of their internal decision processes to the users. The lack of explanations for the decisions of black box systems is a key ethical issue, and a limitation to the adoption of machine learning components in socially sensitive and safety-critical contexts. %Therefore, we need explanations that reveals the reasons why a predictor takes a certain decision. In this paper we focus on the problem of black box outcome explanation, i.e., explaining the reasons of the decision taken on a specific instance. We propose LORE, an agnostic method able to provide interpretable and faithful explanations. LORE first leans a local interpretable predictor on a synthetic neighborhood generated by a genetic algorithm. Then it derives from the logic of the local interpretable predictor a meaningful explanation consisting of: a decision rule, which explains the reasons of the decision; and a set of counterfactual rules, suggesting the changes in the instance's features that lead to a different outcome. Wide experiments show that LORE outperforms existing methods and baselines both in the quality of explanations and in the accuracy in mimicking the black box

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Archivio della Ricerca - Università di Pisa

On The Stability of Interpretable Models

Author: Guidotti Riccardo
Ruggieri Salvatore
Publication venue
Publication date: 01/01/2019
Field of study

Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process. Bias in data collection and preparation, or in model's construction may severely affect the accountability of the design process. We conduct an experimental study of the stability of interpretable models with respect to feature selection, instance selection, and model selection. Our conclusions should raise awareness and attention of the scientific community on the need of a stability impact assessment of interpretable models

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Fitting Prediction Rule Ensembles with R Package pre

Author: Fokkema Marjolein
Publication venue: 'Foundation for Open Access Statistic'
Publication date: 01/02/2020
Field of study

Prediction rule ensembles (PREs) are sparse collections of rules, offering highly interpretable regression and classification models. This paper presents the R package pre, which derives PREs through the methodology of Friedman and Popescu (2008). The implementation and functionality of package pre is described and illustrated through application on a dataset on the prediction of depression. Furthermore, accuracy and sparsity of PREs is compared with that of single trees, random forest and lasso regression in four benchmark datasets. Results indicate that pre derives ensembles with predictive accuracy comparable to that of random forests, while using a smaller number of variables for prediction

arXiv.org e-Print Archive

Journal of Statistical Software

Leiden University Scholary Publications

Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation

Author: Albrecht
Austin
Baird
Batista
Boehm
Boehm
Breiman
Briand
Briand
Briand
Brockmeier
Cartwright
Cheung
Clark
Feelders
Finnie
Gama
Gray
Holte
Jain
Jeffery
Jun Liu
Jönsson
Kemerer
Khotanzad
Kibler
Kim
Kitchenham
Kohavi
Little
Little
Little
Little
Little
Martin Shepperd
Miranda
Myrtveit
Pickard
Putnam
Qinbao Song
Quinlan
Robins
Rubin
Rubin
Rubin
Rubin
Samson
Selby
Shao
Shepperd
Shepperd
Siedelecki
Song
Song
Srinivasan
Strike
Tabachnick
Tay
Walkerden
Walston
Xiangru Chen
Publication venue: 'Elsevier BV'
Publication date: 01/12/2008
Field of study

Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique to see if it is better to tolerate missing data or to try to impute missing values and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that the k-NN imputation can improve the prediction accuracy of C4.5. At the same time, both C4.5 and k-NN are little affected by the missingness mechanism, but that the missing data pattern and the missing data percentage have a strong negative impact upon prediction (or imputation) accuracy particularly if the missing data percentage exceeds 40%

Crossref

Brunel University Research Archive