Search CORE

15 research outputs found

Inducing safer oblique trees without costs

Author: Althoff K.
Bennett K.P.
Bennett K.P.
Berry M.
Blake C.
Bradford J.
Breiman L.
Cohen R.
Domingos P.
Elkan C.
Elomaa T.
Fan W.
Grefenstette J.
Knoll U.
Kolodner J.
Morrison D.
Norusis M.
Nunez M.
Pazzani M.
Provost F.J.
Provost F.J.
Quinlan J.R.
Quinlan J.R.
Sunil Vadera
Tan M.
Ting K.
Turney P.
Vadera S.
Publication venue: 'Wiley'
Publication date: 01/09/2005
Field of study

Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other class. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms or have suggested ways of changing the distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of costs of misclassification. Although this may be possible for some applications, obtaining reasonable estimates of costs of misclassification is not easy in the area of safety. This paper presents a new algorithm for applications where the cost of misclassifications cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in another class. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1) and an algorithm that utilizes robust linear programming

University of Salford Institutional Repository

Crossref

CSNL: A cost-sensitive non-linear decision tree algorithm

Author: Allwein E. L.
Bennett K. P.
Bradford J.
Breslow L.
Brown G.
Elkan C.
Fan W.
Kanani P.
Knoll U.
Martin A.
Masnadi-Shirazi H.
Pazzani M.
Provost F. J.
Sunil Vadera
Ting K.
Ting K.
Turney P.
Vadera S.
Vadera S.
Zadrozny B.
Zhu X.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010
Field of study

This article presents a new decision tree learning algorithm called CSNL that induces Cost-Sensitive Non-Linear decision trees. The algorithm is based on the hypothesis that nonlinear decision nodes provide a better basis than axis-parallel decision nodes and utilizes discriminant analysis to construct nonlinear decision trees that take account of costs of misclassification. The performance of the algorithm is evaluated by applying it to seventeen datasets and the results are compared with those obtained by two well known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date. The results show that CSNL performs at least as well, if not better than these algorithms, in more than twelve of the datasets and is considerably faster. The use of bagging with CSNL further enhances its performance showing the significant benefits of using nonlinear decision nodes. The performance of the algorithm is evaluated by applying it to seventeen data sets and the results are compared with those obtained by two well known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date. The results show that CSNL performs at least as well, if not better than these algorithms, in more than twelve of the data sets and is considerably faster. The use of bagging with CSNL further enhances its performance showing the significant benefits of using non-linear decision nodes

CiteSeerX

University of Salford Institutional Repository

Crossref

A cost-sensitive learning algorithm for fuzzy rule-based classifiers

Author: Beck Sebastian
Jäkel Jens
Mikut Ralf
Publication venue: Universitat Politècnica de Catalunya. Secció de Matemàtiques i Informàtica
Publication date: 01/01/2004
Field of study

Designing classifiers may follow different goals. Which goal to prefer among others depends on the given cost situation and the class distribution. For example, a classifier designed for best accuracy in terms of misclassifica- tions may fail when the cost of misclassification of one class is much higher than that of the other. This paper presents a decision-theoretic extension to make fuzzy rule generation cost-sensitive. Furthermore, it will be shown how interpretability aspects and the costs of feature acquisition can be ac- counted for during classifier design. Natural language text is used to explain the generated fuzzy rules and their design proces

UPCommons. Portal del coneixement obert de la UPC

A survey of cost-sensitive decision tree induction algorithms

Author: Bradford J. P.
Elkan C.
Esmeir S.
Esmeir S.
Estruch V.
Fan W.
Ferri C.
Freund Y.
Hart A. E.
Knoll U.
Li J.
Lin F. Y.
Liu X.
Mease D.
Murthy S.
Ni A.
Norton S. W.
Pazzani M.
Quinlan J. R.
Quinlan J. R.
Schapire R. E.
Sunil Vadera
Susan Lomax
Swets J.
Tan M.
Ting K.
Ting K.
Ting K. M.
von Neumann J.
Zadrozny B.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/02/2013
Field of study

The past decade has seen a significant interest on the problem of inducing decision trees that take account of costs of misclassification and costs of acquiring the features used for decision making. This survey identifies over 50 algorithms including approaches that are direct adaptations of accuracy based methods, use genetic algorithms, use anytime methods and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy, a historical timeline of how the field has developed and should provide a useful reference point for future research in this field

University of Salford Institutional Repository

Crossref

Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm

Author: Turney P. D.
Publication venue
Publication date: 01/01/1995
Field of study

This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search.Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive

ROC curves and the chi2 test

Author: Andrew P. Bradley
Devijver
Everitt
Fukunaga
Ingelfinger
Knoll
Press
Selin
Sherwood
Sterling
Twomey
Weiss
Publication venue: 'Elsevier BV'
Publication date: 01/01/1996
Field of study

In this paper we review the Receiver Operating Characteristic (ROC) curve, and the chi(2) test statistic, in relation to the analysis of a confusion matrix. We then show how these two methods are related, and propose an extension to the ROC curve so that it shows contours of chi(2) values. These contours can be used to provide further insight into the appropriate setting of the decision threshold for a particular application

Crossref

Queensland University of Technology ePrints Archive

University of Queensland eSpace

Cost-sensitive ensemble learning: a unifying framework

Author: Petrides George
Verbeke Wouter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/09/2021
Field of study

Over the years, a plethora of cost-sensitive methods have been proposed for learning on data when different types of misclassification errors incur different costs. Our contribution is a unifying framework that provides a comprehensive and insightful overview on cost-sensitive ensemble methods, pinpointing their differences and similarities via a fine-grained categorization. Our framework contains natural extensions and generalisations of ideas across methods, be it AdaBoost, Bagging or Random Forest, and as a result not only yields all methods known to date but also some not previously considered.publishedVersio

Lirias

University of Bergen

NORA - Norwegian Open Research Archives

Recommended from our members

Bootstrap methods for the cost-sensitive evaluation of classifiers

Author: Dietterich Thomas Glen
Margineantu Dragos D. (Dragos Dorin)
Oregon State University. Dept. of Computer Science
Publication venue: Corvallis, OR : Oregon State University, Dept. of Computer Science
Publication date
Field of study

Many machine learning applications require classifiers that minimize an asymmetric cost function rather than the misclassification rate, and several recent papers have addressed this problem. However, these papers have either applied no statistical testing or have applied statistical methods that are not appropriate for the cost-sensitive setting. Without good statistical methods, it is difficult to tell whether these new cost-sensitive methods are better than existing methods that ignore costs, and it is also difficult to tell whether one cost-sensitive method is better than another. To rectify this problem, this paper presents two statistical methods for the cost-sensitive setting. The first constructs a confidence interval for the expected cost of a single classifier. The second constructs a confidence interval for the expected difference in costs of two classifiers. In both cases, the basic idea is to separate the problem of estimating the probabilities of each cell in the confusion matrix (which is independent of the cost matrix) from the problem of computing the expected cost. We show experimentally that these bootstrap tests work better than applying standard z tests based on the normal distribution

ScholarsArchive@OSU

Development of a Machine Learning-Based Financial Risk Control System

Author: Hu Zhigang
Publication venue: DigitalCommons@USU
Publication date: 01/05/2022
Field of study

With the gradual end of the COVID-19 outbreak and the gradual recovery of the economy, more and more individuals and businesses are in need of loans. This demand brings business opportunities to various financial institutions, but also brings new risks. The traditional loan application review is mostly manual and relies on the business experience of the auditor, which has the disadvantages of not being able to process large quantities and being inefficient. Since the traditional audit processing method is no longer suitable some other method of reducing the rate of non-performing loans and detecting fraud in applications is urgently needed by financial institutions. In this project, a financial risk control model is built by using various machine learning algorithms. The model is used to replace the traditional manual approach to review loan applications. It improves the speed of review as well as the accuracy and approval rate of the review. Machine learning algorithms were also used in this project to create a loan user scorecard system that better reflects changes in user information compared to the credit card systems used by financial institutions today. In this project, the data imbalance problem and the performance improvement problem are also explored

DigitalCommons@USU