
Ensemble of Example-Dependent Cost-Sensitive Decision Trees

Author: Aouada Djamila
Bahnsen Alejandro Correa
Ottersten Bjorn
Publication date: 01/01/2015

Several real-world classification problems are example-dependent cost-sensitive in nature: the costs of misclassification vary between examples and not only across classes. However, standard classification methods do not take these costs into account and assume a constant cost for misclassification errors. Previous works have proposed methods that incorporate financial costs into the training of different algorithms, with the example-dependent cost-sensitive decision tree algorithm giving the highest savings. In this paper we propose a new framework of ensembles of example-dependent cost-sensitive decision trees. The framework consists of creating different example-dependent cost-sensitive decision trees on random subsamples of the training set and then combining them using three different combination approaches. Moreover, we propose two new cost-sensitive combination approaches: cost-sensitive weighted voting and cost-sensitive stacking, the latter based on the cost-sensitive logistic regression method. Finally, using five different databases from four real-world applications (credit card fraud detection, churn modeling, credit scoring and direct marketing), we evaluate the proposed method against state-of-the-art example-dependent cost-sensitive techniques, namely cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision trees. The results show that the proposed algorithms achieve higher savings on all databases.
Comment: 13 pages, 6 figures, submitted for possible publication

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg
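
The abstract reports results as savings without defining them. A minimal Python sketch, assuming the definition commonly used in this line of work (the cost of a model's predictions compared with the cheaper of predicting all negatives or all positives, with one cost row per example) and assuming that cost-sensitive weighted voting weights each tree by a score such as its savings on held-out data. The cost layout, the administrative fee of 2 and the weights below are illustrative, not taken from the paper.

```python
from typing import Sequence

# Hypothetical per-example cost row: (C_TP, C_FP, C_FN, C_TN).
CostRow = tuple[float, float, float, float]


def total_cost(y_true: Sequence[int], y_pred: Sequence[int],
               costs: Sequence[CostRow]) -> float:
    """Sum the example-dependent cost of each binary prediction."""
    total = 0.0
    for y, c, (c_tp, c_fp, c_fn, c_tn) in zip(y_true, y_pred, costs):
        if y == 1:
            total += c_tp if c == 1 else c_fn
        else:
            total += c_fp if c == 1 else c_tn
    return total


def savings(y_true: Sequence[int], y_pred: Sequence[int],
            costs: Sequence[CostRow]) -> float:
    """Cost improvement relative to the cheaper trivial classifier (all 0 or all 1)."""
    n = len(y_true)
    base = min(total_cost(y_true, [0] * n, costs),
               total_cost(y_true, [1] * n, costs))
    return (base - total_cost(y_true, y_pred, costs)) / base


def weighted_vote(tree_preds: Sequence[Sequence[int]],
                  weights: Sequence[float]) -> list[int]:
    """Weighted majority vote over the binary predictions of several trees."""
    half = sum(weights) / 2
    return [1 if sum(w * p[i] for w, p in zip(weights, tree_preds)) > half else 0
            for i in range(len(tree_preds[0]))]


# Usage: fraud-style costs, where a missed fraud costs the transaction amount
# and every alert costs a fixed (hypothetical) administrative fee of 2.
amounts = [100, 50, 30, 200]
costs = [(2, 2, a, 0) for a in amounts]
y_true = [1, 0, 0, 1]
print(savings(y_true, [1, 0, 0, 1], costs))  # 0.5
print(weighted_vote([[1, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]], [0.5, 0.1, 0.3]))  # [1, 0, 0, 1]
```

Savings can be negative for a poor tree, so a sketch like this would need to clip such weights to zero before voting.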

Comparing Multi-Label Classification Methods for Provisional Biopharmaceutics Class Prediction

Author: Freitas Alex A.
Ghafourian Taravat
Newby D.
Publication date: 05/01/2015

Kent Academic Repository

Analyzing E-Learning Adoption via Recursive Partitioning

Author: Christian Schade
Philipp Köllinger

The paper analyzes factors that influence the adoption of e-learning and gives an example of how to forecast technology adoption based on a post-hoc predictive segmentation using a classification and regression tree (CART). We find strong evidence for the existence of technological interdependencies and organizational learning effects. Furthermore, we find different paths to e-learning adoption. The results of the analysis suggest a growing "digital divide" among firms. We use cross-sectional data from a European survey about e-business in June 2002, covering almost 6,000 enterprises in 15 industry sectors and 4 countries. Comparing the predictive quality of CART with that of a traditional logistic regression, we find that CART performs better. The CART results are more parsimonious (i.e., CARTs use fewer explanatory variables), more interpretable, since different paths of adoption are detected, and better from a statistical standpoint, because interactions between the covariates are taken into account.
Keywords: Technology Adoption, Path Dependence, Interaction between Different Technologies, Regression Trees, Predictive Segmentation, Logistic Regression, E-Learning, E-Business

Research Papers in Economics
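
CART grows its tree by choosing, at each node, the split that most reduces impurity; for classification the usual criterion is Gini impurity. A minimal sketch of that split search on a single numeric feature with toy data; it is not the authors' code and omits pruning, multiple features and categorical splits.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best split x <= t on one feature."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))  # baseline: no split at all
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


print(best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (6.5, 0.0)
```

Applying this search recursively to each child node, over every candidate variable, is what lets the tree pick up interactions between covariates.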

A generic optimising feature extraction method using multiobjective genetic programming

Author: Rockett P.I
Zhang Y.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

In this paper, we present a generic, optimising feature extraction method using multiobjective genetic programming. We re-examine the feature extraction problem and show that effective feature extraction can significantly enhance the performance of pattern recognition systems with simple classifiers. A framework is presented to evolve optimised feature extractors that transform an input pattern space into a decision space in which maximal class separability is obtained. We have applied this method to real-world datasets from the UCI Machine Learning and StatLog databases to verify our approach, and we compare our proposed method with other reported results. We conclude that our algorithm produces classifiers whose performance is superior or equivalent to that of the conventional classifiers examined, which suggests that it removes the need to exhaustively evaluate a large family of conventional classifiers on any new problem. © 2010 Elsevier B.V. All rights reserved.

White Rose Research Online
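
In a multiobjective search, candidate feature extractors are compared by Pareto dominance rather than by a single score. A minimal sketch, assuming two objectives to minimise, here called classification error and tree size (the abstract does not name the objectives); the candidate names and values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: dict[str, tuple[float, ...]]) -> list[str]:
    """Names of the candidates not dominated by any other candidate."""
    return [name for name, obj in candidates.items()
            if not any(dominates(other, obj) for other in candidates.values())]


# Hypothetical (classification error, tree size) for four evolved extractors.
candidates = {"A": (0.10, 40), "B": (0.12, 15), "C": (0.15, 50), "D": (0.20, 5)}
print(pareto_front(candidates))  # ['A', 'B', 'D']
```

C is dropped because A is both more accurate and smaller; the remaining three trade accuracy against complexity, and a genetic programming loop would keep such non-dominated individuals for the next generation.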

Credit-Scoring Methods (in English)

Author: Evžen Kočenda
Martin Vojtek

The paper reviews the best-developed and most frequently applied methods of credit scoring employed by commercial banks when evaluating loan applications. The authors concentrate on retail loans – applied research in this segment is limited, though there has been a sharp increase in the volume of loans to retail clients in recent years. Logit analysis is identified as the most frequent credit-scoring method used by banks. However, nonparametric methods are widespread in pattern recognition. The methods reviewed have potential for application in post-transition countries.
Keywords: banking sector, credit scoring, discrimination analysis, pattern recognition, retail loans

Research Papers in Economics
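
Logit analysis scores an applicant by passing a weighted sum of features through the logistic function to obtain a probability of default. A minimal sketch; the coefficients, the two features and the 0.5 cutoff are hypothetical, whereas in practice the coefficients are estimated from historical loan data and the cutoff is set by the bank.

```python
import math


def default_probability(features, coefficients, intercept):
    """Logistic model: PD = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def decide(p_default, cutoff=0.5):
    """Approve the loan if the estimated default probability is below the cutoff."""
    return "approve" if p_default < cutoff else "reject"


# Hypothetical coefficients for two illustrative applicant features.
p_default = default_probability([2.0, 0.0], coefficients=[0.5, -1.0], intercept=-2.0)
print(round(p_default, 4), decide(p_default))  # 0.2689 approve
```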

Cost-sensitive ensemble learning: a unifying framework

Author: Petrides George
Verbeke Wouter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/09/2021

Over the years, a plethora of cost-sensitive methods have been proposed for learning on data when different types of misclassification errors incur different costs. Our contribution is a unifying framework that provides a comprehensive and insightful overview of cost-sensitive ensemble methods, pinpointing their differences and similarities via a fine-grained categorization. Our framework contains natural extensions and generalisations of ideas across methods, whether AdaBoost, Bagging or Random Forest, and as a result yields not only all methods known to date but also some not previously considered.

Lirias

University of Bergen

NORA - Norwegian Open Research Archives
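
One simple way to make an ensemble cost-sensitive is to leave the members unchanged, average their predicted probabilities, and then predict the class with the lower expected cost (a Bayes minimum-risk rule). A minimal sketch, assuming class-dependent costs and, by default, zero cost for correct predictions; the probabilities and costs are illustrative, and this is one variant only, not the authors' framework.

```python
from statistics import fmean


def bayes_min_risk(p, c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Predict 1 if its expected cost is lower than the expected cost of predicting 0."""
    cost_pos = p * c_tp + (1 - p) * c_fp
    cost_neg = p * c_fn + (1 - p) * c_tn
    return 1 if cost_pos < cost_neg else 0


def cost_sensitive_ensemble(member_probs, c_fp, c_fn):
    """Average each example's probability over members, then apply the minimum-risk rule."""
    return [bayes_min_risk(fmean(ps), c_fp, c_fn) for ps in zip(*member_probs)]


# Three members (rows), two examples (columns) of predicted P(y = 1).
member_probs = [[0.2, 0.6], [0.3, 0.8], [0.1, 0.7]]
print(cost_sensitive_ensemble(member_probs, c_fp=1.0, c_fn=10.0))  # [1, 1]
print(cost_sensitive_ensemble(member_probs, c_fp=1.0, c_fn=1.0))   # [0, 1]
```

With a false negative ten times as costly as a false positive, the first example is flagged even though its averaged probability is only about 0.2; with equal costs the rule reduces to the usual 0.5 threshold.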

Cost-Sensitive Decision Trees with Completion Time Requirements

Author: Hung-Pin KAO
Jen TANG
Kwei TANG

In many classification tasks, cost and completion time are the main concerns. In this paper, we assume that the completion time for classifying an instance is determined by its class label, and that a late penalty cost is incurred if the deadline is not met. This time requirement enriches the classification problem but poses a challenge to developing a solution algorithm. We propose an innovative approach to decision tree induction that produces multiple candidate trees by allowing more than one splitting attribute at each node. The user can specify the maximum number of candidate trees to control the computational effort required to produce the final solution. In the tree-induction process, an allocation scheme dynamically distributes the given number of candidate trees to splitting attributes according to their estimated contributions to cost reduction. The algorithm finds the final tree by backtracking. An extensive experiment shows that the algorithm outperforms the top-down heuristic and can effectively obtain optimal or near-optimal decision trees without excessive computation time.
Keywords: classification, decision tree, cost and time sensitive learning, late penalty

Research Papers in Economics
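
The abstract says candidate trees are distributed to splitting attributes according to their estimated contributions to cost reduction, but does not give the rule. A minimal sketch, assuming a proportional allocation with largest-remainder rounding so that the counts sum exactly to the user's maximum; the attribute names and cost-reduction values are hypothetical, and attributes with no estimated reduction receive no candidates.

```python
import math


def allocate_candidates(k: int, reductions: dict[str, float]) -> dict[str, int]:
    """Split k candidate trees across attributes in proportion to positive cost reduction."""
    positive = {a: r for a, r in reductions.items() if r > 0}
    if not positive or k <= 0:
        return {a: 0 for a in reductions}
    total = sum(positive.values())
    quotas = {a: k * r / total for a, r in positive.items()}
    alloc = {a: math.floor(q) for a, q in quotas.items()}
    leftover = k - sum(alloc.values())
    # Hand out the remaining trees to the largest fractional parts (ties keep dict order).
    for a in sorted(quotas, key=lambda a: quotas[a] - alloc[a], reverse=True)[:leftover]:
        alloc[a] += 1
    return {a: alloc.get(a, 0) for a in reductions}


# Hypothetical estimated cost reductions for four splitting attributes.
print(allocate_candidates(5, {"age": 30.0, "income": 12.0, "region": 8.0, "gender": 0.0}))
# {'age': 3, 'income': 1, 'region': 1, 'gender': 0}
```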

Automated Classification of Airborne Laser Scanning Point Clouds

Author: A. Gressin
A. Kobler
A. Roncat
B. Höfle
C. Briese
C. Mallet
D. Goldberg
G. Sithole
H. Gross
J. Otepka
L. Breiman
M. Doneus
M. Friedl
M. Hollaus
M. Rutzinger
P. Dorninger
P. Mather
R. Prinz
S. Safavian
W. Wagner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Making sense of the physical world has always been at the core of mapping. Until recently, this depended on the human eye. Using airborne lasers, it has become possible to quickly "see" more of the world in many more dimensions. The resulting enormous point clouds serve as data sources for applications far beyond the original mapping purposes, ranging from flood protection and forestry to threat mitigation. Processing these large quantities of data requires novel methods. In this contribution, we develop models to automatically classify ground cover and soil types. Drawing on machine learning, we critically review the advantages of supervised and unsupervised methods. Focusing on decision trees, we improve accuracy by including beam vector components and using a genetic algorithm. We find that our approach consistently delivers high-quality classifications, surpassing classical methods.

arXiv.org e-Print Archive

CiteSeerX

Crossref
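
The abstract credits part of the accuracy gain to including beam vector components as features. A minimal sketch, assuming these are the components of the unit vector along the laser ray from the sensor to the recorded echo; the derived off-nadir angle and the coordinates are illustrative additions, not the authors' feature set.

```python
import math


def beam_features(sensor_xyz, point_xyz):
    """Unit beam direction (ux, uy, uz) from sensor to point, plus off-nadir angle in degrees."""
    d = [p - s for p, s in zip(point_xyz, sensor_xyz)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0:
        raise ValueError("sensor and point coincide")
    ux, uy, uz = (c / norm for c in d)
    # Angle between the beam and straight down (0, 0, -1); clamp guards against rounding.
    off_nadir = math.degrees(math.acos(max(-1.0, min(1.0, -uz))))
    return ux, uy, uz, off_nadir


# Hypothetical sensor 1000 m above the origin.
print(beam_features((0, 0, 1000), (0, 0, 0)))     # (0.0, 0.0, -1.0, 0.0)
print(beam_features((0, 0, 1000), (1000, 0, 0)))  # off-nadir angle of about 45 degrees
```

Each point's components would then be appended to its other attributes before training the decision tree.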