416 research outputs found

Ensemble of Example-Dependent Cost-Sensitive Decision Trees

Author: Aouada Djamila
Bahnsen Alejandro Correa
Ottersten Bjorn
Publication venue
Publication date: 01/01/2015
Field of study

Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples and not only within classes. However, standard classification methods do not take these costs into account, and assume a constant cost of misclassification errors. In previous works, some methods that incorporate the financial costs into the training of different algorithms have been proposed, with the example-dependent cost-sensitive decision tree algorithm being the one that gives the highest savings. In this paper we propose a new framework of ensembles of example-dependent cost-sensitive decision trees. The framework consists of creating different example-dependent cost-sensitive decision trees on random subsamples of the training set, and then combining them using three different combination approaches. Moreover, we propose two new cost-sensitive combination approaches: cost-sensitive weighted voting and cost-sensitive stacking, the latter being based on the cost-sensitive logistic regression method. Finally, using five different databases, from four real-world applications (credit card fraud detection, churn modeling, credit scoring and direct marketing), we evaluate the proposed method against state-of-the-art example-dependent cost-sensitive techniques, namely cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision trees. The results show that the proposed algorithms have better results for all databases, in the sense of higher savings. Comment: 13 pages, 6 figures, Submitted for possible publication
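The abstract names cost-sensitive weighted voting and a savings measure but defines neither. The sketch below is one plausible reading, not the authors' implementation. It assumes that savings is the relative cost reduction against the cheaper of the two trivial predictions (all negative or all positive), that each ensemble member is weighted by its non-negative savings on held-out data, and that each example carries a hypothetical cost tuple `(c_fp, c_fn, c_tp, c_tn)`. All function names and the toy data are illustrative.

```python
def total_cost(y_true, y_pred, costs):
    """Sum example-dependent costs; costs[i] = (c_fp, c_fn, c_tp, c_tn)."""
    total = 0.0
    for y, p, (c_fp, c_fn, c_tp, c_tn) in zip(y_true, y_pred, costs):
        if y == 1:
            total += c_tp if p == 1 else c_fn
        else:
            total += c_fp if p == 1 else c_tn
    return total


def savings(y_true, y_pred, costs):
    """Assumed definition: cost reduction relative to the cheaper
    of the trivial all-0 and all-1 predictions."""
    n = len(y_true)
    base = min(total_cost(y_true, [0] * n, costs),
               total_cost(y_true, [1] * n, costs))
    if base == 0:
        return 0.0
    return (base - total_cost(y_true, y_pred, costs)) / base


def weighted_vote(member_preds, weights):
    """member_preds[j][i] is member j's label for example i.
    An example is labelled 1 when the members voting 1 hold
    more than half of the total weight."""
    total_w = sum(weights)
    n = len(member_preds[0])
    out = []
    for i in range(n):
        score = sum(w for w, preds in zip(weights, member_preds)
                    if preds[i] == 1)
        out.append(1 if score > total_w / 2 else 0)
    return out


# Toy validation data (hypothetical): a false negative costs the
# transaction amount, any positive prediction costs a fixed fee of 5.
y_val = [1, 0, 1, 0]
costs = [(5, 100, 5, 0), (5, 20, 5, 0), (5, 50, 5, 0), (5, 10, 5, 0)]
members = [
    [1, 0, 1, 0],  # member A
    [1, 1, 1, 0],  # member B
    [0, 0, 1, 0],  # member C
]

# Members with negative savings get zero weight (an assumption).
weights = [max(savings(y_val, m, costs), 0.0) for m in members]
print(weights)
print(weighted_vote(members, weights))
```

Clipping negative savings to zero keeps a member that is worse than the trivial baseline from influencing the vote; other weighting schemes are equally compatible with the abstract.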

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg

COMET: A Recipe for Learning and Using Large Ensembles on Massive Data

Author: Basilico Justin D.
Dixon Kevin R.
Kegelmeyer W. Philip
Kolda Tamara G.
Munson M. Arthur
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when learning from massive-scale data that is too large to fit on a single machine. To get the best accuracy, IVoting should be used instead of bagging to generate the training subset for each decision tree in the random forest. Experiments with two large datasets (5GB and 50GB compressed) show that COMET compares favorably (in both accuracy and training time) to learning on a subsample of data using a serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble evaluation which dynamically decides how many ensemble members to evaluate per data point; this can reduce evaluation cost by 100X or more.

arXiv.org e-Print Archive

CiteSeerX

Gossip Learning with Linear Models on Fully Distributed Data

Author: Hegedüs István
Jelasity Márk
Ormándi Róbert
Publication venue: 'Wiley'
Publication date: 06/06/2012
Field of study

Machine learning over fully distributed data poses an important problem in peer-to-peer (P2P) applications. In this model we have one data record at each network node, but without the possibility to move raw data due to privacy considerations. For example, user profiles, ratings, history, or sensor readings can represent this case. This problem is difficult, because there is no possibility to learn local models, the system model offers almost no guarantees for reliability, yet the communication cost needs to be kept low. Here we propose gossip learning, a generic approach that is based on multiple models taking random walks over the network in parallel, while applying an online learning algorithm to improve themselves, and getting combined via ensemble learning methods. We present an instantiation of this approach for the case of classification with linear models. Our main contribution is an ensemble learning method which, through the continuous combination of the models in the network, implements a virtual weighted voting mechanism over an exponential number of models at practically no extra cost as compared to independent random walks. We prove the convergence of the method theoretically, and perform extensive experiments on benchmark datasets. Our experimental analysis demonstrates the performance and robustness of the proposed approach. Comment: The paper was published in the journal Concurrency and Computation: Practice and Experience http://onlinelibrary.wiley.com/journal/10.1002/%28ISSN%291532-0634 (DOI: http://dx.doi.org/10.1002/cpe.2858). The modifications are based on the suggestions from the reviewer

arXiv.org e-Print Archive

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Classification in P2P Networks by Bagging Cascade RSVMs

Author: ANG Hock Hee
DATTA Anwitaman
GOPALKRISHNAN Vivekanand
HOI Steven C. H.
NG Wee Keong
Publication venue: 'VLDB Endowment'
Publication date: 01/08/2008
Field of study

Institutional Knowledge at Singapore Management University

Collaborative Learning by Boosting in Distributed Environments

Author: Changshui Zhang
Shijun Wang
Publication venue
Publication date: 23/04/2020
Field of study

CiteSeerX

Classification in P2P Networks with Cascade Support Vector Machines

Author: ANG Hock Hee
Gopalkrishnan Vivekanand
HOI Steven C. H.
NG Wee-Keong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2013
Field of study

Institutional Knowledge at Singapore Management University

Ensembling Neural Networks for Regression

Author: João Miguel Mendes Ribeiro Agulha
Publication venue
Publication date: 21/07/2021
Field of study

Repositório Aberto da Universidade do Porto

Measuring confidence of missing data estimation for HIV classification

Author: Mistry Jaisheel
Publication venue
Publication date: 27/07/2009
Field of study

Computational intelligence methods have been applied to classify pregnant women’s HIV status using demographic data from the South African Antenatal Seroprevalence database obtained from the South African Department of Health. Classification accuracies using a multitude of computational intelligence techniques ranged between 60% and 70%. The purpose of this research is to determine the certainty of predicting the HIV status of a patient. Ensemble neural networks were used for the investigation to obtain a set of possible solutions. The predictive certainty of each patient’s predicted HIV status was computed by giving the percentage of most dominant outputs from the set of possible solutions. Ensembles of neural networks were obtained using boosting, bagging and the Bayesian approach. It was found that the ensemble trained using the Bayesian approach is most suitable for the proposed predictive certainty measure. Furthermore, a sensitivity analysis was done to investigate how each of the demographic variables influenced the certainty of predicting the HIV status of a patient.
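The predictive certainty described in this abstract, the percentage of ensemble outputs that agree with the most common output, can be illustrated with a short sketch. The function name and the sample outputs below are hypothetical; the abstract does not say how ties are handled or how network outputs are turned into class labels, so the sketch assumes each member already emits a discrete label.

```python
from collections import Counter


def predictive_certainty(member_outputs):
    """Return (dominant_label, certainty_percent) for one patient.

    member_outputs holds one discrete class label per ensemble member.
    Certainty is the share of members that produced the dominant label.
    """
    if not member_outputs:
        raise ValueError("need at least one ensemble output")
    counts = Counter(member_outputs)
    label, votes = counts.most_common(1)[0]
    return label, 100.0 * votes / len(member_outputs)


# Hypothetical outputs of a 10-member ensemble for a single patient:
# eight members predict class 1, two predict class 0.
outputs = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
label, certainty = predictive_certainty(outputs)
print(label, certainty)  # prints: 1 80.0
```

A certainty near 50% in a two-class setting indicates that the ensemble members disagree almost evenly, while 100% means every member produced the same label.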

Wits Institutional Repository on DSPACE

Ensemble diversity measures and their application to thinning

Author: R BANFIELD
Publication venue: 'Elsevier BV'
Publication date: 01/01/2004
Field of study

Crossref