140,098 research outputs found
Ensemble of Example-Dependent Cost-Sensitive Decision Trees
Several real-world classification problems are example-dependent
cost-sensitive in nature, where the costs due to misclassification vary between
examples and not only within classes. However, standard classification methods
do not take these costs into account, and assume a constant cost of
misclassification errors. In previous works, some methods that take into
account the financial costs into the training of different algorithms have been
proposed, with the example-dependent cost-sensitive decision tree algorithm
being the one that gives the highest savings. In this paper we propose a new
framework of ensembles of example-dependent cost-sensitive decision-trees. The
framework consists in creating different example-dependent cost-sensitive
decision trees on random subsamples of the training set, and then combining
them using three different combination approaches. Moreover, we propose two new
cost-sensitive combination approaches; cost-sensitive weighted voting and
cost-sensitive stacking, the latter being based on the cost-sensitive logistic
regression method. Finally, using five different databases, from four
real-world applications: credit card fraud detection, churn modeling, credit
scoring and direct marketing, we evaluate the proposed method against
state-of-the-art example-dependent cost-sensitive techniques, namely,
cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision
trees. The results show that the proposed algorithms have better results for
all databases, in the sense of higher savings.Comment: 13 pages, 6 figures, Submitted for possible publicatio
CSNL: A cost-sensitive non-linear decision tree algorithm
This article presents a new decision tree learning algorithm called CSNL that induces Cost-Sensitive Non-Linear decision trees. The algorithm is based on the hypothesis that nonlinear decision nodes provide a better basis than axis-parallel decision nodes and utilizes discriminant analysis to construct nonlinear decision trees that take account of costs of misclassification.
The performance of the algorithm is evaluated by applying it to seventeen datasets and the results are compared with those obtained by two well known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date. The results show that CSNL performs at least as well, if not better than these algorithms, in more than twelve of the datasets and is considerably faster. The use of bagging with CSNL further enhances its performance showing the significant benefits of using nonlinear decision nodes.
The performance of the algorithm is evaluated by applying it to seventeen data sets and the results are
compared with those obtained by two well known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date.
The results show that CSNL performs at least as well, if not better than these algorithms, in more than twelve of the data sets and is considerably faster.
The use of bagging with CSNL further enhances its performance showing the significant benefits of using non-linear decision nodes
Model-based analysis of the potential of macroinvertebrates as indicators for microbial pathogens in rivers
The quality of water prior to its use for drinking, farming or recreational purposes must comply with several physicochemical and microbiological standards to safeguard society and the environment. In order to satisfy these standards, expensive analyses and highly trained personnel in laboratories are required. Whereas macroinvertebrates have been used as ecological indicators to review the health of aquatic ecosystems. In this research, the relationship between microbial pathogens and macrobenthic invertebrate taxa was examined in the Machangara River located in the southern Andes of Ecuador, in which 33 sites, according to their land use, were chosen to collect physicochemical, microbiological and biological parameters. Decision tree models (DTMs) were used to generate rules that link the presence and abundance of some benthic families to microbial pathogen standards. The aforementioned DTMs provide an indirect, approximate, and quick way of checking the fulfillment of Ecuadorian regulations for water use related to microbial pathogens. The models built and optimized with the WEKA package, were evaluated based on both statistical and ecological criteria to make them as clear and simple as possible. As a result, two different and reliable models were obtained, which could be used as proxy indicators in a preliminary assessment of pollution of microbial pathogens in rivers. The DTMs can be easily applied by staff with minimal training in the identification of the sensitive taxa selected by the models. The presence of selected macroinvertebrate taxa in conjunction with the decision trees can be used as a screening tool to evaluate sites that require additional follow up analyses to confirm whether microbial water quality standards are met
- …