Ensemble of Example-Dependent Cost-Sensitive Decision Trees
Several real-world classification problems are example-dependent
cost-sensitive in nature, where the costs due to misclassification vary between
examples and not only within classes. However, standard classification methods
do not take these costs into account, and assume a constant cost of
misclassification errors. In previous work, several methods have been proposed that incorporate financial costs into the training of different algorithms, with the example-dependent cost-sensitive decision tree algorithm yielding the highest savings. In this paper we propose a new framework of ensembles of example-dependent cost-sensitive decision trees. The framework consists of creating different example-dependent cost-sensitive decision trees on random subsamples of the training set, and then combining them using three different combination approaches. Moreover, we propose two new cost-sensitive combination approaches: cost-sensitive weighted voting and cost-sensitive stacking, the latter being based on the cost-sensitive logistic
regression method. Finally, using five databases from four real-world applications (credit card fraud detection, churn modeling, credit scoring, and direct marketing), we evaluate the proposed method against
state-of-the-art example-dependent cost-sensitive techniques, namely,
cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision
trees. The results show that the proposed algorithms achieve higher savings than these techniques on all databases.
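To make the subsample-train-combine recipe concrete, here is a minimal sketch of the ensemble with cost-sensitive weighted voting. It is not the authors' implementation: a standard scikit-learn decision tree stands in for the paper's example-dependent cost-sensitive tree, the data and per-example costs are synthetic, and weighting each member's vote by its held-out savings is one plausible instantiation of cost-sensitive weighted voting.

```python
# Sketch of an ensemble of cost-sensitive trees with cost-sensitive
# weighted voting.  Assumptions: sklearn's DecisionTreeClassifier replaces
# the paper's example-dependent cost-sensitive tree; data and per-example
# costs are synthetic; voting weights are held-out savings.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def total_cost(y_true, y_pred, c_fp, c_fn):
    """Example-dependent cost: each example carries its own FP/FN cost."""
    fp = (y_pred == 1) & (y_true == 0)
    fn = (y_pred == 0) & (y_true == 1)
    return np.sum(fp * c_fp) + np.sum(fn * c_fn)

def savings(y_true, y_pred, c_fp, c_fn):
    """Cost avoided relative to the cheaper of the two trivial classifiers."""
    base = min(total_cost(y_true, np.zeros_like(y_true), c_fp, c_fn),
               total_cost(y_true, np.ones_like(y_true), c_fp, c_fn))
    return 1.0 - total_cost(y_true, y_pred, c_fp, c_fn) / base

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=2000) > 0).astype(int)
c_fp = rng.uniform(1, 5, size=2000)    # per-example false-positive cost
c_fn = rng.uniform(5, 50, size=2000)   # per-example false-negative cost

trees, weights = [], []
for _ in range(10):
    idx = rng.choice(len(X), size=len(X) // 2, replace=False)
    oob = np.setdiff1d(np.arange(len(X)), idx)          # held-out examples
    t = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X[idx], y[idx])
    trees.append(t)
    w = savings(y[oob], t.predict(X[oob]), c_fp[oob], c_fn[oob])
    weights.append(max(w, 0.0))                         # never vote negatively

# Cost-sensitive weighted voting: a weighted majority over the members.
votes = sum(w * t.predict(X) for t, w in zip(trees, weights))
y_ens = (votes > 0.5 * sum(weights)).astype(int)
print("ensemble savings:", round(savings(y, y_ens, c_fp, c_fn), 3))
```

The savings measure used for the weights, cost avoided relative to the best trivial all-positive or all-negative classifier, is the same quantity the abstract reports as the evaluation criterion.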
Missiological Education by Extension: A Case Study of the Course, “Foundations of the World Christian Movement”
This article is a case study of a simple online course created jointly by a mission agency and a small university. It demonstrates how open-source course management technology has been used to create a true sense of community among students from a variety of cultural backgrounds and locations. The content of the course was delivered through a blend of open-source technology and self-directed adult learning facilitated through online dialog, which in turn engendered a multicultural, respectful virtual community. Information is shared to enable others to obtain low-cost technological help for setting up other individual courses in Moodle.
A survey of cost-sensitive decision tree induction algorithms
The past decade has seen significant interest in the problem of inducing decision trees that take account of both misclassification costs and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including direct adaptations of accuracy-based methods and approaches that use genetic algorithms, anytime methods, boosting, and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a taxonomy and a historical timeline of how the field has developed, and should serve as a useful reference point for future research in this field.
Managing for Learning and Impact
Over the past three years, the King Baudouin Foundation has developed a more systematic approach to the evaluation of its projects, which FSG helped codify in the KBF Project Management Guide, 'Managing for Learning and Impact'. There is growing interest among foundations in Europe in evaluating the intended impact of their projects and programs. Foundations invest in impact-driven philanthropy and therefore develop specific strategies, activities, and tools.
Cost-sensitive classification based on Bregman divergences
The main objective of this PhD thesis is the identification, characterization and study of new loss functions to address so-called cost-sensitive classification. Many decision problems are intrinsically cost-sensitive. However, the dominant preference for cost-insensitive methods in the machine learning literature is a natural consequence of the fact that true costs in real applications are difficult to evaluate. Since, in general, uncovering the correct class of the data is less costly than any decision error, designing low-error decision systems is a reasonable (but suboptimal) approach. For instance, consider the classification of credit applicants as either good customers (who will pay back the credit) or bad customers (who will fail to pay off part of the credit). The cost of classifying a risky borrower as good could be much higher than the cost of classifying a potentially good customer as bad.
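Written as a cost matrix, the asymmetry in this credit example looks as follows; the symbolic entries are our own illustration (with correct decisions assumed cost-free), not values fixed by the thesis:

\[
C = \begin{pmatrix}
0 & c_{\text{good}\to\text{bad}} \\
c_{\text{bad}\to\text{good}} & 0
\end{pmatrix},
\qquad
c_{\text{bad}\to\text{good}} \gg c_{\text{good}\to\text{bad}},
\]

where entry $C_{ij}$ is the cost of deciding class $j$ when the true class is $i$.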
Our proposal relies on Bayes decision theory, where the goal is to assign instances to the class with minimum expected cost. The decision is made using both the costs and the posterior probabilities of the classes. Obtaining calibrated probability estimates at the classifier output requires a suitable learning machine, a large enough representative data set, and an adequate loss function to be minimized during learning. The design of the loss function can be aided by the costs: classical decision theory shows that cost matrices define class boundaries determined by posterior class probability estimates. Strictly speaking, in order to make optimal decisions, accurate probability estimates are only required near the decision boundaries. It is key to point out that the choice of the loss function becomes especially relevant when the prior knowledge about the problem is limited or the available training examples are somehow unsuitable. In those cases, different loss functions lead to dramatically different posterior probability estimates. We focus our study on the family of Bregman divergences. These divergences offer a rich family of proper losses that has recently become very popular in the machine learning community [Nock and Nielsen, 2009, Reid and Williamson, 2009a].
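For concreteness, the two objects this paragraph refers to can be written out; these are the standard textbook definitions, not notation copied from the thesis:

\[
\hat{d}(x) = \arg\min_{j} \sum_{i} P(i \mid x)\, C_{ij},
\qquad
D_{\phi}(p, q) = \phi(p) - \phi(q) - \nabla\phi(q)^{\top}(p - q),
\]

where the first expression is the Bayes minimum expected cost decision rule and the second is the Bregman divergence generated by a strictly convex function $\phi$. For example, the negative entropy $\phi(p) = \sum_i p_i \log p_i$ recovers the Kullback-Leibler divergence, and hence the familiar log loss for fitting posterior probability estimates.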
The first part of the thesis deals with the development of a novel parametric family of multiclass Bregman divergences which captures the information in the cost matrix, so that the loss function is adapted to each specific problem. Multiclass cost-sensitive learning is one of the main challenges in cost-sensitive learning and, through this parametric family, we provide a natural framework to move beyond binary tasks. Following this idea, two lines are explored:
Cost-sensitive supervised classification: We derive several asymptotic results.
The first analysis guarantees that the proposed Bregman divergence has maximum sensitivity to changes at probability vectors near the decision regions. Further analysis shows that the optimization of this Bregman divergence becomes equivalent to minimizing the overall cost regret in non-separable problems, and to maximizing a margin in separable problems.
Cost-sensitive semi-supervised classification: When labeled data is scarce but unlabeled data is widely available, semi-supervised learning is a useful tool to make the most of the unlabeled data. We discuss an optimization problem relying on the minimization of our parametric family of Bregman divergences, using both labeled and unlabeled data, based on what is called the Entropy Minimization principle. We propose the first multiclass cost-sensitive semi-supervised algorithm, under the assumption that inter-class separation is stronger than intra-class separation.
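As an illustration of the Entropy Minimization principle in the semi-supervised setting, the sketch below trains a plain binary logistic model whose objective adds, to the usual supervised log-loss, a penalty on the entropy of its predictions for unlabeled points, pushing the boundary away from dense unlabeled regions. This is a deliberate simplification of what the abstract describes: the thesis works with multiclass, cost-sensitive Bregman divergences, while here the binary log-loss, the trade-off weight lambda_u, and the synthetic data are all assumptions for the demo.

```python
# Sketch of entropy minimization for semi-supervised learning with a
# binary logistic model; the thesis' multiclass cost-sensitive Bregman
# version is more general.  Data and lambda_u are demo assumptions.
import numpy as np

rng = np.random.default_rng(1)
Xl = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
yl = np.array([0] * 20 + [1] * 20)          # scarce labeled data
Xu = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, lr, lambda_u = np.zeros(2), 0.0, 0.1, 0.5
for _ in range(300):
    # Supervised term: gradient of the mean log-loss on labeled data.
    pl = sigmoid(Xl @ w + b)
    gw = Xl.T @ (pl - yl) / len(yl)
    gb = np.mean(pl - yl)
    # Unsupervised term: gradient of the mean prediction entropy on
    # unlabeled data; dH(sigmoid(z))/dz = -z * p * (1 - p).
    zu = Xu @ w + b
    pu = sigmoid(zu)
    dHdz = -zu * pu * (1.0 - pu)
    gw += lambda_u * Xu.T @ dHdz / len(Xu)
    gb += lambda_u * np.mean(dHdz)
    w, b = w - lr * gw, b - lr * gb

print("learned weights:", w, "bias:", b)
```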
The second part of the thesis deals with the transformation of this parametric family of Bregman divergences into a sequence of Bregman divergences. Work along this line can be further divided into two additional areas:
Foundations of sequences of Bregman divergences: We generalize some
previous results about the design and characterization of Bregman divergences
that are suitable for learning and their relationship with convexity. In addition,
we aim to broaden the subset of Bregman divergences that are interesting for
cost-sensitive learning. Under very general conditions, we find sequences of (cost-sensitive) Bregman divergences whose minimization provides minimum (cost-sensitive) risk for non-separable problems and some type of maximum-margin classifier in separable cases.
Learning with example-dependent costs: A strong assumption runs through most cost-sensitive learning algorithms: that misclassification costs are the same for all examples. In many cases this is not true. We claim that using the example-dependent costs directly is more natural and will lead to more accurate classifiers. For these reasons, we consider the extension of cost-sensitive sequences of Bregman losses to example-dependent cost scenarios, to generate finely tuned posterior probability estimates.
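One simple way to see what "using the example-dependent costs directly" can mean in practice is to scale each example's contribution to a proper loss by that example's own cost, and then to decide by comparing per-example expected costs rather than using a fixed threshold. The sketch below does this for a cost-weighted log-loss with a plain logistic model; it is our illustration of the general idea, not the thesis' sequence-of-Bregman-divergences construction, and the data and costs are synthetic.

```python
# Sketch of a cost-weighted proper loss with example-dependent costs,
# NOT the thesis' Bregman construction.  Data, per-example costs, and
# the learning rate are demo assumptions.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(float)
c_fn = rng.uniform(1, 10, size=500)   # cost of missing this positive
c_fp = rng.uniform(1, 3, size=500)    # cost of a false alarm on this example

w, b, lr = np.zeros(3), 0.0, 0.05
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # Gradient w.r.t. the logit of
    # mean_i [ c_fn_i * y_i * -log(p_i) + c_fp_i * (1 - y_i) * -log(1 - p_i) ].
    g = c_fn * y * (p - 1.0) + c_fp * (1.0 - y) * p
    w -= lr * X.T @ g / len(y)
    b -= lr * np.mean(g)

# Decide per example by expected cost: predicting "negative" risks
# p * c_fn, predicting "positive" risks (1 - p) * c_fp.
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
y_hat = (p * c_fn > (1.0 - p) * c_fp).astype(int)
print("cost-sensitive positive rate:", y_hat.mean())
```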