222,435 research outputs found
Kernelized Cost-Sensitive Listwise Ranking
This thesis research aims to conduct a study on a cost-sensitive listwise approach to learning to rank.
Learning to Rank is an area of application in machine learning, typically supervised, to build ranking models for Information Retrieval systems. The training data consists of lists of items with some partial order specified induced by an ordinal score or a binary judgment (relevant/not relevant). The model purpose is to produce a permutation of the items in this list in a way which is close to the rankings in the training data. This technique has been
successfully applied to ranking, and several approaches have been proposed since then, including the listwise approach.
A cost-sensitive version of that is an adaptation of this framework which treats the documents within a list with different probabilities, i.e. attempt to impose weights for the documents with higher cost. We then take this algorithm to the next level by kernelizing the loss and exploring the optimization in different spaces.
Among the different existing likelihood algorithms, we choose ListMLE as primary focus of experimentation, since it has been shown to be the approach with the best empirical performance. The theoretical framework is given along with its mathematical properties.
Experimentation is done on the benchmark LETOR dataset. They contain queries and some characteristics of the retrieved documents and its human judgments on the relevance of the documents on the queries.
Based on that we will show how the Kernel Cost-Sensitive ListMLE performs compared to the baseline Plain Cost-Sensitive ListMLE, ListNet, and RankSVM and show different aspects of the proposed loss function within different families of kernels
Ensemble of Example-Dependent Cost-Sensitive Decision Trees
Several real-world classification problems are example-dependent
cost-sensitive in nature, where the costs due to misclassification vary between
examples and not only within classes. However, standard classification methods
do not take these costs into account, and assume a constant cost of
misclassification errors. In previous works, some methods that take into
account the financial costs into the training of different algorithms have been
proposed, with the example-dependent cost-sensitive decision tree algorithm
being the one that gives the highest savings. In this paper we propose a new
framework of ensembles of example-dependent cost-sensitive decision-trees. The
framework consists in creating different example-dependent cost-sensitive
decision trees on random subsamples of the training set, and then combining
them using three different combination approaches. Moreover, we propose two new
cost-sensitive combination approaches; cost-sensitive weighted voting and
cost-sensitive stacking, the latter being based on the cost-sensitive logistic
regression method. Finally, using five different databases, from four
real-world applications: credit card fraud detection, churn modeling, credit
scoring and direct marketing, we evaluate the proposed method against
state-of-the-art example-dependent cost-sensitive techniques, namely,
cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision
trees. The results show that the proposed algorithms have better results for
all databases, in the sense of higher savings.Comment: 13 pages, 6 figures, Submitted for possible publicatio
- …