28,910 research outputs found
A survey of cost-sensitive decision tree induction algorithms
The past decade has seen a significant interest on the problem of inducing decision trees that take account of costs of misclassification and costs of acquiring the features used for decision making. This survey identifies over 50 algorithms including approaches that are direct adaptations of accuracy based methods, use genetic algorithms, use anytime methods and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy, a historical timeline of how the field has developed and should provide a useful reference point for future research in this field
CSNL: A cost-sensitive non-linear decision tree algorithm
This article presents a new decision tree learning algorithm called CSNL that induces Cost-Sensitive Non-Linear decision trees. The algorithm is based on the hypothesis that nonlinear decision nodes provide a better basis than axis-parallel decision nodes and utilizes discriminant analysis to construct nonlinear decision trees that take account of costs of misclassification.
The performance of the algorithm is evaluated by applying it to seventeen datasets and the results are compared with those obtained by two well known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date. The results show that CSNL performs at least as well, if not better than these algorithms, in more than twelve of the datasets and is considerably faster. The use of bagging with CSNL further enhances its performance showing the significant benefits of using nonlinear decision nodes.
The performance of the algorithm is evaluated by applying it to seventeen data sets and the results are
compared with those obtained by two well known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date.
The results show that CSNL performs at least as well, if not better than these algorithms, in more than twelve of the data sets and is considerably faster.
The use of bagging with CSNL further enhances its performance showing the significant benefits of using non-linear decision nodes
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
Visual detection of blemishes in potatoes using minimalist boosted classifiers
This paper introduces novel methods for detecting blemishes in potatoes using machine vision. After segmentation of the potato from the background, a pixel-wise classifier is trained to detect blemishes using features extracted from the image.
A very large set of candidate features, based on statistical information relating to the colour and texture of the region surrounding a given pixel, is first extracted.
Then an adaptive boosting algorithm (AdaBoost) is used to automatically select the best features for discriminating between blemishes and non-blemishes.
With this approach, different features can be selected for different potato varieties, while also handling the natural variation in fresh produce due to different seasons, lighting conditions, etc.
The results show that the method is able to build ``minimalist'' classifiers that optimise detection performance at low computational cost.
In experiments, blemish detectors were trained for both white and red potato varieties, achieving 89.6\% and 89.5\% accuracy, respectively
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
- …