8,278 research outputs found
Preceding rule induction with instance reduction methods
A new prepruning technique for rule induction is presented which applies instance reduction before rule induction. An empirical evaluation records the predictive accuracy and size of rule-sets generated from 24 datasets from the UCI Machine Learning Repository. Three instance reduction algorithms (Edited Nearest Neighbour, AllKnn and DROP5) are compared. Each one is used to reduce the size of the training set, prior to inducing a set of rules using Clark and Boswell's modification of CN2. A hybrid instance reduction algorithm (comprised of AllKnn and DROP5) is also tested. For most of the datasets, pruning the training set using ENN, AllKnn or the hybrid significantly reduces the number of rules generated by CN2, without adversely affecting the predictive performance. The hybrid achieves the highest average predictive accuracy
A survey of cost-sensitive decision tree induction algorithms
The past decade has seen a significant interest on the problem of inducing decision trees that take account of costs of misclassification and costs of acquiring the features used for decision making. This survey identifies over 50 algorithms including approaches that are direct adaptations of accuracy based methods, use genetic algorithms, use anytime methods and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy, a historical timeline of how the field has developed and should provide a useful reference point for future research in this field
SkILL - a Stochastic Inductive Logic Learner
Probabilistic Inductive Logic Programming (PILP) is a rel- atively unexplored
area of Statistical Relational Learning which extends classic Inductive Logic
Programming (ILP). This work introduces SkILL, a Stochastic Inductive Logic
Learner, which takes probabilistic annotated data and produces First Order
Logic theories. Data in several domains such as medicine and bioinformatics
have an inherent degree of uncer- tainty, that can be used to produce models
closer to reality. SkILL can not only use this type of probabilistic data to
extract non-trivial knowl- edge from databases, but it also addresses
efficiency issues by introducing a novel, efficient and effective search
strategy to guide the search in PILP environments. The capabilities of SkILL
are demonstrated in three dif- ferent datasets: (i) a synthetic toy example
used to validate the system, (ii) a probabilistic adaptation of a well-known
biological metabolism ap- plication, and (iii) a real world medical dataset in
the breast cancer domain. Results show that SkILL can perform as well as a
deterministic ILP learner, while also being able to incorporate probabilistic
knowledge that would otherwise not be considered
- …