
Knowledge acquisition for expert systems: inducing modular rules from examples

Author: Cendrowska Jadzia
Publication date: 01/01/1990

Knowledge acquisition for expert systems is notoriously difficult, often demanding an enormous effort on the part of the domain expert, who is essentially expected to spell out everything he knows about the domain. The task is non-trivial and can be time-consuming and tedious. Machine learning research, particularly into automatic rule induction from examples, may provide a way of easing this burden. Arguably, the most popular and successful rule induction algorithm in general use today is Quinlan's ID3, which induces rules in the form of decision trees. However, the research reported in this thesis identifies some major limitations of a decision tree representation. Decision trees can be incomprehensible, but more importantly, there are rules which cannot be represented by trees. Ideally, induced rules should be modular and should capture the essence of causality, avoiding irrelevance and redundancy. The information-theoretic approach employed in ID3 is examined in detail and some of its weaknesses are identified. A new algorithm is developed which, by avoiding these weaknesses, induces modular rules rather than decision trees. This algorithm forms the basis of a new rule induction program, PRISM. Given an ideal training set, PRISM induces a complete and correct set of maximally general rules. The program and its results are described using training sets from two domains, contact lens fitting and a chess endgame. Induction from incomplete training sets is discussed, and the performance of PRISM is compared with that of ID3 with particular reference to predictive power. A series of experiments is described in which PRISM and ID3 were applied to training sets of different sizes and their predictive power was calculated. The results show that PRISM generally performs better than ID3 in these two domains, inducing fewer, more general rules, which classify a similar number of instances correctly and significantly fewer incorrectly.

Open Research Online (The Open University)
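To make the covering idea in the Cendrowska abstract concrete, here is a minimal Python sketch of a PRISM-style learner. It assumes training instances are dicts of discrete attribute values, scores each candidate attribute-value pair by the fraction of the instances it covers that belong to the target class, and breaks ties in favour of larger coverage. The function name `prism`, the data layout, the tie-breaking rule and the toy weather table are our assumptions for illustration, not code from the thesis.

```python
def prism(instances, target):
    """PRISM-style covering: induce modular rules one class at a time.

    instances: list of dicts mapping attribute name -> discrete value,
    including the class attribute named by `target`.
    Returns a list of (conditions, class_value) pairs.
    """
    attributes = [a for a in instances[0] if a != target]
    rules = []
    for cls in sorted({inst[target] for inst in instances}):
        remaining = list(instances)  # restart from the full set for each class
        while any(inst[target] == cls for inst in remaining):
            covered, conditions = remaining, {}
            # Specialise the rule until it covers instances of `cls` only.
            while any(inst[target] != cls for inst in covered):
                best = None
                for attr in attributes:
                    if attr in conditions:
                        continue
                    for value in sorted({inst[attr] for inst in covered}):
                        subset = [i for i in covered if i[attr] == value]
                        p = sum(i[target] == cls for i in subset) / len(subset)
                        key = (p, len(subset))  # assumed tie-break: larger coverage
                        if best is None or key > best[0]:
                            best = (key, attr, value, subset)
                if best is None:  # attributes exhausted (clashing examples)
                    break
                _, attr, value, covered = best
                conditions[attr] = value
            rules.append((conditions, cls))
            # Drop the instances of `cls` that this rule now accounts for.
            remaining = [
                i for i in remaining
                if not (i[target] == cls
                        and all(i[a] == v for a, v in conditions.items()))
            ]
    return rules


# Hypothetical toy data, not taken from the thesis.
weather = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
for conditions, cls in prism(weather, "play"):
    print(conditions, "->", cls)
```

On this table the sketch produces rules such as `windy = no -> yes` and `outlook = overcast -> yes`; each rule can be read on its own, without following a path through a tree.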

A System for Induction of Oblique Decision Trees

Author: Kasif S.
Murthy S. K.
Salzberg S.
Publication date: 01/01/1994

This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees.
Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article.

arXiv.org e-Print Archive

CiteSeerX
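The OC1 abstract mentions deterministic hill-climbing over the coefficients of a split hyperplane. The sketch below shows one such step under stated assumptions: it holds all coefficients but `w[m]` fixed, computes for each example the value of `w[m]` at which that example would lie exactly on the hyperplane, and tries the midpoints between consecutive values. It uses a simple "sum minority" count as the impurity measure and omits the randomization the abstract describes; the function names and toy data are hypothetical.

```python
from collections import Counter


def side_values(X, w, b):
    # Signed value of each example relative to the hyperplane w.x + b = 0.
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]


def minority(labels):
    # Number of examples not in the majority class of this partition.
    return len(labels) - max(Counter(labels).values()) if labels else 0


def impurity(X, y, w, b):
    # "Sum minority" impurity: minority counts on both sides of the split.
    vals = side_values(X, w, b)
    left = [lab for v, lab in zip(vals, y) if v <= 0]
    right = [lab for v, lab in zip(vals, y) if v > 0]
    return minority(left) + minority(right)


def perturb(X, y, w, b, m):
    """Re-optimise coefficient w[m] with the other coefficients held fixed."""
    vals = side_values(X, w, b)
    # U_j: the value of w[m] at which example j lies exactly on the hyperplane.
    us = sorted((w[m] * x[m] - v) / x[m] for x, v in zip(X, vals) if x[m] != 0)
    best_w, best_imp = list(w), impurity(X, y, w, b)
    for lo, hi in zip(us, us[1:]):
        cand = list(w)
        cand[m] = (lo + hi) / 2  # midpoint between consecutive U_j values
        imp = impurity(X, y, cand, b)
        if imp < best_imp:  # accept strict improvements only
            best_w, best_imp = cand, imp
    return best_w, best_imp


# Hypothetical points that no axis-parallel split through the origin separates.
X = [[1, 0], [2, 1], [0, 1], [1, 2]]
y = ["a", "a", "b", "b"]
print(perturb(X, y, [1.0, 0.0], 0.0, m=1))
```

Here the axis-parallel start `[1, 0]` misplaces one point, and re-optimising the second coefficient to -1.25 tilts the split so that both classes are separated.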

Rule-based Machine Learning Methods for Functional Prediction

Author: Indurkhya N.
Weiss S. M.
Publication date: 01/01/1995

We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance.
Comment: See http://www.jair.org/ for any accompanying files.

arXiv.org e-Print Archive

CiteSeerX
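The ordered DNF representation in the Indurkhya and Weiss abstract can be read as a decision list: rules are tried in order, and the first rule whose condition (an OR of ANDs) holds supplies the predicted value. The sketch below shows only how such a rule set would be applied. The rules, thresholds, input names `x1` to `x3` and the default value are hypothetical, and how the paper induces the rules and chooses each rule's output value is not shown.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, float]
Test = Callable[[Case], bool]
# A rule: a disjunction of conjunctions of tests, plus the value it predicts.
Rule = Tuple[List[List[Test]], float]


def predict(rules: List[Rule], default: float, case: Case) -> float:
    """Return the value of the first rule whose DNF condition holds."""
    for disjuncts, value in rules:
        if any(all(test(case) for test in conj) for conj in disjuncts):
            return value
    return default  # no rule fired


# Hypothetical rules over hypothetical inputs x1, x2, x3.
rules: List[Rule] = [
    # (x1 > 5 AND x2 <= 2) OR (x3 > 10)  ->  42.0
    ([[lambda c: c["x1"] > 5, lambda c: c["x2"] <= 2],
      [lambda c: c["x3"] > 10]], 42.0),
    # x1 > 5  ->  30.0
    ([[lambda c: c["x1"] > 5]], 30.0),
]
print(predict(rules, 12.0, {"x1": 6, "x2": 3, "x3": 0}))
```

Because the list is ordered, the second rule only needs to describe cases the first rule did not already capture, which is one way such rule sets stay compact.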

Unsupervised Discovery of Phonological Categories through Supervised Learning of Morphological Rules

Author: Berck Peter
Daelemans Walter
Gillis Steven
Publication date: 01/01/1996

We describe a case study in the application of symbolic machine learning techniques for the discovery of linguistic rules and categories. A supervised rule induction algorithm is used to learn to predict the correct diminutive suffix given the phonological representation of Dutch nouns. The system produces rules which are comparable to rules proposed by linguists. Furthermore, in the process of learning this morphological task, the phonemes used are grouped into phonologically relevant categories. We discuss the relevance of our method for linguistics and language technology.

arXiv.org e-Print Archive

CiteSeerX

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm

Author: Turney P. D.
Publication date: 01/01/1995

This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification (EG2, CS-ID3, and IDX) and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search.
Comment: See http://www.jair.org/ for any accompanying files.

arXiv.org e-Print Archive

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive
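The ICET fitness function described in the Turney abstract, the average over cases of test costs plus misclassification costs, can be computed for a fixed tree as in this sketch. The tree encoding, the feature names `a` and `b`, the class labels and every cost figure are hypothetical. Each test is charged once per case even if it appears twice on a path, and any further details of the paper's cost model are not represented; the genetic search over biases is also not shown.

```python
def classify_with_cost(node, case, test_costs):
    """Walk the tree, paying for each test the first time it is performed."""
    paid, cost = set(), 0.0
    while isinstance(node, tuple):  # internal node: (feature, threshold, low, high)
        feature, threshold, low, high = node
        if feature not in paid:
            cost += test_costs[feature]
            paid.add(feature)
        node = low if case[feature] <= threshold else high
    return node, cost  # a leaf is just the predicted class label


def average_cost(tree, cases, labels, test_costs, error_costs):
    """Mean of (test costs + misclassification cost) over all cases."""
    total = 0.0
    for case, actual in zip(cases, labels):
        predicted, cost = classify_with_cost(tree, case, test_costs)
        total += cost + error_costs[(predicted, actual)]
    return total / len(cases)


# Hypothetical tree, costs and cases.
tree = ("a", 5.0, "neg", ("b", 3.0, "pos", "neg"))
test_costs = {"a": 1.0, "b": 10.0}
error_costs = {  # keyed by (predicted, actual)
    ("neg", "neg"): 0.0, ("pos", "pos"): 0.0,
    ("neg", "pos"): 100.0,  # a missed positive is expensive
    ("pos", "neg"): 20.0,
}
cases = [{"a": 2, "b": 0}, {"a": 7, "b": 1}, {"a": 7, "b": 9}, {"a": 4, "b": 9}]
labels = ["neg", "pos", "pos", "pos"]
print(average_cost(tree, cases, labels, test_costs, error_costs))
```

A tree that skips the expensive test `b` would lower the test-cost term but could raise the error term; the fitness value trades the two off in a single number.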

Inducing safer oblique trees without costs

Author: Althoff K.
Bennett K.P.
Berry M.
Blake C.
Bradford J.
Breiman L.
Cohen R.
Domingos P.
Elkan C.
Elomaa T.
Fan W.
Grefenstette J.
Knoll U.
Kolodner J.
Morrison D.
Norusis M.
Nunez M.
Pazzani M.
Provost F.J.
Quinlan J.R.
Sunil Vadera
Tan M.
Ting K.
Turney P.
Vadera S.
Publication venue: Wiley
Publication date: 01/09/2005

Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other class. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms or have suggested ways of changing the distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of costs of misclassification. Although this may be possible for some applications, obtaining reasonable estimates of costs of misclassification is not easy in the area of safety. This paper presents a new algorithm for applications where the cost of misclassification cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in another class. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best-known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1) and an algorithm that utilizes robust linear programming.

University of Salford Institutional Repository

Crossref
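The Vadera abstract combines linear discriminant analysis with a shift toward the costly class. The sketch below computes Fisher's linear discriminant direction for two continuous attributes and then applies one possible safety adjustment: it moves the threshold so that every unsafe training example falls on the unsafe side of the oblique split. That adjustment rule, the function names and the toy data are our assumptions for illustration; the abstract does not specify the paper's actual modification.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, mu):
    # 2x2 within-class scatter matrix: sum of (x - mu)(x - mu)^T.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(safe, unsafe):
    # w = S_w^{-1} (mu_unsafe - mu_safe), using the closed-form 2x2 inverse.
    mu_s, mu_u = mean(safe), mean(unsafe)
    s1, s2 = scatter(safe, mu_s), scatter(unsafe, mu_u)
    a, b = s1[0][0] + s2[0][0], s1[0][1] + s2[0][1]
    c, d = s1[1][0] + s2[1][0], s1[1][1] + s2[1][1]
    det = a * d - b * c  # assumes the pooled scatter matrix is invertible
    diff = [mu_u[0] - mu_s[0], mu_u[1] - mu_s[1]]
    return [(d * diff[0] - b * diff[1]) / det,
            (-c * diff[0] + a * diff[1]) / det]


def safe_oblique_split(safe, unsafe):
    w = fisher_direction(safe, unsafe)
    # Hypothetical safety adjustment: place the threshold at the lowest
    # projection of any unsafe training example, so none is called safe.
    t = min(w[0] * p[0] + w[1] * p[1] for p in unsafe)
    return w, t


def is_unsafe(w, t, x):
    return w[0] * x[0] + w[1] * x[1] >= t


# Hypothetical toy data with two continuous attributes.
safe = [(0, 0), (2, 0), (0, 2)]
unsafe = [(3, 3), (5, 3), (3, 5)]
w, t = safe_oblique_split(safe, unsafe)
print(w, t, is_unsafe(w, t, (2, 2)))
```

Anchoring the threshold on the unsafe class, rather than at the midpoint between class means, is what makes this split err toward flagging cases as unsafe; on overlapping data it would trade more false alarms for fewer missed unsafe cases.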