Knowledge acquisition for expert systems is notoriously difficult, often demanding an enormous effort on the part of the domain expert, who is essentially expected to spell out everything he knows about the domain. The task is non-trivial and can be time-consuming and tedious. Machine learning research, particularly into automatic rule induction from examples, may provide a way of easing this burden.
Arguably, the most popular and successful rule induction algorithm in general use today is Quinlan's ID3. ID3 induces rules in the form of decision trees. However, the research reported in this thesis identifies some major limitations of a decision tree representation. Decision trees can be incomprehensible, but more importantly, there are rules which cannot be represented by trees. Ideally, induced rules should be modular and should capture the essence of causality, avoiding irrelevance and redundancy.
The information theoretic approach employed in ID3 is examined in detail and some of its weaknesses are identified. A new algorithm is developed which, by avoiding these weaknesses, induces modular rules rather than decision trees. This algorithm forms the basis of a new rule induction program, PRISM.
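As a point of reference for the information theoretic approach mentioned above, the following is a minimal sketch of the standard entropy-based information gain measure commonly associated with ID3. The function names, the attribute names and the four-row toy data are hypothetical and chosen for illustration only; they are not taken from the training sets used in this thesis.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        # Weight each branch by the fraction of instances that fall into it.
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy data (an assumption, not the thesis's contact lens data).
toy_rows = [
    {"tear_rate": "reduced", "astigmatic": "no", "lens": "none"},
    {"tear_rate": "reduced", "astigmatic": "yes", "lens": "none"},
    {"tear_rate": "normal", "astigmatic": "no", "lens": "soft"},
    {"tear_rate": "normal", "astigmatic": "yes", "lens": "hard"},
]

gain_tear = information_gain(toy_rows, "tear_rate", "lens")
gain_astig = information_gain(toy_rows, "astigmatic", "lens")
```

A tree builder of this kind would split on the attribute with the larger gain, here `tear_rate`. Note that the measure scores an attribute as a whole, averaged over all of its values, rather than scoring any single attribute-value pair.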
Given an ideal training set, PRISM induces a complete and correct set of maximally general rules. The program and its results are described using training sets from two domains, contact lens fitting and a chess endgame. Induction from incomplete training sets is discussed and the performance of PRISM is compared with that of ID3 with particular reference to predictive power.
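The idea of inducing modular rules one class at a time can be illustrated with a small covering-style sketch. It assumes that a rule is specialised by repeatedly adding the attribute-value pair under which the proportion of target-class instances is highest, with ties broken in favour of wider coverage, and that instances covered by a finished rule are then removed. This abstract does not state the selection criterion, so these details, together with the function name and the toy data, are assumptions rather than a reproduction of the PRISM program.

```python
def induce_rules_for_class(rows, attributes, target, cls):
    """Return a list of rules (dicts of attribute -> value) that cover class `cls`."""
    rules = []
    remaining = list(rows)
    # Keep building rules until every instance of `cls` is covered.
    while any(row[target] == cls for row in remaining):
        conditions = {}
        covered = remaining
        # Specialise the rule until it covers only `cls` or no attributes are left.
        while (any(row[target] != cls for row in covered)
               and len(conditions) < len(attributes)):
            best = None
            for attr in attributes:
                if attr in conditions:
                    continue
                for value in sorted({row[attr] for row in covered}):
                    subset = [r for r in covered if r[attr] == value]
                    positives = sum(r[target] == cls for r in subset)
                    if positives == 0:
                        continue
                    # Assumed criterion: highest proportion of `cls`, then coverage.
                    score = (positives / len(subset), positives)
                    if best is None or score > best[0]:
                        best = (score, attr, value)
            _, attr, value = best
            conditions[attr] = value
            covered = [r for r in covered if r[attr] == value]
        rules.append(conditions)
        # Remove the instances the new rule covers before inducing the next rule.
        remaining = [r for r in remaining if r not in covered]
    return rules


# Hypothetical toy data (an assumption, not the thesis's contact lens data).
toy_rows = [
    {"tear_rate": "reduced", "astigmatic": "no", "lens": "none"},
    {"tear_rate": "reduced", "astigmatic": "yes", "lens": "none"},
    {"tear_rate": "normal", "astigmatic": "no", "lens": "soft"},
    {"tear_rate": "normal", "astigmatic": "yes", "lens": "hard"},
]

attrs = ["tear_rate", "astigmatic"]
rules_none = induce_rules_for_class(toy_rows, attrs, "lens", "none")
rules_soft = induce_rules_for_class(toy_rows, attrs, "lens", "soft")
```

On this toy data the sketch yields a single one-condition rule for the class `none` (`tear_rate = reduced`), which never mentions `astigmatic`. Each rule can be read on its own, without reference to the other rules or to a position in a tree, which is the sense of modularity discussed above.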
A series of experiments is described, in which PRISM and ID3 were applied to training sets of different sizes and the predictive power of the induced rules was calculated. The results show that PRISM generally performs better than ID3 in these two domains, inducing fewer, more general rules, which classify a similar number of instances correctly and significantly fewer incorrectly.