64 research outputs found

    Fuzzy set covering as a new paradigm for the induction of fuzzy classification rules

    In 1965 Lotfi A. Zadeh proposed fuzzy sets as a generalization of crisp (or classic) sets to address the inability of crisp sets to model the uncertainty and vagueness inherent in the real world. Initially, fuzzy sets did not receive a very warm welcome, as many academics were skeptical of a theory of "imprecise" mathematics. In the middle to late 1980s the success of fuzzy controllers brought fuzzy sets into the limelight, and many applications using fuzzy sets started appearing. In the early 1970s the first machine learning algorithms started appearing. The AQ family of algorithms pioneered by Ryszard S. Michalski is a good example of the family of set covering algorithms. This class of learning algorithm induces concept descriptions by a greedy construction of rules that describe (or cover) positive training examples but not negative training examples. The learning process is iterative: in each iteration one rule is induced, and the positive examples covered by the rule are removed from the set of positive training examples. Because positive instances are separated from negative instances, the term separate-and-conquer has been used to contrast this learning strategy with decision tree induction, which uses a divide-and-conquer learning strategy. This dissertation proposes fuzzy set covering as a powerful rule induction strategy. We survey existing fuzzy learning algorithms, and conclude that very few fuzzy learning algorithms follow a greedy rule construction strategy and that no publication to date has made the link between fuzzy sets and set covering explicit. We first develop the theoretical aspects of fuzzy set covering, and then apply these in proposing the first fuzzy learning algorithm that applies set covering and makes explicit use of a partial order for fuzzy classification rule induction. We also investigate several strategies to improve upon the basic algorithm, such as better search heuristics and different rule evaluation metrics.
    We then continue by proposing a general unifying framework for fuzzy set covering algorithms. We demonstrate the benefits of the framework and propose several further fuzzy set covering algorithms that fit within the framework. We compare fuzzy and crisp rule induction, and provide arguments in favour of fuzzy set covering as a rule induction strategy. We also show that our learning algorithms outperform other fuzzy rule learners on real-world data. We further explore the idea of simultaneous concept learning in the fuzzy case, and go on to propose the first fuzzy decision list induction algorithm. Finally, we propose a first strategy for encoding the rule sets generated by our fuzzy set covering algorithms inside an equivalent neural network.
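The separate-and-conquer loop described in this abstract can be illustrated with a minimal crisp sketch. This is not the dissertation's fuzzy algorithm and not AQ itself: the rule representation (a conjunction of attribute-value tests), the precision-based search heuristic, and all function names are assumptions made for illustration.

```python
from typing import Dict, List

Example = Dict[str, str]
Rule = Dict[str, str]  # hypothetical representation: conjunction of attr == value tests


def covers(rule: Rule, example: Example) -> bool:
    """A rule covers an example when every attribute test in it holds."""
    return all(example.get(attr) == val for attr, val in rule.items())


def learn_one_rule(pos: List[Example], neg: List[Example]) -> Rule:
    """Greedily add tests until the rule covers no negative example."""
    rule: Rule = {}
    cov_pos, cov_neg = list(pos), list(neg)
    while cov_neg:
        candidates = {(a, v) for e in cov_pos for a, v in e.items() if a not in rule}
        best, best_score = None, (-1.0, -1)
        for attr, val in sorted(candidates):
            p = sum(1 for e in cov_pos if e.get(attr) == val)
            n = sum(1 for e in cov_neg if e.get(attr) == val)
            score = (p / (p + n), p)  # assumed heuristic: precision, then coverage
            if score > best_score:
                best, best_score = (attr, val), score
        if best is None:
            break  # no test left to add (inconsistent data)
        rule[best[0]] = best[1]
        cov_pos = [e for e in cov_pos if covers(rule, e)]
        cov_neg = [e for e in cov_neg if covers(rule, e)]
    return rule


def separate_and_conquer(pos: List[Example], neg: List[Example]) -> List[Rule]:
    """Induce one rule per iteration and remove the positives it covers."""
    rules: List[Rule] = []
    remaining = list(pos)
    while remaining:
        rule = learn_one_rule(remaining, neg)
        rules.append(rule)
        remaining = [e for e in remaining if not covers(rule, e)]
    return rules


# Toy data, invented for this example.
pos = [{"outlook": "sunny", "wind": "weak"}, {"outlook": "overcast", "wind": "strong"}]
neg = [{"outlook": "sunny", "wind": "strong"}, {"outlook": "rain", "wind": "strong"}]
rules = separate_and_conquer(pos, neg)
```

On this toy data every positive example ends up covered by at least one rule and no rule covers a negative example. A fuzzy variant would replace the Boolean `covers` test with a degree of membership, which is the subject of the dissertation and is not attempted here.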

    Efficient learning of large sets of locally optimal classification rules

    Conventional rule learning algorithms aim at finding a set of simple rules, where each rule covers as many examples as possible. In this paper, we argue that the rules found in this way may not be the optimal explanations for each of the examples they cover. Instead, we propose an efficient algorithm that aims at finding the best rule covering each training example in a greedy optimization consisting of one specialization and one generalization loop. These locally optimal rules are collected and then filtered for a final rule set, which is much larger than the sets learned by conventional rule learning algorithms. A new example is classified by selecting the best among the rules that cover this example. In our experiments on small to very large datasets, the approach's average classification accuracy is higher than that of state-of-the-art rule learning algorithms. Moreover, the algorithm is highly efficient and can inherently be processed in parallel without affecting the learned rule set, and thus without affecting the classification accuracy. We thus believe that it closes an important gap for large-scale classification rule induction. Comment: article, 40 pages, Machine Learning journal (2023)
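The classification step mentioned in this abstract (pick the best among the rules that cover a new example) might look like the sketch below. The abstract does not say how "best" is measured, so the Laplace-corrected precision used here is an assumption, as are the `ScoredRule` type, the `classify` function, and the fallback to a default class.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredRule:
    conditions: Dict[str, str]  # conjunction of attr == value tests
    label: str                  # class predicted by the rule
    p: int                      # covered training examples of `label`
    n: int                      # covered training examples of other classes

    def covers(self, example: Dict[str, str]) -> bool:
        return all(example.get(a) == v for a, v in self.conditions.items())

    def quality(self) -> float:
        # Assumed quality measure: Laplace-corrected precision.
        return (self.p + 1) / (self.p + self.n + 2)


def classify(rules: List[ScoredRule], example: Dict[str, str], default: str) -> str:
    """Return the label of the highest-quality rule covering the example."""
    covering = [r for r in rules if r.covers(example)]
    if not covering:
        return default  # assumed fallback when no rule fires
    return max(covering, key=ScoredRule.quality).label


# Toy rule set, invented for this example.
rule_set = [
    ScoredRule({"wind": "weak"}, "play", p=8, n=2),      # quality 9/12 = 0.75
    ScoredRule({"outlook": "sunny"}, "stay", p=3, n=0),  # quality 4/5 = 0.80
]
both = classify(rule_set, {"outlook": "sunny", "wind": "weak"}, default="play")
```

Here both rules cover the example, and the second one wins because its assumed quality score is higher, so `both` is `"stay"`.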

    Knowledge discovering for document classification using tree matching in Texpros

    This dissertation describes a knowledge-based system for classifying documents based upon the layout structure and conceptual information extracted from the content of the document. The spatial elements in a document are laid out in rectangular blocks, which are represented by nodes in an ordered labelled tree called the layout structure tree (L-S Tree). Each leaf node of an L-S Tree points to its corresponding block content. A Knowledge Acquisition Tool (KAT) is devised to create a Document Sample Tree from an L-S Tree, in which each leaf contains a node content conceptually describing its corresponding block content. Then, applying generalization rules, the KAT performs inductive learning from the Document Sample Trees of a type and generates a smaller number of Document Type Trees to represent that type. A testing document is classified if a Document Type Tree is discovered as a substructure of the L-S Tree of the testing document; the exact format of the testing document can then be found by matching the L-S Tree with the Document Sample Trees of the classified document type. The Document Sample Trees and Document Type Trees together are called the Structural Knowledge Base (SKB). The tree discovering and matching processes involve computing the edit distance and the degree of conceptual closeness between the SKB trees and the L-S Tree of a testing document by using pattern matching and discovering toolkits. Our experimental results demonstrate that many office documents can be classified correctly using the proposed approach.
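The idea of discovering a type tree as a substructure of an ordered labelled tree can be illustrated with a much simplified sketch. It uses exact label matching only and ignores the edit distance and conceptual closeness that the abstract mentions; the `Node` type, the function names, and the block labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def matches_at(pattern: Node, node: Node) -> bool:
    """True if pattern embeds at node: labels agree and the pattern's children
    match, in order, a subsequence of the node's children."""
    if pattern.label != node.label:
        return False
    i = 0  # index of the next pattern child still to be matched
    for child in node.children:
        if i < len(pattern.children) and matches_at(pattern.children[i], child):
            i += 1
    return i == len(pattern.children)


def contains(pattern: Node, tree: Node) -> bool:
    """True if the pattern embeds at the root or anywhere below it."""
    return matches_at(pattern, tree) or any(contains(pattern, c) for c in tree.children)


# Toy layout tree, invented for this example.
doc = Node("page", [
    Node("header"),
    Node("body", [Node("date"), Node("address"), Node("text")]),
    Node("footer"),
])
type_tree = Node("body", [Node("date"), Node("text")])
found = contains(type_tree, doc)
```

With this toy tree `found` is `True`, because `date` and `text` appear in that order under `body`; swapping the two pattern children would make the check fail, since sibling order matters in an ordered tree.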

    Reasoning about fuzzy temporal and spatial information from the Web
