    Granular computing based approach of rule learning for binary classification

    Rule learning is one of the most popular types of machine-learning approaches, which typically follow two main strategies: ‘divide and conquer’ and ‘separate and conquer’. The former strategy induces rules in the form of a decision tree, whereas the latter induces if–then rules directly. Because the divide and conquer strategy can result in the replicated sub-tree problem, which not only leads to overfitting but also increases the computational complexity of classifying unseen instances, researchers have been motivated to develop rule learning approaches based on the separate and conquer strategy. In this paper, we focus on the Prism algorithm, a representative of the separate and conquer strategy that learns a set of rules for each class in the setting of granular computing, where each class (referred to as the target class) is viewed as a granule. Prism shows highly comparable performance to the most popular algorithms that follow the divide and conquer strategy, such as ID3 and C4.5. However, because it must learn a rule set for each class, Prism usually produces very complex rule-based classifiers. Many real applications involve only one target class, so it is not necessary to learn a rule set for each class: only a set of rules for the target class needs to be learned, and a default rule is used to cover the non-target classes. To address these issues, we propose a new version of the algorithm referred to as PrismSTC, where ‘STC’ stands for ‘single target class’. Our experimental results show that PrismSTC produces simpler rule-based classifiers without loss of accuracy in comparison with Prism, and also performs sufficiently well compared with C4.5.
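The single-target-class covering procedure described in the abstract can be sketched as follows. This is an illustrative reconstruction of a Prism-style learner, not the authors' implementation; the instance representation and the precision-based term selection are assumptions:

```python
def learn_rules_for_class(instances, target, max_terms=10):
    """Separate-and-conquer sketch in the spirit of Prism/PrismSTC:
    learn rules for a single target class; all other classes fall to
    an implicit default rule. `instances` is a list of
    (feature_dict, label) pairs. Illustrative only."""
    rules = []
    remaining = list(instances)
    while any(label == target for _, label in remaining):
        rule = {}            # conjunction of attribute=value terms
        covered = remaining
        while len(rule) < max_terms:
            # stop specialising once the rule is pure for the target class
            if all(label == target for _, label in covered):
                break
            best = None
            # choose the term maximising P(target | term) on covered data
            for feats, _ in covered:
                for attr, val in feats.items():
                    if attr in rule:
                        continue
                    match = [(f, l) for f, l in covered if f.get(attr) == val]
                    prec = sum(l == target for _, l in match) / len(match)
                    if best is None or prec > best[0]:
                        best = (prec, attr, val, match)
            if best is None:
                break
            _, attr, val, covered = best
            rule[attr] = val
        rules.append(rule)
        # remove the instances covered by the newly learned rule
        remaining = [(f, l) for f, l in remaining
                     if not all(f.get(a) == v for a, v in rule.items())]
    return rules  # plus an implicit default rule for non-target classes
```

On a toy dataset where `x == 1` perfectly separates the target class, a single one-term rule is learned and the covering loop stops.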

    Induction of classification rules by Gini-Index based rule generation

    Rule learning is one of the most popular areas in machine learning research, because the outcome of learning is a set of rules that not only provides accurate predictions but also shows a transparent process of mapping inputs to outputs. In general, rule learning approaches can be divided into two main types, namely `divide and conquer' and `separate and conquer'. The former type, also known as Top-Down Induction of Decision Trees, learns a set of rules represented in the form of a decision tree. This approach produces a large number of complex rules (usually due to the replicated sub-tree problem), which lowers computational efficiency in both the training and testing stages and leads to overfitting of the training data. Due to this problem, researchers have gradually been motivated to develop `separate and conquer' rule learning approaches, also known as covering approaches, which learn a set of rules sequentially: a rule is learned, the instances covered by this rule are deleted from the training set, and the next rule is then learned from the smaller training set. In this paper, we propose a new algorithm, GIBRG, which employs the Gini-Index to measure the quality of each rule being learned, in the context of `separate and conquer' rule learning. Our experiments show that the proposed algorithm outperforms both decision tree learning algorithms (C4.5, CART) and `separate and conquer' approaches (Prism). It also produces a smaller number of rules and rule terms, thus being more computationally efficient and less prone to overfitting.
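The Gini-Index rule-quality measure mentioned above can be sketched as follows; exactly how GIBRG applies it when scoring candidate rule terms is an assumption here, but the impurity measure itself is standard:

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity of a set of class labels: 1 - sum_c p_c^2.
    A pure set scores 0; when scoring the instances covered by a
    candidate rule term, lower is better. How GIBRG combines this
    score with term selection is not reproduced here."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

A maximally mixed binary set scores 0.5, while a set covered purely by one class scores 0, which is what a covering learner seeks when specialising a rule.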

    J-measure based pruning for advancing classification performance of information entropy based rule generation

    Learning of classification rules is a popular approach of machine learning, which can be achieved through two strategies, namely divide-and-conquer and separate-and-conquer. The former generates rules in the form of a decision tree, whereas the latter generates if-then rules directly from training data. From this point of view, the two strategies are referred to as decision tree learning and rule learning, respectively. Both can produce complex rule based classifiers that overfit the training data, which has motivated researchers to develop pruning algorithms that reduce overfitting. In this paper, we propose a J-measure based pruning algorithm, referred to as Jmean-pruning. The proposed algorithm is used to advance the performance of an information entropy based rule generation method that follows the separate and conquer strategy. An experimental study shows how Jmean-pruning effectively helps this rule learning method avoid overfitting. The results show that Jmean-pruning improves the performance of the rule learning method, and the improved performance is comparable to or even considerably better than that of C4.5.
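The J-measure underlying the pruning algorithm can be computed as follows. This is the standard Smyth–Goodman formulation; the Jmean-pruning procedure itself (e.g., how rules are compared against a mean J value) is not reproduced here:

```python
import math

def j_measure(p_y, p_x, p_x_given_y):
    """J-measure of a rule 'if Y = y then X = x':
        J = P(y) * [ P(x|y) * log2(P(x|y)/P(x))
                   + (1-P(x|y)) * log2((1-P(x|y))/(1-P(x))) ]
    P(y) weights the rule's coverage; the bracketed term is the
    cross-entropy between the posterior and prior of the consequent."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log2(a / b)
    return p_y * (term(p_x_given_y, p_x) + term(1 - p_x_given_y, 1 - p_x))
```

When the rule body tells us nothing (posterior equals prior) the J-measure is zero, and it grows with both the rule's coverage and its information gain, which is why it is a natural quality score for pruning decisions.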

    Heuristic target class selection for advancing performance of coverage-based rule learning

    Rule learning is a popular branch of machine learning, which can provide accurate and interpretable classification results. In general, the two main strategies of rule learning are referred to as 'divide and conquer' and 'separate and conquer'. Decision tree generation, which follows the former strategy, has a serious drawback known as the replicated sub-tree problem, resulting from the constraint that all branches of a decision tree must have one or more common attributes. This problem is likely to result in high computational complexity and a risk of overfitting, which has led to the development of rule learning algorithms (e.g., Prism) that follow the separate and conquer strategy. The replicated sub-tree problem can be effectively solved using the Prism algorithm, but the trained models are still complex due to the need to train an independent rule set for each selected target class. In order to reduce the risk of overfitting and the model complexity, we propose in this paper a variant of the Prism algorithm referred to as PrismCTC. The experimental results show that the PrismCTC algorithm advances classification performance and reduces model complexity, in comparison with the C4.5 and Prism algorithms.
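The abstract does not specify which heuristic PrismCTC uses to select target classes, so the following is a purely hypothetical example of one simple choice: ordering target classes by frequency so that rules for dominant classes are learned first, on the largest remaining training subsets:

```python
from collections import Counter

def order_target_classes(labels):
    """Hypothetical target-class selection heuristic: process classes
    from most to least frequent. The actual heuristic used by PrismCTC
    is not given in the abstract; this is for illustration only."""
    return [cls for cls, _ in Counter(labels).most_common()]
```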

    Learning Interpretable Rules for Multi-label Classification

    Multi-label classification (MLC) is a supervised learning problem in which, contrary to standard multiclass classification, an instance can be associated with several class labels simultaneously. In this chapter, we advocate a rule-based approach to multi-label classification. Rule learning algorithms are often employed when one is not only interested in accurate predictions, but also requires an interpretable theory that can be understood, analyzed, and qualitatively evaluated by domain experts. Ideally, by revealing patterns and regularities contained in the data, a rule-based theory yields new insights in the application domain. Recently, several authors have started to investigate how rule-based models can be used for modeling multi-label data. Discussing this task in detail, we highlight some of the problems that make rule learning considerably more challenging for MLC than for conventional classification. While mainly focusing on our own previous work, we also provide a short overview of related work in this area. Comment: Preprint version. To appear in: Explainable and Interpretable Models in Computer Vision and Machine Learning. The Springer Series on Challenges in Machine Learning. Springer (2018). See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3077 for further information.
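The key structural difference from conventional classification can be illustrated with a hypothetical rule representation: a multi-label rule maps a feature condition to a set of labels rather than a single label. The dictionary-based encoding and first-match prediction below are assumptions for illustration, not the chapter's formalism:

```python
def covers(rule, features):
    """A rule's condition is a conjunction of attribute=value tests."""
    return all(features.get(a) == v for a, v in rule["if"].items())

def predict(rules, features, default=frozenset()):
    """First matching rule wins; its whole LABEL SET is predicted,
    unlike single-label rule learning where the head is one class."""
    for rule in rules:
        if covers(rule, features):
            return rule["then"]
    return default
```

This set-valued head is one source of the extra difficulty the chapter discusses: a rule's quality now depends on a whole label combination, not on a single class.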