1,395 research outputs found

    J-measure based pruning for advancing classification performance of information entropy based rule generation

    Learning classification rules is a popular approach in machine learning and can be achieved through two strategies: divide-and-conquer and separate-and-conquer. The former generates rules in the form of a decision tree, whereas the latter generates if-then rules directly from training data. Accordingly, the two strategies are referred to as decision tree learning and rule learning, respectively. Both strategies can produce complex rule-based classifiers that overfit the training data, which has motivated researchers to develop pruning algorithms that reduce overfitting. In this paper, we propose a J-measure based pruning algorithm, referred to as Jmean-pruning. The proposed algorithm is used to improve the performance of the information entropy based rule generation method, which follows the separate-and-conquer strategy. An experimental study shows how Jmean-pruning helps this rule learning method avoid overfitting. The results show that Jmean-pruning improves the performance of the rule learning method, and that the improved performance is comparable to, or even considerably better than, that of C4.5.
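
The abstract does not spell out the J-measure itself, so a minimal sketch of the standard definition (due to Smyth and Goodman) may help. For a rule "IF x THEN y", it weights the rule's coverage P(x) by the information the antecedent provides about the consequent. The function and parameter names below are illustrative assumptions, and the sketch does not reproduce the Jmean-pruning procedure, whose details are not given here.

```python
import math


def j_measure(p_x, p_y, p_y_given_x):
    """J-measure (in bits) of a rule 'IF x THEN y'.

    p_x         -- probability that the rule antecedent fires, P(x)
    p_y         -- prior probability of the consequent class, P(y)
    p_y_given_x -- probability of the class among covered instances, P(y|x)

    Names are illustrative; only the formula itself is standard.
    """
    if not 0.0 < p_y < 1.0:
        raise ValueError("the prior P(y) must lie strictly between 0 and 1")

    def term(posterior, prior):
        # By convention, 0 * log(0 / prior) contributes nothing.
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    # Information that observing x gives about y (a relative entropy).
    information = term(p_y_given_x, p_y) + term(1.0 - p_y_given_x, 1.0 - p_y)
    # Weight by how often the rule applies.
    return p_x * information


# A rule covering half the data and perfectly predicting a 50/50 class.
print(j_measure(p_x=0.5, p_y=0.5, p_y_given_x=1.0))
```

A rule whose covered instances have the same class distribution as the whole data set scores zero, which is why a low J-measure is a natural signal that a rule or rule term adds little.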

    Induction of classification rules by Gini-Index based rule generation

    Rule learning is one of the most popular areas of machine learning research, because it produces a set of rules that not only provides accurate predictions but also shows transparently how inputs are mapped to outputs. In general, rule learning approaches fall into two main types: 'divide and conquer' and 'separate and conquer'. The former is also known as Top-Down Induction of Decision Trees, in which a set of rules is learned in the form of a decision tree. This approach produces a large number of complex rules (usually due to the replicated sub-tree problem), which lowers computational efficiency in both the training and testing stages and leads to overfitting of the training data. Because of this problem, researchers have gradually been motivated to develop 'separate and conquer' rule learning approaches, also known as covering approaches, which learn a set of rules sequentially. Specifically, a rule is learned and the instances it covers are deleted from the training set, so that the next rule is learned from a smaller training set. In this paper, we propose a new algorithm, GIBRG, which employs the Gini index to measure the quality of each rule being learned in the context of 'separate and conquer' rule learning. Our experiments show that the proposed algorithm outperforms both decision tree learning algorithms (C4.5, CART) and 'separate and conquer' approaches (Prism). In addition, it produces fewer rules and rule terms, making it more computationally efficient and less prone to overfitting.
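
The covering loop described above (learn a rule, delete the instances it covers, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not GIBRG itself: it handles discrete attributes only, greedily adds the attribute-value term whose covered instances have the lowest Gini impurity, and returns an ordered rule list. The toy data set and all function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def learn_rule(rows, labels, attributes):
    """Grow one rule by adding the term with the lowest Gini impurity."""
    covered = list(range(len(rows)))
    terms = {}
    while gini([labels[i] for i in covered]) > 0 and len(terms) < len(attributes):
        best = None
        for attr in attributes:
            if attr in terms:
                continue
            for value in sorted({rows[i][attr] for i in covered}):
                subset = [i for i in covered if rows[i][attr] == value]
                # Prefer purer subsets; break ties by larger coverage.
                score = (gini([labels[i] for i in subset]), -len(subset))
                if best is None or score < best[0]:
                    best = (score, attr, value, subset)
        _, attr, value, covered = best
        terms[attr] = value
    majority = Counter(labels[i] for i in covered).most_common(1)[0][0]
    return terms, majority, covered


def separate_and_conquer(rows, labels, attributes):
    """Learn rules one at a time, deleting covered instances after each."""
    rows, labels = list(rows), list(labels)
    rules = []
    while rows:
        terms, predicted, covered = learn_rule(rows, labels, attributes)
        rules.append((terms, predicted))
        covered = set(covered)
        keep = [i for i in range(len(rows)) if i not in covered]
        rows = [rows[i] for i in keep]
        labels = [labels[i] for i in keep]
    return rules


def classify(rules, row):
    """Return the class of the first rule whose terms all match the row."""
    for terms, predicted in rules:
        if all(row.get(attr) == value for attr, value in terms.items()):
            return predicted
    return None


# Hypothetical toy data, for illustration only.
toy_rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
toy_labels = ["play", "play", "play", "stay"]
toy_rules = separate_and_conquer(toy_rows, toy_labels, ["outlook", "windy"])
for terms, predicted in toy_rules:
    print(terms, "->", predicted)
```

Because each rule is learned on the instances left over from earlier rules, the result must be read as an ordered list: a later rule only applies when no earlier rule matches. Prism, by contrast, learns a separate rule set per target class.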

    Multi-stage mixed rule learning approach for advancing performance of rule-based classification

    Rule learning is a special type of machine learning approach whose key advantage is the generation of interpretable models, which show transparently how an input is mapped to an output. Traditional rule learning algorithms typically rely on Boolean logic for inducing rule antecedents, which is very effective for training models on data sets that involve only discrete attributes. When continuous attributes are present in a data set, traditional rule learning approaches need to employ crisp intervals. In reality, however, problems usually show shades of grey, which motivated the development of fuzzy rule learning approaches that employ fuzzy intervals for handling continuous attributes. When a data set contains a large proportion of discrete attributes, or no continuous attributes at all, fuzzy approaches cannot learn rules effectively, leading to a drop in performance. In this paper, a multi-stage approach to mixed rule learning is proposed, which strategically combines traditional and fuzzy approaches to handle various types of attributes effectively. We compare the proposed approach with existing rule learning algorithms. Our experimental results show that the proposed approach leads to significant advances in performance compared with the existing algorithms.
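
The contrast between crisp and fuzzy intervals can be illustrated with a short sketch. A crisp interval gives an all-or-nothing membership, whereas a fuzzy interval gives a degree between 0 and 1. The triangular shape and the function names are assumptions chosen for illustration; the abstract does not state which membership functions the proposed approach uses.

```python
def crisp_membership(x, low, high):
    """Crisp interval: 1.0 inside [low, high], 0.0 outside."""
    return 1.0 if low <= x <= high else 0.0


def triangular_membership(x, left, peak, right):
    """Triangular fuzzy set: rises from left to peak, falls to right.

    Assumes left < peak < right.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# A value just outside a crisp interval is rejected outright,
# while a fuzzy interval still assigns it partial membership.
print(crisp_membership(31, 20, 30))
print(triangular_membership(31, 20, 30, 40))
```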

    Heuristic target class selection for advancing performance of coverage-based rule learning

    Rule learning is a popular branch of machine learning that can provide accurate and interpretable classification results. In general, the two main strategies of rule learning are referred to as 'divide and conquer' and 'separate and conquer'. Decision tree generation, which follows the former strategy, has a serious drawback known as the replicated sub-tree problem, resulting from the constraint that all branches of a decision tree must have one or more common attributes. This problem is likely to result in high computational complexity and a risk of overfitting, which makes it necessary to develop rule learning algorithms (e.g., Prism) that follow the separate and conquer strategy. The replicated sub-tree problem can be solved effectively using the Prism algorithm, but the trained models are still complex because an independent rule set must be trained for each selected target class. To reduce the risk of overfitting and the model complexity, we propose in this paper a variant of the Prism algorithm referred to as PrismCTC. The experimental results show that the PrismCTC algorithm improves classification performance and reduces model complexity in comparison with the C4.5 and Prism algorithms.

    Botnet detection using ensemble classifiers of network flow

    Recently, botnets have become a common tool for implementing and transferring various kinds of malicious code over the Internet. This code can be used to carry out many malicious activities, including DDoS attacks, spam, click fraud, and data theft. It is therefore necessary to use modern technologies to counter this phenomenon in advance by differentiating botnet traffic from normal network traffic. In this work, we use ensemble classifier algorithms to identify such damaging botnet traffic. We experimented with different ensemble algorithms to compare and analyze their ability to separate botnet traffic from normal traffic by selecting distinguishing features of the network traffic. Botnet detection offers a reliable and inexpensive way to ensure the integrity of data transfer and to warn of risks before they occur.