    A heuristic for learning decision trees and pruning them into classification rules

    Let us consider a set of training examples described by continuous or symbolic attributes with categorical classes. In this paper we present a measure of the potential quality of a region of the attribute space when that region is used as a rule condition to classify unseen cases. The aim is to take into account the class distribution of the examples. The resulting measure, called the impurity level, is inspired by a similar measure used in the instance-based algorithm IB3 to select suitable paradigmatic exemplars that will classify future cases in a nearest-neighbor setting. The features of the impurity level are illustrated using a version of Quinlan's well-known C4.5 in which the information-based heuristics are replaced by our measure. The experiments carried out to test the proposals indicate very high accuracy with sets of classification rules as small as those found by RIPPE
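
    IB3 accepts an exemplar when the confidence interval of its classification accuracy lies above the confidence interval of its class's observed frequency. The sketch below illustrates that interval comparison applied to a rule's region, assuming Wilson score intervals with z = 1.96. The overlap-based score returned by `impurity_level` is a hypothetical illustration of "how much worse than clearly acceptable" a region is. It is not the exact definition of the impurity level from the paper.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for the proportion successes / n."""
    if n == 0:
        return 0.0, 1.0  # no evidence: the whole [0, 1] range
    p = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def impurity_level(correct: int, covered: int, class_count: int, total: int,
                   z: float = 1.96) -> float:
    """Hypothetical overlap score (0 = IB3-style acceptable).

    correct / covered: examples of the rule's class inside the region.
    class_count / total: frequency of that class in the whole training set.
    """
    acc_lo, acc_hi = wilson_interval(correct, covered, z)
    _, cls_hi = wilson_interval(class_count, total, z)
    if cls_hi <= acc_lo:
        return 0.0  # accuracy interval lies entirely above the class-frequency interval
    width = acc_hi - acc_lo
    # Share of the rule's accuracy interval that is not clearly better than chance
    return 100.0 * (cls_hi - acc_lo) / width if width > 0 else math.inf


print(impurity_level(50, 50, 100, 1000))  # pure region of a 10% class
print(impurity_level(5, 10, 500, 1000))   # region no better than the class prior
```

    A region whose accuracy interval clears the class-frequency interval scores 0. Overlapping intervals, which are typical of small or mixed regions, produce a positive score that could serve as a splitting or pruning criterion in place of information gain.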

    A machine learning approach with verification of predictions and assisted supervision for a rule-based network intrusion detection system

    Network security is a branch of network management in which network intrusion detection systems provide attack detection by monitoring traffic data. Rule-based misuse detection systems use a set of rules or signatures to detect attacks that exploit a particular vulnerability. These rules have to be hand-coded by experts to properly identify vulnerabilities, which limits the extensibility of misuse detection systems. This paper proposes a machine learning layer on top of a rule-based misuse detection system that provides automatic generation of detection rules, prediction verification, and assisted classification of new data. Our system achieves good overall performance while adding a heuristic and adaptive approach to existing rule-based misuse detection systems.
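
    The abstract does not specify how prediction verification is performed. The sketch below shows one plausible reading: the hand-coded signature engine and the learned layer both judge each traffic record, and disagreements are routed to a human analyst for assisted supervision. Everything here is an assumption for illustration: the `verify` function, the `Verdict` fields, the feature names `dst_port` and `failed_logins`, and both detectors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]  # hypothetical per-connection traffic features


@dataclass
class Verdict:
    signature_alert: bool  # what the hand-coded rule engine reported
    model_alert: bool      # what the learned layer predicts
    needs_review: bool     # disagreement -> ask an analyst to label the record


def verify(records: List[Features],
           signature_engine: Callable[[Features], bool],
           model: Callable[[Features], bool]) -> List[Verdict]:
    """Cross-check the two detectors and flag every disagreement."""
    verdicts = []
    for rec in records:
        sig = signature_engine(rec)
        ml = model(rec)
        verdicts.append(Verdict(sig, ml, sig != ml))
    return verdicts


# Hypothetical detectors: a fixed signature and a (stand-in) learned threshold
signature = lambda r: r["dst_port"] == 23 and r["failed_logins"] > 3
learned = lambda r: r["failed_logins"] > 2

traffic = [{"dst_port": 23, "failed_logins": 5},   # both alert
           {"dst_port": 22, "failed_logins": 4},   # only the learned layer alerts
           {"dst_port": 80, "failed_logins": 0}]   # neither alerts
out = verify(traffic, signature, learned)
for v in out:
    print(v)
```

    Records that an analyst labels after review could then be added to the training data for the next generation of learned rules. This is one way to realize the adaptive behaviour the abstract describes.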

    Learning Multi-Tree Classification Models with Ant Colony Optimization

    Ant Colony Optimization (ACO) is a meta-heuristic for solving combinatorial optimization problems, inspired by the behaviour of biological ant colonies. One of the successful applications of ACO is learning classification models (classifiers). A classifier encodes the relationships between the input attribute values and the values of a class attribute in a given set of labelled cases, and it can be used to predict the class value of new unlabelled cases. Decision trees have been widely used as a type of classification model that represents comprehensible knowledge to the user. In this paper, we propose ACO-based algorithms for learning an extended multi-tree classification model, which consists of multiple decision trees, one for each class value. Each class-based decision tree is responsible for discriminating between its class value and all other values in the class domain. Our proposed algorithms are empirically evaluated against well-known decision tree induction algorithms, as well as the ACO-based Ant-Tree-Miner algorithm. The results show an overall improvement in predictive accuracy over 32 benchmark datasets. We also discuss how the new multi-tree models can give the user a better understanding of a given domain and more interpretable knowledge.
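
    A multi-tree model keeps one binary "this class vs. all others" tree per class value, so prediction must combine several per-class answers. The sketch below shows only that prediction step, under two assumptions: the trees are hand-built, since they are not learned here by ACO, and conflicts are resolved by picking the class whose tree gives the highest positive-class probability. The iris-style attribute names and leaf probabilities are hypothetical, and the paper may combine the trees differently.

```python
from typing import Dict, Tuple, Union

# A node is either a leaf holding P(positive class) or a split
# (attribute, threshold, subtree if value <= threshold, subtree otherwise).
Node = Union[float, Tuple[str, float, "Node", "Node"]]


def score(node: Node, x: Dict[str, float]) -> float:
    """Walk one class-based tree down to a leaf probability."""
    while not isinstance(node, float):
        attr, thr, left, right = node
        node = left if x[attr] <= thr else right
    return node


def predict(trees: Dict[str, Node], x: Dict[str, float]) -> str:
    """Ask every one-vs-all tree and return the most confident class."""
    return max(trees, key=lambda cls: score(trees[cls], x))


# Hypothetical hand-built trees, one per class value
trees: Dict[str, Node] = {
    "setosa": ("petal_length", 2.5, 0.98, 0.01),
    "versicolor": ("petal_width", 1.7, ("petal_length", 2.5, 0.02, 0.90), 0.05),
    "virginica": ("petal_width", 1.7, 0.08, 0.95),
}

print(predict(trees, {"petal_length": 4.5, "petal_width": 1.4}))
```

    Because each tree only answers a binary question, a user can inspect the tree for one class in isolation. This is the interpretability benefit the abstract refers to.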

    Random Prism: An Alternative to Random Forests

    Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined classification results are used to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, the Prism family of algorithms offers an alternative approach to the induction of rule-based classifiers. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve comparable and sometimes higher classification accuracy than decision tree classifiers when the data are noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice, ensemble techniques tend to reduce overfitting; however, no ensemble learner exists for modular classification rule inducers such as the Prism family of algorithms. This article describes the first ensemble learner based on the Prism family of algorithms, developed to improve Prism's classification accuracy by reducing overfitting.
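
    The sketch below illustrates the two ingredients named above. The first is a basic Prism covering algorithm for categorical attributes: for each class, specialise a rule term by term, choosing the attribute-value pair with the highest p(class | term), then remove the covered examples and repeat. The second is a bagging-style ensemble that votes over Prism rule sets trained on bootstrap samples. This is a simplified stand-in and not the published Random Prism, which may differ (for example, in how features are sampled). The weather data and all function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, str], str]   # (categorical attribute values, class label)
Rule = Tuple[Dict[str, str], str]      # (conjunction of attribute == value terms, class)


def induce_prism(data: List[Example]) -> List[Rule]:
    """Basic Prism covering algorithm: learn rules for each class in turn."""
    rules: List[Rule] = []
    n_attrs = len(data[0][0])
    for cls in sorted({y for _, y in data}):
        remaining = list(data)  # restart from the full data for every class
        while any(y == cls for _, y in remaining):
            terms: Dict[str, str] = {}
            covered = remaining
            # Specialise the rule until it covers only `cls` or attributes run out
            while any(y != cls for _, y in covered) and len(terms) < n_attrs:
                candidates = sorted({(a, v) for x, _ in covered
                                     for a, v in x.items() if a not in terms})
                best, best_key = candidates[0], (-1.0, -1)
                for a, v in candidates:
                    subset = [y for x, y in covered if x[a] == v]
                    pos = sum(y == cls for y in subset)
                    key = (pos / len(subset), pos)  # p(cls | a=v), ties -> coverage
                    if key > best_key:
                        best, best_key = (a, v), key
                terms[best[0]] = best[1]
                covered = [(x, y) for x, y in covered if x[best[0]] == best[1]]
            rules.append((terms, cls))
            # Drop every example covered by the new rule, then look for the next one
            remaining = [(x, y) for x, y in remaining
                         if not all(x[a] == v for a, v in terms.items())]
    return rules


def classify(rules: List[Rule], x: Dict[str, str], default: str) -> str:
    """First matching rule wins; fall back to the default class."""
    for terms, cls in rules:
        if all(x.get(a) == v for a, v in terms.items()):
            return cls
    return default


def random_prism(data: List[Example], n_models: int = 10, seed: int = 0):
    """Bagging sketch: one Prism rule set per bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # sample with replacement
        default = Counter(y for _, y in sample).most_common(1)[0][0]
        models.append((induce_prism(sample), default))
    return models


def vote(models, x: Dict[str, str]) -> str:
    """Majority vote over the base classifiers."""
    return Counter(classify(r, x, d) for r, d in models).most_common(1)[0][0]


weather = [({"outlook": "sunny", "windy": "no"}, "play"),
           ({"outlook": "sunny", "windy": "yes"}, "play"),
           ({"outlook": "rainy", "windy": "yes"}, "stay"),
           ({"outlook": "rainy", "windy": "no"}, "stay")] * 3
ensemble = random_prism(weather, n_models=10, seed=0)
print(induce_prism(weather))
print(vote(ensemble, {"outlook": "rainy", "windy": "no"}))
```

    The rules for each class are learned against the whole training set, so they are modular: a rule for one class does not depend on the rules for the other classes, unlike the branches of a decision tree.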

    Investigating Evaluation Measures in Ant Colony Algorithms for Learning Decision Tree Classifiers

    Ant-Tree-Miner is a decision tree induction algorithm that is based on the Ant Colony Optimization (ACO) meta-heuristic. Ant-Tree-Miner-M is a recently introduced extension of Ant-Tree-Miner that learns multi-tree classification models. A multi-tree model consists of multiple decision trees, one for each class value, where each class-based decision tree is responsible for discriminating between its class value and all other values present in the class domain (one vs. all). In this paper, we investigate the use of 10 different classification quality evaluation measures in Ant-Tree-Miner-M, which are used for both candidate model evaluation and model pruning. Our experimental results, using 40 popular benchmark datasets, identify several quality functions that substantially improve on the simple Accuracy quality function that was previously used in Ant-Tree-Miner-M.
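
    A one-vs-all tree is often evaluated on imbalanced data, because its own class is usually a minority. In that setting plain accuracy can reward a tree that never predicts the positive class. The sketch below computes a few standard quality measures from a binary confusion matrix to show the effect. The four measures are common examples chosen for illustration. They are not necessarily among the 10 measures studied in the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # positive-class cases predicted as positive
    fp: int  # other-class cases predicted as positive
    tn: int  # other-class cases predicted as negative
    fn: int  # positive-class cases predicted as negative


def _div(a: float, b: float) -> float:
    return a / b if b else 0.0  # treat 0/0 as 0 to avoid ZeroDivisionError


def accuracy(c: Confusion) -> float:
    return _div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def sens_spec(c: Confusion) -> float:
    """Sensitivity multiplied by specificity."""
    return _div(c.tp, c.tp + c.fn) * _div(c.tn, c.tn + c.fp)


def f_measure(c: Confusion) -> float:
    precision, recall = _div(c.tp, c.tp + c.fp), _div(c.tp, c.tp + c.fn)
    return _div(2 * precision * recall, precision + recall)


def jaccard(c: Confusion) -> float:
    return _div(c.tp, c.tp + c.fp + c.fn)


MEASURES = {"accuracy": accuracy, "sens*spec": sens_spec,
            "F-measure": f_measure, "Jaccard": jaccard}

# A tree that always answers "not my class" on a 10% class vs. a useful tree
trivial = Confusion(tp=0, fp=0, tn=90, fn=10)
useful = Confusion(tp=8, fp=5, tn=85, fn=2)
for name, m in MEASURES.items():
    print(f"{name:10s} trivial={m(trivial):.3f} useful={m(useful):.3f}")
```

    Accuracy rates the trivial tree at 0.9, close to the useful one. The other three measures score it 0, which is why the choice of quality function matters for both candidate evaluation and pruning.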

    A review of associative classification mining

    Associative classification mining is a promising approach in data mining that uses association rule discovery techniques to construct classification systems, also known as associative classifiers. In the last few years, a number of associative classification algorithms have been proposed, e.g. CPAR, CMAR, MCAR, MMAC and others. These algorithms employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule evaluation methods. This paper surveys and compares state-of-the-art associative classification techniques with regard to the above criteria. Finally, it highlights future directions in associative classification, such as incremental learning and mining low-quality data sets.
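
    Rule ranking and rule prediction are two of the criteria the survey compares. The sketch below shows one common combination, similar in spirit to CBA-style classifiers. Class association rules are sorted by confidence, then support, then antecedent length, and a new case is labelled by the first rule whose antecedent it contains. This ordering and the single-rule prediction are assumptions for illustration. Algorithms such as CMAR or CPAR instead combine several matching rules. The rules and item names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class CARule:
    antecedent: FrozenSet[str]  # items such as "outlook=sunny"
    label: str                  # predicted class
    support: float
    confidence: float


def rank(rules: List[CARule]) -> List[CARule]:
    """Higher confidence first, then higher support, then shorter antecedent."""
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))


def predict(ranked: List[CARule], items: Set[str],
            default: Optional[str] = None) -> Optional[str]:
    """Return the label of the first (highest-ranked) rule that matches."""
    for r in ranked:
        if r.antecedent <= items:  # every antecedent item is present in the case
            return r.label
    return default


rules = [
    CARule(frozenset({"outlook=sunny"}), "play", support=0.30, confidence=0.80),
    CARule(frozenset({"outlook=sunny", "humidity=high"}), "stay", support=0.15, confidence=0.90),
    CARule(frozenset({"windy=yes"}), "stay", support=0.20, confidence=0.80),
]
ranked = rank(rules)
print([r.label for r in ranked])
print(predict(ranked, {"outlook=sunny", "windy=yes"}, default="play"))
```

    Ranking ties beyond these three keys fall back to input order, because Python's `sorted` is stable. A real system would also prune redundant rules before ranking.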