127 research outputs found

    Adaptive Parameter Control Strategy for Ant-Miner Classification Algorithm

    Get PDF
    Pruning is the popular framework for preventing the dilemma of overfitting noisy data. This paper presents a new hybrid Ant-Miner classification algorithm and ant colony system (ACS), called ACS-AntMiner. A key aspect of this algorithm is the selection of an appropriate number of terms to be included in the classification rule. ACS-AntMiner introduces a new parameter called importance rate (IR) which is a pre-pruning criterion based on the probability (heuristic and pheromone) amount. This criterion is responsible for adding only the important terms to each rule, thus discarding noisy data. The ACS algorithm is designed to optimize the IR parameter during the learning process of the Ant-Miner algorithm. The performance of the proposed classifier is compared with related ant-mining classifiers, namely, Ant-Miner, CAnt-Miner, TACO-Miner, and Ant-Miner with a hybrid pruner across several datasets. Experimental results show that the proposed classifier significantly outperforms the other ant-mining classifiers

    New Archive-Based Ant Colony Optimization Algorithms for Learning Predictive Rules from Data

    Get PDF
    Data mining is the process of extracting knowledge and patterns from data. Classification and Regression are among the major data mining tasks, where the goal is to predict a value of an attribute of interest for each data instance, given the values of a set of predictive attributes. Most classification and regression problems involve continuous, ordinal and categorical attributes. Currently Ant Colony Optimization (ACO) algorithms have focused on directly handling categorical attributes only; continuous attributes are transformed using a discretisation procedure in either a preprocessing stage or dynamically during the rule creation. The use of a discretisation procedure has several limitations: (i) it increases the computational runtime, since several candidates values need to evaluated; (ii) requires access to the entire attribute domain, which in some applications all data is not available; (iii) the values used to create discrete intervals are not optimised in combination with the values of other attributes. This thesis investigates the use of solution archive pheromone model, based on Ant Colony Optimization for mixed-variable (ACOMV) algorithm, to directly cope with all attribute types. Firstly, an archive-based ACO classification algorithm is presented, followed by an automatic design framework to generate new configuration of ACO algorithms. Then, we addressed the challenging problem of mining data streams, presenting a new ACO algorithm in combination with a hybrid pheromone model. Finally, the archive-based approach is extended to cope with regression problems. All algorithms presented are compared against well-known algorithms from the literature using publicly available data sets. Our results have been shown to improve the computational time while maintaining a competitive predictive performance

    Rule pruning techniques in the ant-miner classification algorithm and its variants: A review

    Get PDF
    Rule-based classification is considered an important task of data classification.The ant-mining rule-based classification algorithm, inspired from the ant colony optimization algorithm, shows a comparable performance and outperforms in some application domains to the existing methods in the literature.One problem that often arises in any rule-based classification is the overfitting problem. Rule pruning is a framework to avoid overfitting.Furthermore, we find that the influence of rule pruning in ant-miner classification algorithms is equivalent to that of local search in stochastic methods when they aim to search for more improvement for each candidate solution.In this paper, we review the history of the pruning techniques in ant-miner and its variants.These techniques are classified into post-pruning, pre-pruning and hybrid-pruning.In addition, we compare and analyse the advantages and disadvantages of these methods. Finally, future research direction to find new hybrid rule pruning techniques are provided

    Adaptive parameter control strategy for ant-miner classification algorithm

    Get PDF
    Pruning is the popular framework for preventing the dilemma of over fitting noisy data. This paper presents a new hybrid Ant-Miner classification algorithm and ant colony system (ACS), called ACS-Ant Miner. A key aspect of this algorithm is the selection of an appropriate number of terms to be included in the classification rule. ACS-AntMiner introduces a new parameter called importance rate (IR) which is a pre-pruning criterion based on the probability (heuristic and pheromone) amount. This criterion is responsible for adding only the important terms to each rule, thus discarding noisy data. The ACS algorithm is designed to optimize the IR parameter during the learning process of the Ant-Miner algorithm. The performance of the proposed classifier is compared with related ant-mining classifiers, namely, Ant-Miner, CAnt-Miner, TACO-Miner, and Ant-Miner with a hybrid pruner across several datasets. Experimental results show that the proposed classifier significantly outperforms the other ant-mining classifiers

    Aco-based feature selection algorithm for classification

    Get PDF
    Dataset with a small number of records but big number of attributes represents a phenomenon called “curse of dimensionality”. The classification of this type of dataset requires Feature Selection (FS) methods for the extraction of useful information. The modified graph clustering ant colony optimisation (MGCACO) algorithm is an effective FS method that was developed based on grouping the highly correlated features. However, the MGCACO algorithm has three main drawbacks in producing a features subset because of its clustering method, parameter sensitivity, and the final subset determination. An enhanced graph clustering ant colony optimisation (EGCACO) algorithm is proposed to solve the three (3) MGCACO algorithm problems. The proposed improvement includes: (i) an ACO feature clustering method to obtain clusters of highly correlated features; (ii) an adaptive selection technique for subset construction from the clusters of features; and (iii) a genetic-based method for producing the final subset of features. The ACO feature clustering method utilises the ability of various mechanisms such as intensification and diversification for local and global optimisation to provide highly correlated features. The adaptive technique for ant selection enables the parameter to adaptively change based on the feedback of the search space. The genetic method determines the final subset, automatically, based on the crossover and subset quality calculation. The performance of the proposed algorithm was evaluated on 18 benchmark datasets from the University California Irvine (UCI) repository and nine (9) deoxyribonucleic acid (DNA) microarray datasets against 15 benchmark metaheuristic algorithms. The experimental results of the EGCACO algorithm on the UCI dataset are superior to other benchmark optimisation algorithms in terms of the number of selected features for 16 out of the 18 UCI datasets (88.89%) and the best in eight (8) (44.47%) of the datasets for classification accuracy. Further, experiments on the nine (9) DNA microarray datasets showed that the EGCACO algorithm is superior than the benchmark algorithms in terms of classification accuracy (first rank) for seven (7) datasets (77.78%) and demonstrates the lowest number of selected features in six (6) datasets (66.67%). The proposed EGCACO algorithm can be utilised for FS in DNA microarray classification tasks that involve large dataset size in various application domains

    An adaptive ant colony optimization algorithm for rule-based classification

    Get PDF
    Classification is an important data mining task with different applications in many fields. Various classification algorithms have been developed to produce classification models with high accuracy. Differing from other complex and difficult classification models, rules-based classification algorithms produce models which are understandable for users. Ant-Miner is a variant of ant colony optimisation and a prominent intelligent algorithm widely use in rules-based classification. However, the Ant-Miner has overfitting and easily falls into local optima problems which resulted in low classification accuracy and complex classification rules. In this study, a new Ant-Miner classifier is developed, named Adaptive Genetic Iterated-AntMiner (AGI-AntMiner) that aims to avoid local optima and overfitting problems. The components of AGI-AntMiner includes: i) an Adaptive AntMiner which is a prepruning technique to dynamically select the appropriate threshold based on the quality of the rules; ii) Genetic AntMiner that improves the post-pruning by adding/removing terms in a dual manner; and, iii) an Iterated Local Search-AntMiner that improves exploitation based on multiple-neighbourhood structure. The proposed AGI-AntMiner algorithm is evaluated on 16 benchmark datasets of medical, financial, gaming and social domains obtained from the University California Irvine repository. The algorithm’s performance was compared with other variants of Ant-Miner and state-of-the-art rules-based classification algorithms based on classification accuracy and model complexity. Experimental results proved that the proposed AGI-AntMiner algorithm is superior in two (2) aspects. Hybridization of local search in AGI-AntMiner has improved the exploitation mechanism which leads to the discovery of more accurate classification rules. The new pre-pruning and postpruning techniques have improved the pruning ability to produce shorter classification rules which are easier to interpret by the users. Thus, the proposed AGI-AntMiner algorithm is capable in conducting an efficient search in finding the best classification rules that balance the classification accuracy and model complexity to overcome overfitting and local optima problems

    Hybrid nature-inspired computation methods for optimization

    Get PDF
    The focus of this work is on the exploration of the hybrid Nature-Inspired Computation (NIC) methods with application in optimization. In the dissertation, we first study various types of the NIC algorithms including the Clonal Selection Algorithm (CSA), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), Simulated Annealing (SA), Harmony Search (HS), Differential Evolution (DE), and Mind Evolution Computing (MEC), and propose several new fusions of the NIC techniques, such as CSA-DE, HS-DE, and CSA-SA. Their working principles, structures, and algorithms are analyzed and discussed in details. We next investigate the performances of our hybrid NIC methods in handling nonlinear, multi-modal, and dynamical optimization problems, e.g., nonlinear function optimization, optimal LC passive power filter design, and optimization of neural networks and fuzzy classification systems. The hybridization of these NIC methods can overcome the shortcomings of standalone algorithms while still retaining all the advantages. It has been demonstrated using computer simulations that the proposed hybrid NIC approaches are capable of yielding superior optimization performances over the individual NIC methods as well as conventional methodologies with regard to the search efficiency, convergence speed, and quantity and quality of the optimal solutions achieved

    Enhanced grey wolf optimisation algorithm for feature selection in anomaly detection

    Get PDF
    Anomaly detection deals with identification of items that do not conform to an expected pattern or items present in a dataset. The performance of different mechanisms utilized to perform the anomaly detection depends heavily on the group of features used. Thus, not all features in the dataset can be used in the classification process since some features may lead to low performance of classifier. Feature selection (FS) is a good mechanism that minimises the dimension of high-dimensional datasets by deleting the irrelevant features. Modified Binary Grey Wolf Optimiser (MBGWO) is a modern metaheuristic algorithm that has successfully been used for FS for anomaly detection. However, the MBGWO has several issues in finding a good quality solution. Thus, this study proposes an enhanced binary grey wolf optimiser (EBGWO) algorithm for FS in anomaly detection to overcome the algorithm issues. The first modification enhances the initial population of the MBGWO using a heuristic based Ant Colony Optimisation algorithm. The second modification develops a new position update mechanism using the Bat Algorithm movement. The third modification improves the controlled parameter of the MBGWO algorithm using indicators from the search process to refine the solution. The EBGWO algorithm was evaluated on NSL-KDD and six (6) benchmark datasets from the University California Irvine (UCI) repository against ten (10) benchmark metaheuristic algorithms. Experimental results of the EBGWO algorithm on the NSL-KDD dataset in terms of number of selected features and classification accuracy are superior to other benchmark optimisation algorithms. Moreover, experiments on the six (6) UCI datasets showed that the EBGWO algorithm is superior to the benchmark algorithms in terms of classification accuracy and second best for the number of selected features. The proposed EBGWO algorithm can be used for FS in anomaly detection tasks that involve any dataset size from various application domains

    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    Get PDF
    This report considers the application of Articial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided, that covers inter alia rule based systems, model-based systems, case based reasoning, pattern matching, clustering and feature extraction, articial neural networks, genetic algorithms, arti cial immune systems, agent based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, that is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual temporally distributed events within a multiple data stream environment is explored, and a range of techniques, covering model based approaches, `programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule based approaches, but that these suffer from a number of drawbacks, such as the difculty of developing and maintaining an appropriate knowledge base, and the lack of ability to generalise from known misuses to new unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and use this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to `learn' the features of event patterns that constitute normal behaviour, and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise. Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even work together to update each other to increase detection rates and lower false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule or state based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated
    • …
    corecore