931 research outputs found

    Discovering Regression Rules with Ant Colony Optimization

    The majority of Ant Colony Optimization (ACO) algorithms for data mining have dealt with classification or clustering problems. To the best of our knowledge, regression remains an unexplored research area. This paper proposes a new ACO algorithm that generates regression rules for data mining applications. The new algorithm combines components from an existing deterministic (greedy) separate-and-conquer algorithm, employing the same quality metrics and continuous attribute processing techniques, which allows a direct comparison of the two. The new algorithm has been shown to decrease the relative root mean square error when compared to the greedy algorithm. Additionally, a different approach to handling continuous attributes was investigated, showing that further improvements were possible.
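The error measure named in this abstract can be illustrated with a short sketch. The abstract does not give the formula, so the definition below is an assumption: one common form of relative root mean square error, which divides the model's RMSE by the RMSE of a baseline that always predicts the mean target value. The function name `rrmse` is hypothetical.

```python
import math


def rrmse(actual, predicted):
    """RMSE of the model divided by the RMSE of always predicting the mean.

    Assumed definition: values below 1.0 mean the model beats the mean baseline.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    # The 1/N factors of both RMSE terms cancel, so plain sums are enough.
    model_sq = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    baseline_sq = sum((a - mean) ** 2 for a in actual)
    if baseline_sq == 0:
        raise ValueError("target values are constant; baseline error is zero")
    return math.sqrt(model_sq / baseline_sq)
```

Under this definition a perfect model scores 0, and a model no better than the mean predictor scores 1.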

    Extensions to the ant-miner classification rule discovery algorithm

    Ant-Miner is an application of ACO in data mining. It was introduced by Parpinelli et al. in 2002 as an ant-based algorithm for the discovery of classification rules. Ant-Miner has proved to be a very promising technique for classification rule discovery. Ant-Miner generates fewer rules, uses fewer terms per rule, and performs competitively in terms of efficiency compared to the C4.5 algorithm (see experimental results in [20]). Hence, it has been a focus area of research, and many modifications have been made to it in order to increase its quality in terms of classification accuracy and the comprehensibility of the output rules (reducing the size of the rule set). The thesis proposes five extensions to Ant-Miner. 1) The thesis proposes the use of a logical negation operator in the antecedents of constructed rules, so that a term in a rule antecedent can take a negated form. This tends to generate rules with higher coverage and reduce the size of the generated rule set. 2) The thesis proposes the use of stubborn ants, an ACO variation in which an ant is allowed to take into consideration its own personal past history. Stubborn ants tend to generate rules with higher classification accuracy in fewer trials per iteration. 3) The thesis proposes the use of multiple types of pheromone, one for each permitted rule class, i.e. an ant first selects the rule class and then deposits the corresponding type of pheromone. The multi-pheromone system improves the quality of the output in terms of classification accuracy as well as its comprehensibility. 4) Along with the multi-pheromone system, the thesis proposes a new pheromone update strategy, called quality contrast intensifier. This strategy rewards rules with high confidence by depositing more pheromone and penalizes rules with low confidence by removing pheromone. 
5) The thesis proposes that each ant have its own values of the α and β parameters, which in a sense means that each ant has its own individual personality. To verify the efficiency of these modifications, several cross-validation experiments were run on each of the eight datasets used in the experiment. Average results were recorded, and a test of statistical significance was applied to determine whether the improvements are significant. Empirical results show improvements in the algorithm's performance in terms of the simplicity of the generated rule set, the number of trials, and the predictive accuracy.
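The role of per-ant α and β values can be sketched with the standard ACO transition rule, in which the probability of choosing a term is proportional to its pheromone raised to α times its heuristic value raised to β. This is a minimal illustration of that general rule, not the thesis's implementation; the function names and the example numbers are hypothetical.

```python
import random


def selection_probabilities(pheromone, heuristic, alpha, beta):
    """Standard ACO rule: p_i is proportional to pheromone_i**alpha * heuristic_i**beta."""
    weights = [(t ** alpha) * (h ** beta) for t, h in zip(pheromone, heuristic)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all candidate terms have zero weight")
    return [w / total for w in weights]


def choose_term(pheromone, heuristic, alpha, beta, rng):
    """Roulette-wheel selection of a term index for one ant."""
    probs = selection_probabilities(pheromone, heuristic, alpha, beta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Two ants with different (alpha, beta) "personalities" see the same terms.
pheromone = [1.0, 1.0]
heuristic = [1.0, 3.0]
heuristic_driven = selection_probabilities(pheromone, heuristic, alpha=1, beta=1)
pheromone_driven = selection_probabilities(pheromone, heuristic, alpha=1, beta=0)
```

With β = 0 the ant ignores the heuristic entirely, so the two terms become equally likely; with β = 1 the second term is three times as likely as the first.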

    A new sequential covering strategy for inducing classification rules with ant colony algorithms

    Ant colony optimization (ACO) algorithms have been successfully applied to discover a list of classification rules. In general, these algorithms follow a sequential covering strategy, where a single rule is discovered at each iteration of the algorithm in order to build a list of rules. The sequential covering strategy does not cope with the problem of rule interaction, i.e., the outcome of a rule affects the rules that can be discovered subsequently, since the search space is modified by the removal of examples covered by previous rules. This paper proposes a new sequential covering strategy for ACO classification algorithms to mitigate the problem of rule interaction, where the order of the rules is implicitly encoded as pheromone values and the search is guided by the quality of a candidate list of rules. Our experiments using 18 publicly available data sets show that the predictive accuracy obtained by a new ACO classification algorithm implementing the proposed sequential covering strategy is statistically significantly higher than the predictive accuracy of state-of-the-art rule induction classification algorithms.
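The conventional sequential covering loop described in this abstract can be sketched as follows. This illustrates the baseline strategy and the source of rule interaction, not the paper's proposed strategy. A greedy single-term rule finder stands in for the ant colony search, and all names (`best_single_term_rule`, `sequential_covering`) and the toy data are hypothetical.

```python
from collections import Counter


def best_single_term_rule(examples, target):
    """Greedy stand-in for rule discovery: best (attribute, value, class) term.

    Terms are ranked by confidence, with ties broken by coverage.
    """
    best = None
    for ex in examples:
        for attr, value in ex.items():
            if attr == target:
                continue
            covered = [e for e in examples if e[attr] == value]
            label, hits = Counter(e[target] for e in covered).most_common(1)[0]
            score = (hits / len(covered), len(covered))
            if best is None or score > best[0]:
                best = (score, (attr, value, label))
    return best[1]


def sequential_covering(examples, target):
    """Discover one rule per iteration, then remove the examples it covers."""
    rules, remaining = [], list(examples)
    while remaining:
        attr, value, label = best_single_term_rule(remaining, target)
        rules.append((attr, value, label))
        # Rule interaction: later rules only ever see what earlier rules left behind.
        remaining = [e for e in remaining if e[attr] != value]
    return rules


data = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "no"},
]
rules = sequential_covering(data, target="play")
```

Because each rule is chosen only against the examples still uncovered, the quality of the final list depends on the order of discovery, which is the interaction problem the abstract refers to.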

    Discovering Regression and Classification Rules with Monotonic Constraints Using Ant Colony Optimization

    Data mining is a broad area that encompasses many different tasks, from the supervised classification and regression tasks to unsupervised association rule mining and clustering. The first research thread in this thesis is the introduction of new Ant Colony Optimization (ACO)-based algorithms that tackle the regression task in data mining, exploring three different learning strategies: the Iterative Rule Learning, Pittsburgh and Michigan strategies. The Iterative Rule Learning strategy constructs one rule at a time, where the best rule created by the ant colony is added to the rule list at each iteration, until a complete rule list is created. In the Michigan strategy, each ant constructs a single rule, and a niching algorithm combines rules from this collection to create the final rule list. Finally, in the Pittsburgh strategy each ant constructs an entire rule list at each iteration, with the best list constructed by an ant in any iteration representing the final model. The most successful of the three variants, the Pittsburgh-based Ant-Miner-Reg_PB algorithm, has been shown to be competitive against a well-known regression rule induction algorithm from the literature. The second research thread involves incorporating existing domain knowledge to guide the construction of models, as it is rare to find new domains about which nothing is known. One type of domain knowledge that occurs frequently in real-world datasets is monotonic constraints, which capture increasing or decreasing trends within the data. In this thesis, monotonic constraints have been introduced into ACO-based rule induction algorithms for both classification and regression tasks. The enforcement of monotonic constraints has been implemented as a two-step process. The first step is a soft constraint preference in the model construction phase. This is followed by a hard constraint post-processing pruning suite to ensure the production of monotonic models. 
The new algorithms presented here have been shown to maintain and improve their predictive power when compared to non-monotonic rule induction algorithms.
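What it means for a model to satisfy a monotonic constraint can be illustrated with a small check. This is only a sketch of the property being enforced, not the thesis's soft-preference or pruning procedure: it assumes a single constrained attribute and lists every pair of points whose predictions run against the required trend. The function name `monotonic_violations` is hypothetical.

```python
from itertools import combinations


def monotonic_violations(points, increasing=True):
    """Return pairs of (attribute value, prediction) points that break the trend.

    For an increasing constraint, a larger attribute value must never
    receive a smaller prediction; for a decreasing one, never a larger.
    """
    ordered = sorted(points)
    violations = []
    for (x1, y1), (x2, y2) in combinations(ordered, 2):
        if x1 == x2:
            continue  # equal attribute values impose no ordering
        broken = y2 < y1 if increasing else y2 > y1
        if broken:
            violations.append(((x1, y1), (x2, y2)))
    return violations
```

A model is monotonic with respect to the attribute when the returned list is empty; a soft constraint could penalize the number of violations, whereas a hard constraint would require it to be zero.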

    Ant colony optimization approach for stacking configurations

    In data mining, classifiers are generated to predict the class labels of instances. An ensemble is a decision-making system that applies certain strategies to combine the predictions of different classifiers and generate a collective decision. Previous research has empirically and theoretically demonstrated that an ensemble classifier can be more accurate and stable than its component classifiers in most cases. Stacking is a well-known ensemble that adopts a two-level structure: the base-level classifiers generate predictions and the meta-level classifier makes collective decisions. A consequential problem is: what learning algorithms should be used to generate the base-level and meta-level classifiers in the Stacking configuration? It is not easy to find a suitable configuration for a specific dataset. In some early works, the selection of a meta classifier and its training data was the major concern. Recently, researchers have tried to apply metaheuristic methods to optimize the configuration of the base classifiers and the meta classifier. Ant Colony Optimization (ACO), which is inspired by the foraging behaviors of real ant colonies, is one of the most popular approaches among the metaheuristics. In this work, we propose a novel ACO-Stacking approach that uses ACO to tackle the Stacking configuration problem. This work is the first to apply ACO to the Stacking configuration problem. Different implementations of the ACO-Stacking approach are developed. The first version identifies the appropriate learning algorithms for generating the base-level classifiers while using a specific algorithm to create the meta-level classifier. The second version simultaneously finds suitable learning algorithms to create the base-level classifiers and the meta-level classifier. Moreover, we study how different kinds of local information about the classifiers affect the classification results. 
Several pieces of local information collected from the initial phase of ACO-Stacking are considered, such as the precision and F-measure of each classifier and the correlative differences of paired classifiers. A series of experiments are performed to compare the ACO-Stacking approach with other ensembles on a number of datasets of different domains and sizes. The experiments show that the new approach can achieve promising results and gain advantages over other ensembles. The correlative differences of the classifiers could be the best local information in this approach. Under the agile ACO-Stacking framework, an application to a direct marketing problem is explored. A real-world database from a US-based catalog company, containing more than 100,000 customer marketing records, is used in the experiments. The results indicate that our approach can gain more cumulative response lifts and cumulative profit lifts in the top deciles. In conclusion, it is competitive with some well-known conventional and ensemble data mining methods.
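The two-level Stacking structure described in this abstract can be sketched in a few lines. This shows only the base-level and meta-level roles, not the ACO search over configurations. It assumes fixed, hand-written base classifiers and a deliberately simple meta-level learner (a lookup table from base predictions to the majority label); all names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def train_meta(base_classifiers, examples, labels):
    """Meta-level learner: map each tuple of base predictions to its majority label."""
    votes = defaultdict(Counter)
    for x, y in zip(examples, labels):
        key = tuple(clf(x) for clf in base_classifiers)
        votes[key][y] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen tuples
    return lambda key: table.get(key, default)


def stacked_predict(base_classifiers, meta, x):
    """Base level produces predictions; meta level makes the collective decision."""
    return meta(tuple(clf(x) for clf in base_classifiers))


# Two weak base classifiers, each looking at one coordinate only.
bases = [lambda x: int(x[0] > 0), lambda x: int(x[1] > 0)]
X = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
y = [0, 1, 1, 0]  # XOR of the two signs; neither base classifier can learn it alone
meta = train_meta(bases, X, y)
```

Here the meta level recovers a pattern that no single base classifier captures. In practice the meta-level classifier is usually trained on held-out base predictions to avoid leakage; that step is omitted here because the base classifiers are not trained.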

    Adaptive Parameter Control Strategy for Ant-Miner Classification Algorithm

    Pruning is a popular technique for preventing overfitting to noisy data. This paper presents a new hybrid of the Ant-Miner classification algorithm and ant colony system (ACS), called ACS-AntMiner. A key aspect of this algorithm is the selection of an appropriate number of terms to be included in the classification rule. ACS-AntMiner introduces a new parameter called importance rate (IR), which is a pre-pruning criterion based on the probability (heuristic and pheromone) amount. This criterion is responsible for adding only the important terms to each rule, thus discarding noisy data. The ACS algorithm is designed to optimize the IR parameter during the learning process of the Ant-Miner algorithm. The performance of the proposed classifier is compared with related ant-mining classifiers, namely Ant-Miner, CAnt-Miner, TACO-Miner, and Ant-Miner with a hybrid pruner, across several datasets. Experimental results show that the proposed classifier significantly outperforms the other ant-mining classifiers.

    Ant colony optimization algorithm for rule based classification: Issues and potential

    Classification rule discovery using ant colony optimization (ACO) imitates the foraging behavior of real ant colonies. It is considered one of the successful swarm intelligence metaheuristics for data classification. ACO has gained importance because of its stochastic feature and iterative adaptation procedure based on positive feedback, both of which allow for the exploration of a large area of the search space. Nevertheless, ACO also has several drawbacks that may affect the classification accuracy and the computational time of the algorithm. This paper presents a review of related work on ACO rule classification which emphasizes the types of ACO algorithms and their issues. Potential solutions that may be considered to improve the performance of ACO algorithms in the classification domain are also presented. Furthermore, this review can be used as a source of reference by other researchers developing new ACO algorithms for rule classification.

    An adaptive ant colony optimization algorithm for rule-based classification

    Classification is an important data mining task with different applications in many fields. Various classification algorithms have been developed to produce classification models with high accuracy. Unlike other complex and difficult classification models, rule-based classification algorithms produce models that are understandable to users. Ant-Miner is a variant of ant colony optimisation and a prominent intelligent algorithm widely used in rule-based classification. However, Ant-Miner is prone to overfitting and easily falls into local optima, which results in low classification accuracy and complex classification rules. In this study, a new Ant-Miner classifier is developed, named Adaptive Genetic Iterated-AntMiner (AGI-AntMiner), that aims to avoid local optima and overfitting problems. The components of AGI-AntMiner include: i) an Adaptive AntMiner, which is a pre-pruning technique that dynamically selects the appropriate threshold based on the quality of the rules; ii) a Genetic AntMiner that improves the post-pruning by adding/removing terms in a dual manner; and iii) an Iterated Local Search-AntMiner that improves exploitation based on a multiple-neighbourhood structure. The proposed AGI-AntMiner algorithm is evaluated on 16 benchmark datasets from medical, financial, gaming and social domains obtained from the University of California, Irvine repository. The algorithm’s performance was compared with other variants of Ant-Miner and state-of-the-art rule-based classification algorithms based on classification accuracy and model complexity. Experimental results showed that the proposed AGI-AntMiner algorithm is superior in two aspects. Hybridization of local search in AGI-AntMiner has improved the exploitation mechanism, which leads to the discovery of more accurate classification rules. The new pre-pruning and post-pruning techniques have improved the pruning ability to produce shorter classification rules, which are easier for users to interpret. 
Thus, the proposed AGI-AntMiner algorithm is capable of conducting an efficient search for the best classification rules that balance classification accuracy and model complexity, overcoming overfitting and local optima problems.