6 research outputs found

    Efficient Training Set Use For Blood Pressure Prediction in a Large Scale Learning Classifier System

    Get PDF
    ABSTRACT We define a machine learning problem to forecast arterial blood pressure. Our goal is to solve this problem with a large scale learning classifier system. Because learning classifiers systems are extremely computationally intensive and this problem's eventually large training set will be very costly to execute, we address how to use less of the training set while not negatively impacting learning accuracy. Our approach is to allow competition among solutions which have not been evaluated on the entire training set. The best of these solutions are then evaluated on more of the training set while their offspring start off being evaluated on less of the training set. To keep selection fair, we divide competing solutions according to how many training examples they have been tested on

    Enhancing the scalability of a genetic algorithm to discover quantitative association rules in large-scale datasets

    Get PDF
    Association rule mining is a well-known methodology to discover significant and apparently hidden relations among attributes in a subspace of instances from datasets. Genetic algorithms have been extensively used to find interesting association rules. However, the rule-matching task of such techniques usually requires high computational and memory requirements. The use of efficient computational techniques has become a task of the utmost importance due to the high volume of generated data nowadays. Hence, this paper aims at improving the scalability of quantitative association rule mining techniques based on genetic algorithms to handle large-scale datasets without quality loss in the results obtained. For this purpose, a new representation of the individuals, new genetic operators and a windowing-based learning scheme are proposed to achieve successfully such challenging task. Specifically, the proposed techniques are integrated into the multi-objective evolutionary algorithm named QARGA-M to assess their performances. Both the standard version and the enhanced one of QARGA-M have been tested in several datasets that present different number of attributes and instances. Furthermore, the proposed methodologies have been integrated into other existing techniques based in genetic algorithms to discover quantitative association rules. The comparative analysis performed shows significant improvements of QARGA-M and other existing genetic algorithms in terms of computational costs without losing quality in the results when the proposed techniques are applied.Ministerio de Ciencia y Tecnología TIN2011- 28956-C02-02Junta de Andalucía TIC-7528Junta de Andalucía P12-TIC-1728Universidad Pablo de Olavide APPB81309

    GAMoN: Discovering M-of-N{¬,∨} hypotheses for text classification by a lattice-based Genetic Algorithm

    Get PDF
    AbstractWhile there has been a long history of rule-based text classifiers, to the best of our knowledge no M-of-N-based approach for text categorization has so far been proposed. In this paper we argue that M-of-N hypotheses are particularly suitable to model the text classification task because of the so-called “family resemblance” metaphor: “the members (i.e., documents) of a family (i.e., category) share some small number of features, yet there is no common feature among all of them. Nevertheless, they resemble each other”. Starting from this conjecture, we provide a sound extension of the M-of-N approach with negation and disjunction, called M-of-N{¬,∨}, which enables to best fit the true structure of the data. Based on a thorough theoretical study, we show that the M-of-N{¬,∨} hypothesis space has two partial orders that form complete lattices.GAMoN is the task-specific Genetic Algorithm (GA) which, by exploiting the lattice-based structure of the hypothesis space, efficiently induces accurate M-of-N{¬,∨} hypotheses.Benchmarking was performed over 13 real-world text data sets, by using four rule induction algorithms: two GAs, namely, BioHEL and OlexGA, and two non-evolutionary algorithms, namely, C4.5 and Ripper. Further, we included in our study linear SVM, as it is reported to be among the best methods for text categorization. Experimental results demonstrate that GAMoN delivers state-of-the-art classification performance, providing a good balance between accuracy and model complexity. Further, they show that GAMoN can scale up to large and realistic real-world domains better than both C4.5 and Ripper

    Improved techniques for phishing email detection based on random forest and firefly-based support vector machine learning algorithms.

    Get PDF
    Master of Science in Computer Science. University of KwaZulu-Natal, Durban, 2014.Electronic fraud is one of the major challenges faced by the vast majority of online internet users today. Curbing this menace is not an easy task, primarily because of the rapid rate at which fraudsters change their mode of attack. Many techniques have been proposed in the academic literature to handle e-fraud. Some of them include: blacklist, whitelist, and machine learning (ML) based techniques. Among all these techniques, ML-based techniques have proven to be the most efficient, because of their ability to detect new fraudulent attacks as they appear.There are three commonly perpetrated electronic frauds, namely: email spam, phishing and network intrusion. Among these three, more financial loss has been incurred owing to phishing attacks. This research investigates and reports the use of MLand Nature Inspired technique in the domain of phishing detection, with the foremost objective of developing a dynamic and robust phishing email classifier with improved classification accuracy and reduced processing time.Two approaches to phishing email detection are proposed, and two email classifiers are developed based on the proposed approaches. In the first approach, a random forest algorithm is used to construct decision trees,which are,in turn,used for email classification. The second approach introduced a novel MLmethod that hybridizes firefly algorithm (FFA) and support vector machine (SVM). The hybridized method consists of three major stages: feature extraction phase, hyper-parameter selection phase and email classification phase. In the feature extraction phase, the feature vectors of all the features described in Section 3.6 are extracted and saved in a file for easy access.In the second stage, a novel hyper-parameter search algorithm, developed in this research, is used to generate exponentially growing sequence of paired C and Gamma (γ) values. FFA is then used to optimize the generated SVM hyper-parameters and to also find the best hyper-parameter pair. Finally, in the third phase, SVM is used to carry out the classification. This new approach addresses the problem of hyper-parameter optimization in SVM, and in turn, improves the classification speed and accuracy of SVM. Using two publicly available email datasets, some experiments are performed to evaluate the performance of the two proposed phishing email detection techniques. During the evaluation of each approach, a set of features (well suited for phishing detection) are extracted from the training dataset and used to constructthe classifiers. Thereafter, the trained classifiers are evaluated on the test dataset. The evaluations produced very good results. The RF-based classifier yielded a classification accuracy of 99.70%, a FP rate of 0.06% and a FN rate of 2.50%. Also, the hybridized classifier (known as FFA_SVM) produced a classification accuracy of 99.99%, a FP rate of 0.01% and a FN rate of 0.00%

    Investigating evolutionary checkers by incorporating individual and social learning, N-tuple systems and a round robin tournament

    Get PDF
    In recent years, much research attention has been paid to evolving self-learning game players. Fogel's Blondie24 is just one demonstration of a real success in this field and it has inspired many other scientists. In this thesis, artificial neural networks are employed to evolve game playing strategies for the game of checkers by introducing a league structure into the learning phase of a system based on Blondie24. We believe that this helps eliminate some of the randomness in the evolution. The best player obtained is tested against an evolutionary checkers program based on Blondie24. The results obtained are promising. In addition, we introduce an individual and social learning mechanism into the learning phase of the evolutionary checkers system. The best player obtained is tested against an implementation of an evolutionary checkers program, and also against a player, which utilises a round robin tournament. The results are promising. N-tuple systems are also investigated and are used as position value functions for the game of checkers. The architecture of the n-tuple is utilises temporal difference learning. The best player obtained is compared with an implementation of evolutionary checkers program based on Blondie24, and also against a Blondie24 inspired player, which utilises a round robin tournament. The results are promising. We also address the question of whether piece difference and the look-ahead depth are important factors in the Blondie24 architecture. Our experiments show that piece difference and the look-ahead depth have a significant effect on learning abilities

    Investigating evolutionary checkers by incorporating individual and social learning, N-tuple systems and a round robin tournament

    Get PDF
    In recent years, much research attention has been paid to evolving self-learning game players. Fogel's Blondie24 is just one demonstration of a real success in this field and it has inspired many other scientists. In this thesis, artificial neural networks are employed to evolve game playing strategies for the game of checkers by introducing a league structure into the learning phase of a system based on Blondie24. We believe that this helps eliminate some of the randomness in the evolution. The best player obtained is tested against an evolutionary checkers program based on Blondie24. The results obtained are promising. In addition, we introduce an individual and social learning mechanism into the learning phase of the evolutionary checkers system. The best player obtained is tested against an implementation of an evolutionary checkers program, and also against a player, which utilises a round robin tournament. The results are promising. N-tuple systems are also investigated and are used as position value functions for the game of checkers. The architecture of the n-tuple is utilises temporal difference learning. The best player obtained is compared with an implementation of evolutionary checkers program based on Blondie24, and also against a Blondie24 inspired player, which utilises a round robin tournament. The results are promising. We also address the question of whether piece difference and the look-ahead depth are important factors in the Blondie24 architecture. Our experiments show that piece difference and the look-ahead depth have a significant effect on learning abilities
    corecore