641 research outputs found
Developing Efficient and Effective Intrusion Detection System using Evolutionary Computation
The internet and computer networks have become an essential tool in distributed computing organisations especially because they enable the collaboration between components of heterogeneous systems. The efficiency and flexibility of online services have attracted many applications, but as they have grown in popularity so have the numbers of attacks on them. Thus, security teams must deal with numerous threats where the threat landscape is continuously evolving. The traditional security solutions are by no means enough to create a secure environment, intrusion detection systems (IDSs), which observe system works and detect intrusions, are usually utilised to complement other defence techniques. However, threats are becoming more sophisticated, with attackers using new attack methods or modifying existing ones. Furthermore, building an effective and efficient IDS is a challenging research problem due to the environment resource restrictions and its constant evolution. To mitigate these problems, we propose to use machine learning techniques to assist with the IDS building effort.
In this thesis, Evolutionary Computation (EC) algorithms are empirically investigated for synthesising intrusion detection programs. EC can construct programs for raising intrusion alerts automatically. One novel proposed approach, i.e. Cartesian Genetic Programming, has proved particularly effective. We also used an ensemble-learning paradigm, in which EC algorithms were used as a meta-learning method to produce detectors. The latter is more fully worked out than the former and has proved a significant success. An efficient IDS should always take into account the resource restrictions of the deployed systems. Memory usage and processing speed are critical requirements. We apply a multi-objective approach to find trade-offs among intrusion detection capability and resource consumption of programs and optimise these objectives simultaneously. High complexity and the large size of detectors are identified as general issues with the current approaches. The multi-objective approach is used to evolve Pareto fronts for detectors that aim to maintain the simplicity of the generated patterns. We also investigate the potential application of these algorithms to detect unknown attacks
A Field Guide to Genetic Programming
xiv, 233 p. : il. ; 23 cm.Libro ElectrónicoA Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions. The authorsIntroduction --
Representation, initialisation and operators in Tree-based GP --
Getting ready to run genetic programming --
Example genetic programming run --
Alternative initialisations and operators in Tree-based GP --
Modular, grammatical and developmental Tree-based GP --
Linear and graph genetic programming --
Probalistic genetic programming --
Multi-objective genetic programming --
Fast and distributed genetic programming --
GP theory and its applications --
Applications --
Troubleshooting GP --
Conclusions.Contents
xi
1 Introduction
1.1 Genetic Programming in a Nutshell
1.2 Getting Started
1.3 Prerequisites
1.4 Overview of this Field Guide I
Basics
2 Representation, Initialisation and GP
2.1 Representation
2.2 Initialising the Population
2.3 Selection
2.4 Recombination and Mutation Operators in Tree-based
3 Getting Ready to Run Genetic Programming 19
3.1 Step 1: Terminal Set 19
3.2 Step 2: Function Set 20
3.2.1 Closure 21
3.2.2 Sufficiency 23
3.2.3 Evolving Structures other than Programs 23
3.3 Step 3: Fitness Function 24
3.4 Step 4: GP Parameters 26
3.5 Step 5: Termination and solution designation 27
4 Example Genetic Programming Run
4.1 Preparatory Steps 29
4.2 Step-by-Step Sample Run 31
4.2.1 Initialisation 31
4.2.2 Fitness Evaluation Selection, Crossover and Mutation Termination and Solution Designation Advanced Genetic Programming
5 Alternative Initialisations and Operators in
5.1 Constructing the Initial Population
5.1.1 Uniform Initialisation
5.1.2 Initialisation may Affect Bloat
5.1.3 Seeding
5.2 GP Mutation
5.2.1 Is Mutation Necessary?
5.2.2 Mutation Cookbook
5.3 GP Crossover
5.4 Other Techniques 32
5.5 Tree-based GP 39
6 Modular, Grammatical and Developmental Tree-based GP 47
6.1 Evolving Modular and Hierarchical Structures 47
6.1.1 Automatically Defined Functions 48
6.1.2 Program Architecture and Architecture-Altering 50
6.2 Constraining Structures 51
6.2.1 Enforcing Particular Structures 52
6.2.2 Strongly Typed GP 52
6.2.3 Grammar-based Constraints 53
6.2.4 Constraints and Bias 55
6.3 Developmental Genetic Programming 57
6.4 Strongly Typed Autoconstructive GP with PushGP 59
7 Linear and Graph Genetic Programming 61
7.1 Linear Genetic Programming 61
7.1.1 Motivations 61
7.1.2 Linear GP Representations 62
7.1.3 Linear GP Operators 64
7.2 Graph-Based Genetic Programming 65
7.2.1 Parallel Distributed GP (PDGP) 65
7.2.2 PADO 67
7.2.3 Cartesian GP 67
7.2.4 Evolving Parallel Programs using Indirect Encodings 68
8 Probabilistic Genetic Programming
8.1 Estimation of Distribution Algorithms 69
8.2 Pure EDA GP 71
8.3 Mixing Grammars and Probabilities 74
9 Multi-objective Genetic Programming 75
9.1 Combining Multiple Objectives into a Scalar Fitness Function 75
9.2 Keeping the Objectives Separate 76
9.2.1 Multi-objective Bloat and Complexity Control 77
9.2.2 Other Objectives 78
9.2.3 Non-Pareto Criteria 80
9.3 Multiple Objectives via Dynamic and Staged Fitness Functions 80
9.4 Multi-objective Optimisation via Operator Bias 81
10 Fast and Distributed Genetic Programming 83
10.1 Reducing Fitness Evaluations/Increasing their Effectiveness 83
10.2 Reducing Cost of Fitness with Caches 86
10.3 Parallel and Distributed GP are Not Equivalent 88
10.4 Running GP on Parallel Hardware 89
10.4.1 Master–slave GP 89
10.4.2 GP Running on GPUs 90
10.4.3 GP on FPGAs 92
10.4.4 Sub-machine-code GP 93
10.5 Geographically Distributed GP 93
11 GP Theory and its Applications 97
11.1 Mathematical Models 98
11.2 Search Spaces 99
11.3 Bloat 101
11.3.1 Bloat in Theory 101
11.3.2 Bloat Control in Practice 104
III
Practical Genetic Programming
12 Applications
12.1 Where GP has Done Well
12.2 Curve Fitting, Data Modelling and Symbolic Regression
12.3 Human Competitive Results – the Humies
12.4 Image and Signal Processing
12.5 Financial Trading, Time Series, and Economic Modelling
12.6 Industrial Process Control
12.7 Medicine, Biology and Bioinformatics
12.8 GP to Create Searchers and Solvers – Hyper-heuristics xiii
12.9 Entertainment and Computer Games 127
12.10The Arts 127
12.11Compression 128
13 Troubleshooting GP
13.1 Is there a Bug in the Code?
13.2 Can you Trust your Results?
13.3 There are No Silver Bullets
13.4 Small Changes can have Big Effects
13.5 Big Changes can have No Effect
13.6 Study your Populations
13.7 Encourage Diversity
13.8 Embrace Approximation
13.9 Control Bloat
13.10 Checkpoint Results
13.11 Report Well
13.12 Convince your Customers
14 Conclusions
Tricks of the Trade
A Resources
A.1 Key Books
A.2 Key Journals
A.3 Key International Meetings
A.4 GP Implementations
A.5 On-Line Resources 145
B TinyGP 151
B.1 Overview of TinyGP 151
B.2 Input Data Files for TinyGP 153
B.3 Source Code 154
B.4 Compiling and Running TinyGP 162
Bibliography 167
Inde
Evolutionary Granular Kernel Machines
Kernel machines such as Support Vector Machines (SVMs) have been widely used in various data mining applications with good generalization properties. Performance of SVMs for solving nonlinear problems is highly affected by kernel functions. The complexity of SVMs training is mainly related to the size of a training dataset. How to design a powerful kernel, how to speed up SVMs training and how to train SVMs with millions of examples are still challenging problems in the SVMs research. For these important problems, powerful and flexible kernel trees called Evolutionary Granular Kernel Trees (EGKTs) are designed to incorporate prior domain knowledge. Granular Kernel Tree Structure Evolving System (GKTSES) is developed to evolve the structures of Granular Kernel Trees (GKTs) without prior knowledge. A voting scheme is also proposed to reduce the prediction deviation of GKTSES. To speed up EGKTs optimization, a master-slave parallel model is implemented. To help SVMs challenge large-scale data mining, a Minimum Enclosing Ball (MEB) based data reduction method is presented, and a new MEB-SVM algorithm is designed. All these kernel methods are designed based on Granular Computing (GrC). In general, Evolutionary Granular Kernel Machines (EGKMs) are investigated to optimize kernels effectively, speed up training greatly and mine huge amounts of data efficiently
Automated design of genetic programming of classification algorithms.
Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg.Over the past decades, there has been an increase in the use of evolutionary algorithms (EAs) for data mining and knowledge discovery in a wide range of application domains. Data classification, a real-world application problem is one of the areas EAs have been widely applied. Data classification has been extensively researched resulting in the development of a number of EA based classification algorithms. Genetic programming (GP) in particular has been shown to be one of the most effective EAs at inducing classifiers. It is widely accepted that the effectiveness of a parameterised algorithm like GP depends on its configuration. Currently, the design of GP classification algorithms is predominantly performed manually. Manual design follows an iterative trial and error approach which has been shown to be a menial, non-trivial time-consuming task that has a number of vulnerabilities. The research presented in this thesis is part of a large-scale initiative by the machine learning community to automate the design of machine learning techniques. The study investigates the hypothesis that automating the design of GP classification algorithms for data classification can still lead to the induction of effective classifiers. This research proposes using two evolutionary algorithms,namely,ageneticalgorithm(GA)andgrammaticalevolution(GE)toautomatethe design of GP classification algorithms. The proof-by-demonstration research methodology is used in the study to achieve the set out objectives. To that end two systems namely, a genetic algorithm system and a grammatical evolution system were implemented for automating the design of GP classification algorithms. The classification performance of the automated designed GP classifiers, i.e., GA designed GP classifiers and GE designed GP classifiers were compared to manually designed GP classifiers on real-world binary class and multiclass classification problems. The evaluation was performed on multiple domain problems obtained from the UCI machine learning repository and on two specific domains, cybersecurity and financial forecasting. The automated designed classifiers were found to outperform the manually designed GP classifiers on all the problems considered in this study. GP classifiers evolved by GE were found to be suitable for classifying binary classification problems while those evolved by a GA were found to be suitable for multiclass classification problems. Furthermore, the automated design time was found to be less than manual design time. Fitness landscape analysis of the design spaces searched by a GA and GE were carried out on all the class of problems considered in this study. Grammatical evolution found the search to be smoother on binary classification problems while the GA found multiclass problems to be less rugged than binary class problems
Automated Feature Engineering for Deep Neural Networks with Genetic Programming
Feature engineering is a process that augments the feature vector of a machine learning model with calculated values that are designed to enhance the accuracy of a model’s predictions. Research has shown that the accuracy of models such as deep neural networks, support vector machines, and tree/forest-based algorithms sometimes benefit from feature engineering. Expressions that combine one or more of the original features usually create these engineered features. The choice of the exact structure of an engineered feature is dependent on the type of machine learning model in use. Previous research demonstrated that various model families benefit from different types of engineered feature. Random forests, gradient-boosting machines, or other tree-based models might not see the same accuracy gain that an engineered feature allowed neural networks, generalized linear models, or other dot-product based models to achieve on the same data set. This dissertation presents a genetic programming-based algorithm that automatically engineers features that increase the accuracy of deep neural networks for some data sets. For a genetic programming algorithm to be effective, it must prioritize the search space and efficiently evaluate what it finds. This dissertation algorithm faced a potential search space composed of all possible mathematical combinations of the original feature vector. Five experiments were designed to guide the search process to efficiently evolve good engineered features. The result of this dissertation is an automated feature engineering (AFE) algorithm that is computationally efficient, even though a neural network is used to evaluate each candidate feature. This approach gave the algorithm a greater opportunity to specifically target deep neural networks in its search for engineered features that improve accuracy. Finally, a sixth experiment empirically demonstrated the degree to which this algorithm improved the accuracy of neural networks on data sets augmented by the algorithm’s engineered features
Intelligent data mining using artificial neural networks and genetic algorithms : techniques and applications
Data Mining (DM) refers to the analysis of observational datasets to find
relationships and to summarize the data in ways that are both understandable
and useful. Many DM techniques exist. Compared with other DM techniques,
Intelligent Systems (ISs) based approaches, which include Artificial Neural
Networks (ANNs), fuzzy set theory, approximate reasoning, and derivative-free
optimization methods such as Genetic Algorithms (GAs), are tolerant of
imprecision, uncertainty, partial truth, and approximation. They provide
flexible information processing capability for handling real-life situations. This
thesis is concerned with the ideas behind design, implementation, testing and
application of a novel ISs based DM technique. The unique contribution of this
thesis is in the implementation of a hybrid IS DM technique (Genetic Neural
Mathematical Method, GNMM) for solving novel practical problems, the
detailed description of this technique, and the illustrations of several
applications solved by this novel technique.
GNMM consists of three steps: (1) GA-based input variable selection, (2) Multi-
Layer Perceptron (MLP) modelling, and (3) mathematical programming based
rule extraction. In the first step, GAs are used to evolve an optimal set of MLP
inputs. An adaptive method based on the average fitness of successive
generations is used to adjust the mutation rate, and hence the
exploration/exploitation balance. In addition, GNMM uses the elite group and
appearance percentage to minimize the randomness associated with GAs. In
the second step, MLP modelling serves as the core DM engine in performing
classification/prediction tasks. An Independent Component Analysis (ICA)
based weight initialization algorithm is used to determine optimal weights
before the commencement of training algorithms. The Levenberg-Marquardt
(LM) algorithm is used to achieve a second-order speedup compared to
conventional Back-Propagation (BP) training. In the third step, mathematical
programming based rule extraction is not only used to identify the premises of
multivariate polynomial rules, but also to explore features from the extracted
rules based on data samples associated with each rule. Therefore, the
methodology can provide regression rules and features not only in the
polyhedrons with data instances, but also in the polyhedrons without data
instances.
A total of six datasets from environmental and medical disciplines were used
as case study applications. These datasets involve the prediction of
longitudinal dispersion coefficient, classification of electrocorticography
(ECoG)/Electroencephalogram (EEG) data, eye bacteria Multisensor Data
Fusion (MDF), and diabetes classification (denoted by Data I through to Data VI). GNMM was applied to all these six datasets to explore its effectiveness,
but the emphasis is different for different datasets. For example, the emphasis
of Data I and II was to give a detailed illustration of how GNMM works; Data III
and IV aimed to show how to deal with difficult classification problems; the
aim of Data V was to illustrate the averaging effect of GNMM; and finally Data
VI was concerned with the GA parameter selection and benchmarking GNMM
with other IS DM techniques such as Adaptive Neuro-Fuzzy Inference System
(ANFIS), Evolving Fuzzy Neural Network (EFuNN), Fuzzy ARTMAP, and
Cartesian Genetic Programming (CGP). In addition, datasets obtained from
published works (i.e. Data II & III) or public domains (i.e. Data VI) where
previous results were present in the literature were also used to benchmark
GNMM’s effectiveness.
As a closely integrated system GNMM has the merit that it needs little human
interaction. With some predefined parameters, such as GA’s crossover
probability and the shape of ANNs’ activation functions, GNMM is able to
process raw data until some human-interpretable rules being extracted. This is
an important feature in terms of practice as quite often users of a DM system
have little or no need to fully understand the internal components of such a
system. Through case study applications, it has been shown that the GA-based
variable selection stage is capable of: filtering out irrelevant and noisy
variables, improving the accuracy of the model; making the ANN structure less
complex and easier to understand; and reducing the computational complexity
and memory requirements. Furthermore, rule extraction ensures that the MLP
training results are easily understandable and transferrable
Evolution of trading strategies with flexible structures: A configuration comparison
Evolutionary Computation is often used in the domain of automated discovery of trading rules. Within this area, both Genetic Programming and Grammatical Evolution offer solutions with similar structures that have two key advantages in common: they are both interpretable and flexible in terms of their structure. The core algorithms can be extended to use automatically defined functions or mechanisms aimed to promote parsimony. The number of references on this topic is ample, but most of the studies focus on a specific setup. This means that it is not clear which is the best alternative. This work intends to fill that gap in the literature presenting a comprehensive set of experiments using both techniques with similar variations, and measuring their sensitivity to an increase in population size and composition of the terminal set. The experimental work, based on three S&P 500 data sets, suggest that Grammatical Evolution generates strategies that are more profitable, more robust and simpler, especially when a parsimony control technique was applied. As for the use of automatically defined function, it improved the performance in some experiments, but the results were inconclusive. (C) 2018 Elsevier B.V. All rights reserved.The authors acknowledge financial support granted by the Spanish Ministry of Science and Innovation under grant ENE2014-56126-C2-2-R
- …