Intelligent data mining using artificial neural networks and genetic algorithms : techniques and applications
Data Mining (DM) refers to the analysis of observational datasets to find
relationships and to summarize the data in ways that are both understandable
and useful. Many DM techniques exist. Compared with other DM techniques,
Intelligent Systems (ISs) based approaches, which include Artificial Neural
Networks (ANNs), fuzzy set theory, approximate reasoning, and derivative-free
optimization methods such as Genetic Algorithms (GAs), are tolerant of
imprecision, uncertainty, partial truth, and approximation. They provide
flexible information processing capability for handling real-life situations. This
thesis is concerned with the ideas behind design, implementation, testing and
application of a novel ISs based DM technique. The unique contribution of this
thesis is in the implementation of a hybrid IS DM technique (Genetic Neural
Mathematical Method, GNMM) for solving novel practical problems, the
detailed description of this technique, and the illustrations of several
applications solved by this novel technique.
GNMM consists of three steps: (1) GA-based input variable selection, (2) Multi-
Layer Perceptron (MLP) modelling, and (3) mathematical programming based
rule extraction. In the first step, GAs are used to evolve an optimal set of MLP
inputs. An adaptive method based on the average fitness of successive
generations is used to adjust the mutation rate, and hence the
exploration/exploitation balance. In addition, GNMM uses the elite group and
appearance percentage to minimize the randomness associated with GAs. In
the second step, MLP modelling serves as the core DM engine in performing
classification/prediction tasks. An Independent Component Analysis (ICA)
based weight initialization algorithm is used to determine optimal weights
before the commencement of training algorithms. The Levenberg-Marquardt
(LM) algorithm is used to achieve a second-order speedup compared to
conventional Back-Propagation (BP) training. In the third step, mathematical
programming based rule extraction is not only used to identify the premises of
multivariate polynomial rules, but also to explore features from the extracted
rules based on data samples associated with each rule. Therefore, the
methodology can provide regression rules and features not only in the
polyhedrons with data instances, but also in the polyhedrons without data
instances.
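The first GNMM step can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not state the adaptation rule, so the rule used here (raise the mutation rate when the average fitness stops improving, lower it otherwise) is an assumption. The names fitness and ga_select, all parameter values, and the reading of "appearance percentage" as the share of elite-group slots in which an input appears are also assumptions. A least-squares model stands in for the MLP to keep the example short.

```python
import numpy as np


def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Negative validation MSE of a least-squares model on the selected inputs.

    Assumption: a linear model replaces the MLP used in the thesis.
    """
    A_tr = np.column_stack([X_tr[:, mask], np.ones(len(X_tr))])
    A_va = np.column_stack([X_va[:, mask], np.ones(len(X_va))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return -np.mean((A_va @ coef - y_va) ** 2)


def ga_select(X, y, pop_size=30, generations=40, elite=5, seed=0):
    """Evolve boolean input masks; return each input's appearance percentage."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    half = len(X) // 2
    X_tr, y_tr, X_va, y_va = X[:half], y[:half], X[half:], y[half:]

    pop = rng.random((pop_size, n)) < 0.5      # random initial masks
    mut_rate, prev_avg = 0.05, None            # hypothetical starting values
    counts = np.zeros(n)

    for _ in range(generations):
        fit = np.array([fitness(m, X_tr, y_tr, X_va, y_va) for m in pop])
        order = np.argsort(fit)[::-1]          # best individuals first
        pop, fit = pop[order], fit[order]
        counts += pop[:elite].sum(axis=0)      # appearances in the elite group

        # Assumed adaptive rule: explore more when average fitness stagnates,
        # exploit more when it is still improving.
        avg = fit.mean()
        if prev_avg is not None:
            if avg <= prev_avg:
                mut_rate = min(0.5, mut_rate * 1.5)
            else:
                mut_rate = max(0.01, mut_rate / 1.5)
        prev_avg = avg

        # Next generation: keep the elite, fill the rest with mutated children.
        children = []
        while len(children) < pop_size - elite:
            p1, p2 = pop[rng.integers(0, pop_size // 2, size=2)]
            cut = rng.integers(1, n)           # single-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            children.append(child ^ (rng.random(n) < mut_rate))
        pop = np.vstack([pop[:elite], np.array(children)])

    return counts / (generations * elite)


# Synthetic data (illustrative only): inputs 0 and 2 drive the output.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
appearance = ga_select(X, y)
print(np.round(appearance, 2))
```

Inputs whose appearance percentage stays close to 1 across generations would be the candidates passed on to the modelling step. Averaging over the elite group of every generation, rather than reading off a single best individual, is one way to reduce the run-to-run randomness of a GA.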
A total of six datasets from environmental and medical disciplines were used
as case study applications. These datasets involve the prediction of the
longitudinal dispersion coefficient, classification of electrocorticography
(ECoG)/electroencephalogram (EEG) data, eye bacteria Multisensor Data
Fusion (MDF), and diabetes classification (denoted by Data I through to Data
VI). GNMM was applied to all six datasets to explore its effectiveness, but
the emphasis differed between datasets. For example, the emphasis
of Data I and II was to give a detailed illustration of how GNMM works; Data III
and IV aimed to show how to deal with difficult classification problems; the
aim of Data V was to illustrate the averaging effect of GNMM; and finally Data
VI was concerned with GA parameter selection and with benchmarking GNMM
against other IS DM techniques such as Adaptive Neuro-Fuzzy Inference System
(ANFIS), Evolving Fuzzy Neural Network (EFuNN), Fuzzy ARTMAP, and
Cartesian Genetic Programming (CGP). In addition, datasets obtained from
published works (i.e. Data II & III) or public domains (i.e. Data VI), for which
previous results were available in the literature, were also used to benchmark
GNMM's effectiveness.
As a closely integrated system, GNMM has the merit that it needs little human
interaction. With some predefined parameters, such as the GA's crossover
probability and the shape of the ANNs' activation functions, GNMM is able to
process raw data until human-interpretable rules are extracted. This is
an important feature in practice, as quite often users of a DM system
have little or no need to fully understand the internal components of such a
system. Through case study applications, it has been shown that the GA-based
variable selection stage is capable of: filtering out irrelevant and noisy
variables, improving the accuracy of the model; making the ANN structure less
complex and easier to understand; and reducing the computational complexity
and memory requirements. Furthermore, rule extraction ensures that the MLP
training results are easily understandable and transferable.
EThOS - Electronic Theses Online Service; University of Warwick; Overseas Research Students Awards Scheme; GB, United Kingdom
Intelligent feature selection for neural regression : techniques and applications
Feature Selection (FS) and regression are two important technique categories in
Data Mining (DM). In general, DM refers to the analysis of observational datasets
to extract useful information and to summarise the data so that it can be more
understandable and be used more efficiently in terms of storage and processing.
FS is the technique of selecting a subset of features that are relevant to the
development of learning models. Regression is the process of modelling and
identifying the possible relationships between groups of features (variables).
Compared with conventional techniques, Intelligent System Techniques
(ISTs) are usually favoured for their flexibility in handling real-life
problems and their tolerance of data imprecision, uncertainty, partial truth, etc.
This thesis introduces a novel hybrid intelligent technique, namely Sensitive
Genetic Neural Optimisation (SGNO), which is capable of reducing the
dimensionality of a dataset by identifying the most important group of features.
The capability of SGNO is evaluated with four practical applications in three
research areas, including plant science, civil engineering and economics.
SGNO is constructed using three key techniques, known as the core modules,
including Genetic Algorithm (GA), Neural Network (NN) and Sensitivity Analysis
(SA). The GA module controls the progress of the algorithm and employs the NN
module as its fitness function. The SA module quantifies the importance of each
available variable using the results generated in the GA module. The global
sensitivity scores of the variables are used to determine the importance of the
variables: variables with higher sensitivity scores are considered more
important than those with lower scores. After determining the
variables' importance, the performance of SGNO is evaluated using the NN module
that takes various numbers of variables with the highest global sensitivity scores
as the inputs. In addition, the symbolic relationship between a group of variables
with the highest global sensitivity scores and the model output is discovered
using the Multiple-Branch Encoded Genetic Programming (MBE-GP).
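The idea of ranking variables by a sensitivity score and then feeding the top-ranked ones to a model can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how SGNO computes its global sensitivity scores, so a simple one-at-a-time perturbation measure is swapped in here. The names sensitivity_scores, top_k and toy_model, and the step size, are hypothetical, and a fixed toy function stands in for a trained NN.

```python
import numpy as np


def sensitivity_scores(predict, X, rel_step=0.1):
    """One-at-a-time sensitivity: mean absolute change in the model output
    when a single input is shifted by rel_step times its standard deviation.

    Assumption: this is a stand-in measure, not the SA used in SGNO.
    """
    base = predict(X)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_pert = X.copy()
        X_pert[:, j] += rel_step * X[:, j].std()
        scores[j] = np.mean(np.abs(predict(X_pert) - base))
    return scores


def top_k(scores, k):
    """Indices of the k variables with the highest sensitivity scores."""
    return np.argsort(scores)[::-1][:k]


def toy_model(X):
    """Hypothetical stand-in for a trained NN; the third input is ignored."""
    return 4.0 * X[:, 0] + np.sin(X[:, 1])


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
scores = sensitivity_scores(toy_model, X)
ranking = top_k(scores, 3)
print(ranking, np.round(scores, 3))
```

Evaluating a model on the inputs returned by top_k(scores, k) for increasing k mirrors the evaluation described above, where the NN module takes various numbers of the highest-scoring variables as inputs.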
A total of four datasets have been used to evaluate the performance of SGNO.
These datasets involve the prediction of short-term greenhouse tomato yield,
prediction of longitudinal dispersion coefficients in natural rivers, prediction of
wave overtopping at coastal structures and the modelling of the relationship between
the growth of industrial inputs and the growth of the gross industrial output.
SGNO was applied to all these datasets to explore its effectiveness in reducing the
dimensionality of the datasets. The performance of SGNO is benchmarked against
four dimensionality reduction techniques, namely Backward Feature Selection
(BFS), Forward Feature Selection (FFS), Principal Component Analysis (PCA) and
Genetic Neural Mathematical Method (GNMM).
The applications of SGNO to these datasets showed that SGNO is capable of
effectively identifying the most important feature groups in the datasets and
that the general performance of SGNO is better than that of the benchmarking
techniques. Furthermore, the symbolic relationships discovered using MBE-GP can
achieve regression accuracy competitive with that of NN models.
Error management in ATLAS TDAQ : an intelligent systems approach
This thesis is concerned with the use of intelligent system techniques (IST) within
a large distributed software system, specifically the ATLAS TDAQ system which
has been developed and is currently in use at the European Laboratory for Particle
Physics (CERN). The overall aim is to investigate and evaluate a range of IST
techniques in order to improve the error management system (EMS) currently used
within the TDAQ system via error detection and classification. The thesis work
will provide a reference for future research and development of such methods in the
TDAQ system.
The thesis begins by describing the TDAQ system and the existing EMS, with a
focus on the underlying expert system approach, in order to identify areas where
improvements can be made using IST techniques. It then discusses measures for
evaluating error detection and classification techniques and the factors specific to
the TDAQ system.
Error conditions are then simulated in a controlled manner using an experimental
setup, and datasets are gathered from two different sources. Analysis and processing
of the datasets using statistical and IST techniques show that clusters exist in
the data corresponding to the different simulated errors.
Different IST techniques are applied to the gathered datasets in order to realise an
error detection model. These techniques include Artificial Neural Networks (ANNs),
Support Vector Machines (SVMs) and Cartesian Genetic Programming (CGP), and
a comparison of their respective advantages and disadvantages is made.
The principal conclusions from this work are that IST can be successfully used to
detect errors in the ATLAS TDAQ system and thus can provide a tool to improve
the overall error management system. It is of particular importance that the IST can
be used without detailed knowledge of the system, as the ATLAS TDAQ
is too complex for any single person to understand completely. The results
of this research will benefit researchers developing and evaluating IST techniques in
similar large-scale distributed systems.
EThOS - Electronic Theses Online Service; GB, United Kingdom