Intelligent data mining using artificial neural networks and genetic algorithms : techniques and applications
Data Mining (DM) refers to the analysis of observational datasets to find
relationships and to summarize the data in ways that are both understandable
and useful. Many DM techniques exist. Compared with other DM techniques,
Intelligent Systems (ISs) based approaches, which include Artificial Neural
Networks (ANNs), fuzzy set theory, approximate reasoning, and derivative-free
optimization methods such as Genetic Algorithms (GAs), are tolerant of
imprecision, uncertainty, partial truth, and approximation. They provide
flexible information processing capability for handling real-life situations. This
thesis is concerned with the ideas behind the design, implementation, testing and
application of a novel IS-based DM technique. The unique contribution of this
thesis lies in the implementation of a hybrid IS DM technique (the Genetic Neural
Mathematical Method, GNMM) for solving novel practical problems, in the
detailed description of this technique, and in the illustration of several
applications solved by it.
GNMM consists of three steps: (1) GA-based input variable selection, (2) Multi-
Layer Perceptron (MLP) modelling, and (3) mathematical programming based
rule extraction. In the first step, GAs are used to evolve an optimal set of MLP
inputs. An adaptive method based on the average fitness of successive
generations is used to adjust the mutation rate, and hence the
exploration/exploitation balance. In addition, GNMM uses the elite group and
appearance percentage to minimize the randomness associated with GAs. In
the second step, MLP modelling serves as the core DM engine in performing
classification/prediction tasks. An Independent Component Analysis (ICA)
based weight initialization algorithm is used to determine optimal weights
before the commencement of training algorithms. The Levenberg-Marquardt
(LM) algorithm is used to achieve a second-order speedup compared to
conventional Back-Propagation (BP) training. In the third step, mathematical
programming based rule extraction is not only used to identify the premises of
multivariate polynomial rules, but also to explore features from the extracted
rules based on data samples associated with each rule. Therefore, the
methodology can provide regression rules and features not only in the
polyhedrons with data instances, but also in the polyhedrons without data
instances.
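
As an illustration of the first GNMM step, the sketch below shows one way a GA with an adaptive mutation rate could be used to select MLP input variables. The adaptation rule, elite handling and toy fitness function are illustrative assumptions, not the thesis's actual algorithm.

# Illustrative GA-based input-variable selection with an adaptive mutation rate
# driven by the average fitness of successive generations (a sketch, not GNMM itself).
import random

def select_inputs(fitness, n_vars, pop_size=30, generations=50,
                  p_mut=0.05, elite_frac=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    prev_avg = None
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        avg = sum(fitness(ind) for ind in pop) / pop_size
        # Adaptive mutation: if average fitness stalls, mutate more (explore);
        # if it improves, mutate less (exploit).  The 1.5x / 0.7x factors are arbitrary.
        if prev_avg is not None:
            p_mut = min(0.5, p_mut * 1.5) if avg <= prev_avg else max(0.01, p_mut * 0.7)
        prev_avg = avg
        children = [ind[:] for ind in scored[:n_elite]]           # elite group carried over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(scored[:max(2, pop_size // 2)], 2)  # parents from the better half
            cut = rng.randrange(1, n_vars)                        # one-point crossover
            child = [int(bit) ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Toy usage: reward masks that select variables 0, 3 and 4 and penalise mask size.
target = {0, 3, 4}
toy_fitness = lambda m: sum(1 for i, b in enumerate(m) if b and i in target) - 0.2 * sum(m)
print(select_inputs(toy_fitness, n_vars=8))

In GNMM the fitness of a mask would instead come from an MLP trained on the selected inputs; the toy fitness above is only a stand-in so the sketch runs on its own.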
A total of six datasets from environmental and medical disciplines were used
as case study applications. These datasets involve the prediction of
longitudinal dispersion coefficient, classification of electrocorticography
(ECoG)/Electroencephalogram (EEG) data, eye bacteria Multisensor Data
Fusion (MDF), and diabetes classification (denoted Data I through Data VI). GNMM was applied to all six datasets to explore its effectiveness,
but with a different emphasis for each. For example, the emphasis
of Data I and II was to give a detailed illustration of how GNMM works; Data III
and IV aimed to show how to deal with difficult classification problems; the
aim of Data V was to illustrate the averaging effect of GNMM; and finally Data
VI was concerned with the GA parameter selection and benchmarking GNMM
with other IS DM techniques such as Adaptive Neuro-Fuzzy Inference System
(ANFIS), Evolving Fuzzy Neural Network (EFuNN), Fuzzy ARTMAP, and
Cartesian Genetic Programming (CGP). In addition, datasets obtained from
published works (Data II and III) or the public domain (Data VI), for which
previous results are available in the literature, were used to benchmark
GNMM’s effectiveness.
As a closely integrated system, GNMM has the merit of requiring little human
interaction. Given a few predefined parameters, such as the GA’s crossover
probability and the shape of the ANNs’ activation functions, GNMM is able to
process raw data until human-interpretable rules are extracted. This is an
important practical feature, as users of a DM system often have little or no
need to fully understand its internal components. Through the case study
applications, it has been shown that the GA-based variable selection stage is
capable of filtering out irrelevant and noisy variables, thereby improving
model accuracy; making the ANN structure less complex and easier to
understand; and reducing computational complexity and memory requirements.
Furthermore, rule extraction ensures that the MLP training results are easily
understandable and transferable.
A Review of Modelling and Simulation Methods for Flashover Prediction in Confined Space Fires
Confined space fires are common emergencies in our society. Enclosure size, ventilation, and the type and quantity of fuel involved are factors that determine how the fire evolves in these situations. In some cases, favourable conditions may give rise to a flashover phenomenon. However, the difficulty fire services face in handling this complicated emergency can have fatal consequences for their staff. Therefore, there is a huge demand for new methods and technologies to tackle this life-threatening emergency. Modelling and simulation techniques have been adopted for research because of the difficulty of obtaining a database of real cases related to this phenomenon. In this paper, a review of the literature related to the modelling and simulation of enclosure fires with respect to the flashover phenomenon is carried out. Furthermore, the literature on comparing images from thermal cameras with computed images is reviewed. Finally, the suitability of artificial intelligence (AI) techniques for flashover prediction in enclosed spaces is also surveyed. This work has been partially funded by the Spanish Government TIN2017-89069-R grant supported with Feder funds. This work was supported in part by the Spanish Ministry of Science, Innovation and Universities through the Project ECLIPSE-UA under Grant RTI2018-094283-B-C32 and the Lucentia AGI Grant.
Computational intelligence techniques for missing data imputation
Despite considerable advances in missing data imputation techniques over the last three decades, the
problem of missing data remains largely unsolved. Many techniques have emerged in the literature
as candidate solutions, including the Expectation Maximisation (EM), and the combination of autoassociative
neural networks and genetic algorithms (NN-GA). The merits of both these techniques
have been discussed at length in the literature, but have never been compared to each other. This
thesis contributes to knowledge by, firstly, conducting a comparative study of these two techniques.
The significance of the difference in performance of the methods is presented. Secondly, predictive
analysis methods suitable for the missing data problem are presented. The predictive analysis in
this problem is aimed at determining whether the data in question are predictable and, hence, at
helping to choose the estimation techniques accordingly. Thirdly, a novel treatment of missing data
for online condition monitoring problems is presented. An ensemble of three autoencoders together
with hybrid Genetic Algorithms (GA) and fast simulated annealing was used to approximate missing
data. Several significant insights were deduced from the simulation results. It was found that, for
the problem of missing data addressed with computational intelligence approaches, the choice of
optimisation method plays a significant role in prediction. Although it was observed that the hybrid
GA and Fast Simulated Annealing (FSA) can converge to the same search space and to almost the
same values, they differ significantly in duration. This unique contribution has demonstrated that
particular attention has to be paid to the choice of optimisation techniques and their decision boundaries.
Another unique contribution of this work was not only to demonstrate that dynamic programming
is applicable to the problem of missing data, but also to show that it addresses the problem
efficiently. An NN-GA model was built to impute missing data using the principle of dynamic
programming. This approach makes it possible to modularise the problem of missing data for
maximum efficiency. With the advancements in parallel computing, the various modules of the
problem could be solved by different processors working together in parallel. Furthermore, a
method is proposed for imputing missing data in non-stationary time series that learns
incrementally even when there is concept drift. This method detects concept drift by measuring
heteroskedasticity and explores an online learning technique. New directions for research, in which
missing data can be estimated for non-stationary applications, are opened by the introduction of
this novel method. Thus, this thesis has uniquely opened the doors of research to this area. Many
other methods need to be developed so that they can be compared to the approach proposed in this
thesis.
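
To make the autoassociative-network idea above concrete, the following minimal sketch imputes the missing entries of a record by searching for values that minimise the record's reconstruction error. For brevity, a PCA reconstruction stands in for the trained autoencoder and a simulated-annealing search stands in for the hybrid GA/FSA optimiser; these substitutions and the synthetic data are assumptions, not the thesis's implementation.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic complete data with correlated columns (stand-in for monitoring records).
n, d = 500, 4
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, d)) + 0.05 * rng.normal(size=(n, d))

# "Autoencoder": a rank-2 PCA reconstruction x -> mu + (x - mu) V V^T
# (an assumption standing in for a trained autoassociative neural network).
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
V = Vt[:2].T
reconstruct = lambda x: mu + (x - mu) @ V @ V.T

def fill(x, missing, values):
    y = x.copy()
    y[missing] = values
    return y

def impute(x, missing, iters=2000, step=0.5, temp=1.0, seed=1):
    """Choose x[missing] to minimise ||x - reconstruct(x)||^2 via simulated annealing."""
    r = np.random.default_rng(seed)
    cost = lambda v: float(np.sum((fill(x, missing, v) - reconstruct(fill(x, missing, v))) ** 2))
    cur = mu[missing].copy()                      # start from the column means
    cur_cost = cost(cur)
    for t in range(iters):
        cand = cur + r.normal(scale=step, size=cur.shape)
        cand_cost = cost(cand)
        T = temp * (1 - t / iters) + 1e-9         # cooling schedule
        if cand_cost < cur_cost or r.random() < np.exp((cur_cost - cand_cost) / T):
            cur, cur_cost = cand, cand_cost
    return fill(x, missing, cur)

# Usage: hide two entries of one record and recover them.
true_record = X[0].copy()
observed = true_record.copy()
missing = np.array([1, 3])
observed[missing] = np.nan
estimate = impute(observed, missing)
print("true:", true_record[missing], "imputed:", estimate[missing])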
Another novel technique for dealing with missing data in the on-line condition monitoring problem
was also presented and studied. The problem of classification in the presence of missing data was
addressed, where no attempt is made to recover the missing values. The problem domain was then extended
to regression. The proposed technique performs better than the NN-GA approach, both in accuracy
and time efficiency during testing. The advantage of the proposed technique is that it eliminates
the need for finding the best estimate of the data, and hence, saves time. Lastly, instead of using
complicated techniques to estimate missing values, an imputation approach based on rough sets is
explored. Empirical results obtained using both real and synthetic data are given, and they provide
valuable and promising insight into the problem of missing data. The work has significantly confirmed
that rough sets can be reliable for missing data estimation in large, real databases.
Development of an integrated decision support system for supporting offshore oil spill response in harsh environments
Offshore oil spills can lead to significant negative socio-economic impacts and constitute a direct hazard to the marine environment and human health. The response to an oil spill usually consists of a series of dynamic, time-sensitive, multi-faceted and complex processes subject to various constraints and challenges. In the past decades, many models have been developed that mainly focus on individual processes, including oil weathering simulation, impact assessment, and clean-up optimization. However, to date, research on the integration of offshore oil spill vulnerability analysis, process simulation and operation optimization is still lacking. Such a deficiency is even more consequential in harsh environments. It has become critical and urgent to develop new methodologies and improve the technical capacity of offshore oil spill response. Therefore, this research aims at developing an integrated decision support system for supporting offshore oil spill responses, especially in harsh environments (DSS-OSRH). Such a DSS consists of offshore oil spill vulnerability analysis, response technology screening, and simulation-optimization coupling. The uncertainties and/or dynamics have been quantitatively reflected throughout the modeling processes.
First, a Monte Carlo simulation based two-stage adaptive resonance theory mapping (MC-TSAM) approach has been developed. A real-world case study was applied for offshore oil spill vulnerability index (OSVI) classification on the south coast of Newfoundland to demonstrate this approach. Furthermore, a Monte Carlo simulation based integrated rule-based fuzzy adaptive resonance theory mapping (MC-IRFAM) approach has been developed for screening and ranking spill response and clean-up technologies. The feasibility of the MC-IRFAM was tested with a case of screening and ranking response technologies in an offshore oil spill event. A novel Monte Carlo simulation based dynamic mixed integer nonlinear programming (MC-DMINP) approach has also been developed for the simulation-optimization coupling in offshore oil spill responses. To demonstrate this approach, a case study was conducted on device allocation and oil recovery in an offshore oil spill event. Finally, the DSS-OSRH has been developed based on the integration of MC-TSAM, MC-IRFAM, and MC-DMINP. To demonstrate its feasibility, a case study was conducted on decision support during an offshore oil spill response on the south coast of Newfoundland.
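
As a purely illustrative sketch of the Monte Carlo simulation-optimization coupling idea behind MC-DMINP, the snippet below samples uncertain spill scenarios and picks an integer allocation of recovery devices that maximises expected recovery under a budget. The toy recovery model, parameter distributions and brute-force search are assumptions standing in for the actual dynamic mixed-integer nonlinear programme.

# Monte Carlo scenarios + integer device allocation (toy stand-in for MC-DMINP).
import itertools, random

random.seed(0)
N_SAMPLES = 500
BUDGET = 100.0
# (cost per unit, nominal recovery rate per unit per day) for two hypothetical device types.
DEVICES = [(20.0, 30.0), (10.0, 12.0)]

def sample_scenario():
    """Draw one uncertain spill scenario: volume (t) and daily weathering loss fraction."""
    return random.gauss(1000.0, 200.0), random.uniform(0.02, 0.08)

def recovered(alloc, volume, weathering, days=10):
    """Toy dynamics: each day the devices recover oil, and part of the rest weathers away."""
    remaining, total = volume, 0.0
    rate = sum(n * r for n, (_, r) in zip(alloc, DEVICES))
    for _ in range(days):
        got = min(remaining, rate)
        remaining -= got
        remaining *= (1 - weathering)
        total += got
    return total

scenarios = [sample_scenario() for _ in range(N_SAMPLES)]
feasible = [a for a in itertools.product(range(6), repeat=len(DEVICES))
            if sum(n * c for n, (c, _) in zip(a, DEVICES)) <= BUDGET]
best = max(feasible,
           key=lambda a: sum(recovered(a, v, w) for v, w in scenarios) / N_SAMPLES)
print("best allocation under the budget:", best)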
The developed approaches and DSS are the first of their kind to date targeting offshore oil spill responses. The novelty is reflected in the following aspects: 1) an innovative MC-TSAM approach for offshore OSVI classification under complexity and uncertainty; 2) a new MC-IRFAM approach for classifying and ranking oil spill response technologies with uncertain information; 3) a novel MC-DMINP simulation-optimization coupling approach for offshore oil spill response operation and resource allocation under uncertainty; and 4) an innovative DSS-OSRH, which integrates MC-TSAM, MC-IRFAM and MC-DMINP to support decision making throughout the offshore oil spill response process. These methods are particularly suitable for offshore oil spill responses in harsh environments such as the offshore areas of Newfoundland and Labrador (NL). The research will also promote the understanding of the processes of oil transport and fate and of the impacts on the affected offshore and shoreline areas. The methodologies will be capable of providing modeling tools for other related areas that require timely and effective decisions under complexity and uncertainty.
Three-dimensional hydrodynamic models coupled with GIS-based neuro-fuzzy classification for assessing environmental vulnerability of marine cage aquaculture
There is considerable opportunity to develop new modelling techniques within a
Geographic Information Systems (GIS) framework to support sustainable
marine cage culture. However, the spatial data sets are often uncertain and incomplete;
therefore, new spatial models employing “soft computing” methods such as fuzzy logic
may be more suitable.
The aim of this study is to develop a model using Neuro-fuzzy techniques in a 3D GIS
(Arc View 3.2) to predict coastal environmental vulnerability for Atlantic salmon cage
aquaculture. A 3D hydrodynamic model (3DMOHID) coupled to a particle-tracking
model is applied to study the circulation patterns, dispersion processes and residence
time in Mulroy Bay, Co. Donegal, Ireland, an Irish fjard (shallow fjordic system): a
geometrically complicated area of restricted exchange with important aquaculture
activities.
The hydrodynamic model was calibrated and validated by comparison with sea surface
and water flow measurements. The model provided spatial and temporal information on
circulation and renewal time, helping to determine the influence of winds on circulation
patterns and, in particular, to assess the hydrographic conditions that strongly
influence the management of fish cage culture.
The particle-tracking model was used to study the transport and flushing processes.
Instantaneous massive releases of particles from key boxes are modelled to analyse the
ocean-fjord exchange characteristics and, by emulating discharge from finfish cages, to
show the behaviour of waste in terms of water circulation and water exchange.
In this study the results from the hydrodynamic model have been incorporated into GIS
to provide an easy-to-use graphical user interface for 2D (maps), 3D and temporal
visualization (animations), for interrogation of results.
Data on the physical environment and aquaculture suitability were derived from a 3-
dimensional hydrodynamic model and GIS for incorporation into the final model
framework and included mean and maximum current velocities, current flow quiescence
time, water column stratification, sediment granulometry, particulate waste dispersion
distance, oxygen depletion, water depth, coastal protection zones, and slope.
The Neuro-fuzzy classification model NEFCLASS-J was used to develop learning
algorithms to create the structure (rule base) and the parameters (fuzzy sets) of a fuzzy
classifier from a set of classified training data. A total of 42 training sites were sampled
using stratified random sampling from the GIS raster data layers, and the vulnerability
of each was manually classified into one of four categories based on the opinions
of experts with field experience and specific knowledge of the environmental problems
investigated.
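
To make the rule-base idea concrete, the following minimal sketch shows how a NEFCLASS-style fuzzy classifier evaluates its rules: inputs are fuzzified with triangular membership functions, each rule fires with the minimum of its antecedent memberships, and the class of the most strongly activated rule is returned. The variables, fuzzy set parameters and rules below are hypothetical illustrations, not the study's model.

# NEFCLASS-style rule evaluation with triangular memberships (illustrative only).
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Fuzzy sets for two hypothetical normalised inputs: current speed and depth.
SETS = {
    "speed": {"low": (0.0, 0.0, 0.5), "high": (0.3, 1.0, 1.0)},
    "depth": {"shallow": (0.0, 0.0, 0.6), "deep": (0.4, 1.0, 1.0)},
}
# Each rule: ({input: fuzzy set}, vulnerability class).
RULES = [
    ({"speed": "low", "depth": "shallow"}, "high vulnerability"),
    ({"speed": "high", "depth": "deep"}, "low vulnerability"),
    ({"speed": "high", "depth": "shallow"}, "moderate vulnerability"),
]

def classify(sample):
    best_class, best_act = None, -1.0
    for antecedent, label in RULES:
        act = min(tri(sample[var], *SETS[var][fs]) for var, fs in antecedent.items())
        if act > best_act:
            best_class, best_act = label, act
    return best_class, best_act

print(classify({"speed": 0.2, "depth": 0.3}))   # expected: high vulnerability

In NEFCLASS-J both the rule base and the fuzzy set parameters are learned from the training sites rather than written by hand as above.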
The final products, GIS-based Neuro-fuzzy maps, were achieved by combining modelled
and real environmental parameters relevant to marine finfish aquaculture.
Environmental vulnerability models, based on Neuro-fuzzy techniques, showed
sensitivity to the membership shapes of the fuzzy sets, the nature of the weightings
applied to the model rules, and validation techniques used during the learning and
validation process. The accuracy of the final classifier selected was R = 85.71%
(estimated error of ±16.5% from cross-validation, N = 10), with a Kappa
coefficient of agreement of 81%. Unclassified cells in the whole spatial domain (of
1623 GIS cells) ranged from 0% to 24.18%.
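
For reference, the kappa coefficient of agreement quoted above can be computed from a confusion matrix of predicted versus reference vulnerability classes, as in the short sketch below. The 4x4 matrix shown is made-up example data, not the study's results.

# Cohen's kappa from a confusion matrix (rows: reference class, columns: predicted class).
import numpy as np

def cohen_kappa(cm):
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n                                 # overall agreement
    p_expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # chance agreement
    return (p_observed - p_expected) / (1 - p_expected)

example = [[10, 1, 0, 0],
           [1, 9, 1, 0],
           [0, 1, 8, 1],
           [0, 0, 1, 9]]
print(round(cohen_kappa(example), 3))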
A statistical comparison between vulnerability scores and a significant product of
aquaculture waste (nitrogen concentrations in sediment under the salmon cages) showed
that the final model gave a good correlation between predicted environmental
vulnerability and sediment nitrogen levels, highlighting a number of areas with variable
sensitivity to aquaculture.
Further evaluation and analysis of the quality of the classification was carried out, and the
applicability of separability indexes was also studied. The inter-class separability
estimations were performed on two different training data sets to assess the difficulty of
the class separation problem under investigation. The Neuro-fuzzy classifier for a
supervised and hard classification of coastal environmental vulnerability has
demonstrated an ability to derive an accurate and reliable classification into areas of
different levels of environmental vulnerability using a minimal number of training sets.
The output will be an environmental spatial model for application in coastal areas,
intended to facilitate policy decisions and to provide input into wider-ranging spatial
modelling projects, such as coastal zone management systems and the effective
environmental management of fish cage aquaculture.
Computational Optimizations for Machine Learning
The present book contains the 10 articles finally accepted for publication in the Special Issue “Computational Optimizations for Machine Learning” of the MDPI journal Mathematics, which cover a wide range of topics connected to the theory and applications of machine learning, neural networks and artificial intelligence. These topics include, among others, various classes of machine learning, such as supervised, unsupervised and reinforcement learning, deep neural networks, convolutional neural networks, GANs, decision trees, linear regression, SVM, K-means clustering, Q-learning, temporal difference, deep adversarial networks and more. It is hoped that the book will be interesting and useful to those developing mathematical algorithms and applications in the domain of artificial intelligence and machine learning, as well as to those with the appropriate mathematical background who are willing to become familiar with recent advances in the computational optimization mathematics of machine learning, which has nowadays permeated almost all sectors of human life and activity.
Multi-scale Fire Modelling of Combustible Building Materials
The utilisation of lightweight polymers in building materials has come under tremendous scrutiny, driven by the numerous high-profile fire incidents (e.g., Grenfell Tower UK, 2017) and heightened public awareness of highly combustible materials in the past decade. Consequently, this creates significant interest in developing robust numerical tools to effectively assess the fire behaviours and toxicity of these combustible materials and establish safe use guidelines. In this dissertation, a modelling framework has been developed incorporating multi-scale computational techniques that capture and couple the thermal degradation and combustion characteristics of building materials. This includes (i) characterisation of essential pyrolysis kinetics from thermogravimetric analysis (TGA) via machine learning aided algorithm; (ii) in-depth pyrolysis breakdown from molecular dynamics (MD) simulations coupled with reactive force fields (ReaxFF); and (iii) Computational Fluid Dynamics (CFD) pyrolysis model involving char formation, moving boundary surface tracking and gas-phase combustion considering detailed chemical reaction mechanisms and soot particle formation.
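
As an illustration of the kind of kinetics extraction in step (i), the sketch below fits single-step first-order Arrhenius parameters (A, E) to a synthetic TGA conversion curve by minimising the fit error. The synthetic data, reaction model and random-search fit are illustrative assumptions standing in for the machine-learning aided algorithm used in the dissertation.

# Fit Arrhenius pyrolysis kinetics to a TGA conversion curve (toy example).
import numpy as np

R, beta = 8.314, 10.0 / 60.0          # gas constant (J/mol K), heating rate (K/s)
T = np.linspace(400.0, 900.0, 500)    # temperature programme (K)

def conversion(A, E):
    """Integrate d(alpha)/dT = (A/beta) * exp(-E/RT) * (1 - alpha) with Euler steps."""
    alpha = np.zeros_like(T)
    dT = T[1] - T[0]
    for i in range(1, len(T)):
        rate = (A / beta) * np.exp(-E / (R * T[i - 1])) * (1 - alpha[i - 1])
        alpha[i] = min(1.0, alpha[i - 1] + rate * dT)
    return alpha

# Synthetic "measured" TGA curve with noise (true A = 1e8 1/s, E = 120 kJ/mol).
rng = np.random.default_rng(0)
measured = conversion(1e8, 120e3) + rng.normal(0, 0.01, T.size)

# Random search over log10(A) and E -- a crude stand-in for the ML-aided fit.
best, best_err = None, np.inf
for _ in range(1500):
    logA, E = rng.uniform(5, 12), rng.uniform(80e3, 180e3)
    err = np.mean((conversion(10**logA, E) - measured) ** 2)
    if err < best_err:
        best, best_err = (10**logA, E), err
print("fitted A = %.2e 1/s, E = %.0f kJ/mol" % (best[0], best[1] / 1e3))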
The framework was adopted to assess the fire performance of a selection of FR/non-FR building materials. For the first time, the composition of char formations for the selected polymers was predicted by the MD simulation by analysing the accumulation of pure carbon chain compounds. The extracted pyrolysis kinetics achieved accurate fits with the experimental data. Furthermore, the application of MD allowed the characterisation of the full distribution of volatile and toxic gas species without substantial prior knowledge or experimental testing. The realised pyrolysis inputs were applied in the CFD model for cone calorimeter simulations, which yielded good agreement with experiments in terms of heat release, ignition time and burning duration. With the incorporation of solid interface tracking and char formation, the model was able to predict the thermally degrading solid surface and capture the prolonged burn duration. The char formation acts as a thermal layer to protect the unburnt virgin material from heat penetration during the pyrolysis process. Furthermore, with the application of detailed chemical kinetics for combustion and soot formation reaction mechanisms, the fire model was able to aptly predict the generation of asphyxiant gases such as CO and CO2 during the burning process.