189 research outputs found
Multiobjective Evolutionary Optimization for Prototype-Based Fuzzy Classifiers
Evolving intelligent systems (EISs), particularly zero-order ones, have demonstrated strong performance on many real-world data stream classification problems, while offering high model transparency and interpretability thanks to their prototype-based nature. Zero-order EISs typically learn prototypes by clustering streaming data online in a “one pass” manner for greater computational efficiency. However, prototypes identified this way often lack optimality, resulting in less precise classification boundaries and thereby limiting the classification performance of the systems. To address this issue, a commonly adopted strategy is to minimise the training error of the models on historical training data or, alternatively, to iteratively minimise the intra-cluster variance of the clusters obtained via online data partitioning. Both strategies recognise that the ultimate classification performance of zero-order EISs is driven by the positions of prototypes in the data space. Yet simply minimising the training error may lead to overfitting, whilst minimising the intra-cluster variance does not necessarily ensure that the optimised prototype-based models attain improved classification outcomes. To achieve better classification performance whilst avoiding overfitting for zero-order EISs, this paper presents a novel multi-objective optimisation approach that enables EISs to obtain optimal prototypes by applying these two disparate but complementary strategies simultaneously. Five decision-making schemes are introduced for selecting a suitable solution to deploy from the final non-dominated set of the resulting optimised models. Systematic experimental studies are carried out to demonstrate the effectiveness of the proposed optimisation approach in improving the classification performance of zero-order EISs.
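The two objectives named in this abstract, training error and intra-cluster variance, and the notion of a non-dominated set can be illustrated with a minimal sketch. It assumes nearest-prototype classification with Euclidean distance and that both objectives are minimised; the function names are hypothetical and the paper's actual optimiser and decision-making schemes are not reproduced here.

```python
import numpy as np

def _distances(prototypes, X):
    # Pairwise Euclidean distances, shape (n_samples, n_prototypes).
    return np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)

def training_error(prototypes, proto_labels, X, y):
    """Objective 1 (assumed form): share of samples whose nearest
    prototype carries a label different from the true one."""
    nearest = np.argmin(_distances(prototypes, X), axis=1)
    return float(np.mean(proto_labels[nearest] != y))

def intra_cluster_variance(prototypes, X):
    """Objective 2 (assumed form): mean squared distance from each
    sample to its nearest prototype."""
    return float(np.mean(np.min(_distances(prototypes, X), axis=1) ** 2))

def non_dominated(objectives):
    """Indices of candidates not dominated by any other candidate,
    with every objective treated as a quantity to minimise."""
    objs = np.asarray(objectives, dtype=float)
    keep = []
    for i, a in enumerate(objs):
        dominated = any(
            np.all(b <= a) and np.any(b < a)
            for j, b in enumerate(objs) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# Toy data: two well-separated classes and one candidate prototype set.
X = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [11.0, 11.0]])
y = np.array([0, 0, 1, 1])
prototypes = np.array([[0.5, 0.5], [10.5, 10.5]])
proto_labels = np.array([0, 1])

err = training_error(prototypes, proto_labels, X, y)
var = intra_cluster_variance(prototypes, X)

# Hypothetical (error, variance) pairs for four candidate models.
front = non_dominated([[0.1, 0.9], [0.2, 0.5], [0.3, 0.6], [0.05, 1.0]])
```

In the toy front, the third candidate is worse than the second on both objectives and is therefore excluded; the remaining candidates represent different trade-offs from which a decision-making scheme would pick one.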
A multi-objective optimization approach for the synthesis of granular computing-based classification systems in the graph domain
The synthesis of a pattern recognition system usually aims at the optimization of a given performance index. However, in many real-world scenarios, there are other desired facets to take into account. In this regard, multi-objective optimization acts as the main tool for optimizing different (and possibly conflicting) objective functions in order to seek potential trade-offs among them. In this paper, we propose a three-objective optimization problem for the synthesis of a granular computing-based pattern recognition system in the graph domain. The core pattern recognition engine searches for suitable information granules (i.e., recurrent and/or meaningful subgraphs from the training data), on top of which the graph embedding procedure towards the Euclidean space is performed. In the embedding space, any classification system can be employed. The optimization problem aims at jointly optimizing the performance of the classifier, the number of information granules and the structural complexity of the classification model. Furthermore, we address the problem of selecting a suitable number of solutions from the resulting Pareto fronts in order to compose an ensemble of classifiers to be tested on previously unseen data. To perform this selection, we employed a multi-criteria decision-making routine, analyzing different case studies that differ in how much each objective function weighs in the ranking process. Results on five open-access datasets of fully labeled graphs show that exploiting the ensemble is effective (especially when the structural complexity of the model plays a minor role in the decision-making process) compared against the baseline solution that solely aims at maximizing performance.
Multi-objective community detection applied to social and COVID-19 constructed networks
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. Community Detection plays an integral part in network analysis, as it facilitates understanding the structures and functional characteristics of the network. Communities organize real-world networks into densely connected groups of nodes. This thesis provides a critical analysis of Community Detection and highlights the main areas, including algorithms, evaluation metrics, applications, and datasets in social networks.
After defining the research gap, this thesis proposes two Attribute-Based Label Propagation algorithms that maximize both Modularity and homogeneity. Homogeneity is treated as an objective function in one algorithm and as a constraint in the other. To better capture the homogeneity of real-world networks, a new Penalized Homogeneity degree (PHd) is proposed, which can be easily personalized based on the network characteristics.
For the first time, COVID-19 tracing data are used to form two network datasets: the first is based on virus transmission between the countries of the world, while the second is an attributed network based on virus transmission among contact-traced cases in the Kingdom of Bahrain. Networks of this type, concerned with tracking a disease, had not previously been formed from COVID-19 data and have never been studied as a community detection problem. The proposed datasets are validated and tested in several experiments. The proposed Penalized Homogeneity measure is personalized and used to evaluate the proposed attributed network.
Extensive experiments and analysis are carried out to evaluate the proposed methods and benchmark the results against other well-known algorithms. The results are compared in terms of Modularity, the proposed PHd, and accuracy measures. The proposed methods achieved the best performance among the compared methods, with 26.6% better performance in Modularity and 33.96% in PHd on the proposed dataset, as well as noteworthy results on benchmarking datasets, with improvements in Modularity of 7.24% and 4.96%, respectively, and proposed PHd values of 27% and 81.9%.
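For readers unfamiliar with Modularity, one of the objectives named above, the following is a minimal sketch of the standard Newman modularity for an undirected, unweighted graph with at least one edge. The function name and the toy graph are illustrative assumptions; the thesis's PHd measure is not defined in the abstract and is not reproduced.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q = sum over communities c of
    (internal_edges_c / m) - (degree_sum_c / (2m))^2,
    for an undirected, unweighted edge list with m >= 1 edges.
    `community` maps each node to its community label."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in community c
    degree_sum = defaultdict(int)  # total degree of nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Toy graph: two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(edges, two_groups)   # 5/14, about 0.357
q_single = modularity(edges, one_group)   # 0.0
```

Splitting the toy graph along the bridge scores higher than placing every node in one community, which is the behaviour a modularity-maximising algorithm exploits.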
Genetic Generation of Fuzzy Classifiers for Imbalanced Datasets
Rare events, unusual patterns, and abnormal behaviours are difficult to detect and often require timely responses (HAIXIANG et al, 2017). Rare events are events that occur much less frequently than common events (MAALOUF; TRAFALIS, 2011). Examples of rare events include software defect detection (RODRIGUEZ et al, 2014), natural disasters (HAIXIANG et al, 2017), and fraud detection in financial transactions (PANIGRAHI, S. et al, 2009), among others.
Machine learning for network based intrusion detection: an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data.
For the last decade it has become commonplace to evaluate machine learning techniques for network based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has undergone some criticism in the literature, and it is out of date. Therefore, some researchers question the validity of the findings reported based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature. In some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value in the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method and data preprocessing, are identified and are found to affect the results significantly. These findings have also enabled a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it due to a lack of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) due to class imbalance. However, this has not been recognized as an issue in intrusion detection previously. This thesis reports on an empirical investigation that demonstrates that it is the class imbalance that causes the poor detection of some classes of intrusion reported in the literature. An alternative approach to training ANNs is proposed in this thesis, using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When employing evaluation functions that calculate the fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained whilst maintaining a low false positive rate. These findings demonstrate that the issues of learning from imbalanced data are not due to limitations of the ANNs; rather the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion that has been reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control of the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers. Striving to create a single best classifier that obtains the highest accuracy may give an unfruitful classification trade-off, which is demonstrated clearly in this thesis. Therefore, an extension of the ENN is proposed, using a Multi-Objective GA (MOGA), which treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also utilised to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how this affects the performance of the resultant ensembles. This is a key to explaining why some classifier combinations fail to give fruitful solutions.
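The contrast between plain accuracy and a fitness computed proportionally per class can be sketched as follows. This is an illustrative assumption about the form of such a fitness (the mean of per-class classification rates), not the thesis's actual evaluation function; the function names are hypothetical, and every class is assumed to occur at least once in the true labels.

```python
import numpy as np

def per_class_rates(y_true, y_pred, classes):
    """Classification rate on each class: the share of that class's
    instances that were predicted correctly. In a multi-objective
    setting, each entry could serve as a separate objective."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.array([np.mean(y_pred[y_true == c] == c) for c in classes])

def balanced_fitness(y_true, y_pred, classes):
    """Single-objective fitness (assumed form): mean of the per-class
    rates, so each class counts equally regardless of its size."""
    return float(per_class_rates(y_true, y_pred, classes).mean())

# Toy imbalanced data: 98 "normal" records and 2 "attack" records,
# scored against a classifier that always predicts "normal".
y_true = ["normal"] * 98 + ["attack"] * 2
y_pred = ["normal"] * 100
classes = ["normal", "attack"]

accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
rates = per_class_rates(y_true, y_pred, classes)
fitness = balanced_fitness(y_true, y_pred, classes)
```

On this toy data the always-"normal" classifier reaches an accuracy of 0.98 while its rate on the minority class is zero, which the per-class view exposes and the balanced fitness penalises.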