
    Handling minority class problem in threats detection based on heterogeneous ensemble learning approach.

    Multiclass problems, such as detecting the multi-step behaviour of Advanced Persistent Threats (APTs), have been a major global challenge due to their capability to navigate around defences and evade detection for prolonged periods of time. Targeted APT attacks present an increasing concern for both cyber security and business continuity. Detecting these rare attacks is a classification problem with data imbalance. This paper explores the application of data resampling techniques, together with a heterogeneous ensemble approach, for dealing with the data imbalance caused by unevenly distributed data elements among classes, with a focus on capturing the rare attack. It is shown that the suggested algorithms not only provide detection capability, but can also classify malicious data traffic corresponding to rare APT attacks.
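    As a rough illustration of the pipeline described above, the sketch below combines minority-class oversampling with a heterogeneous soft-voting ensemble. It assumes scikit-learn and imbalanced-learn; the synthetic data, the SMOTE resampler, and the particular base learners are stand-ins, not the paper's exact design.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Imbalanced multiclass data standing in for traffic features; class 2
# plays the role of the rare APT attack (~2% of samples).
X, y = make_classification(n_samples=5000, n_classes=3, n_informative=6,
                           weights=[0.85, 0.13, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Resample the training split only, so the test distribution stays realistic.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

# Heterogeneous ensemble: different learning algorithms, soft-vote combined.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(random_state=0)),
    ("nb", GaussianNB())], voting="soft")
ensemble.fit(X_bal, y_bal)
print(classification_report(y_te, ensemble.predict(X_te)))
```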

    Aggregation of classifiers: a justifiable information granularity approach.

    In this paper, we introduce a new approach to combining multiple classifiers in a heterogeneous ensemble system. Instead of using numerical membership values when combining, we construct interval membership values for each class prediction from the meta-data of each observation, using the concept of an information granule. In the proposed method, the uncertainty (diversity) of the predictions produced by the base classifiers is quantified by interval-based information granules. The decision model is then generated by considering both the bounds and the length of the intervals. Extensive experimentation on the UCI datasets has demonstrated the superior performance of our algorithm over other algorithms, including six fixed combining methods, one trainable combining method, AdaBoost, bagging, and random subspace.
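    One plausible reading of this construction, sketched below with a decision rule of our own choosing rather than the paper's exact model: take each base classifier's soft outputs as the meta-data of an observation, form a [min, max] interval per class, and score each class by its interval midpoint penalized by interval length, so wide (high-diversity) granules lose support.

```python
import numpy as np

def granular_predict(meta, alpha=0.5):
    """meta: (n_classifiers, n_classes) class supports for one observation.
    Builds an interval membership value per class and returns the class
    whose interval scores best on midpoint minus alpha * length."""
    lower = meta.min(axis=0)        # lower bound of each class interval
    upper = meta.max(axis=0)        # upper bound of each class interval
    length = upper - lower          # interval length quantifies diversity
    score = (lower + upper) / 2 - alpha * length
    return int(np.argmax(score))

# Three base classifiers, two classes: all lean towards class 0.
meta = np.array([[0.70, 0.30],
                 [0.65, 0.35],
                 [0.72, 0.28]])
print(granular_predict(meta))       # -> 0
```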

    Combining heterogeneous classifiers via granular prototypes.

    In this study, a novel framework for combining multiple classifiers in an ensemble system is introduced. Here we exploit the concept of information granules to construct granular prototypes for each class from the outputs of an ensemble of base classifiers. In the proposed method, uncertainty in the outputs of the base classifiers on training observations is captured by an interval-based representation. To predict the class label of a new observation, we first determine the distances between the base classifiers' output for this observation and the class prototypes; the predicted class label is then the label associated with the shortest distance. In the experimental study, we combine several learning algorithms to build the ensemble system and conduct experiments on the UCI, colon cancer, and selected CLEF2009 datasets. The experimental results demonstrate that the proposed framework outperforms several benchmark algorithms, including two trainable combining methods, i.e., Decision Template and Two Stages Ensemble System, as well as AdaBoost, Random Forest, L2-loss Linear Support Vector Machine, and Decision Tree.
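    The sketch below illustrates the prototype-and-distance idea in a minimal form. The interval prototypes are simple per-class [min, max] bounds over training meta-data, and the distance is the norm of the gap to the nearest interval bound; both are plausible stand-ins, not necessarily the paper's constructions.

```python
import numpy as np

def build_prototypes(meta_train, y_train):
    """meta_train: (n_samples, n_outputs) stacked base-classifier outputs.
    Returns {class: (lower, upper)} granular prototypes per class."""
    return {c: (meta_train[y_train == c].min(axis=0),
                meta_train[y_train == c].max(axis=0))
            for c in np.unique(y_train)}

def predict(meta_row, protos):
    """Label a new observation by its nearest prototype: distance is zero
    inside the interval, else the gap to the closest bound."""
    def dist(lo, hi):
        gap = np.maximum(lo - meta_row, 0) + np.maximum(meta_row - hi, 0)
        return np.linalg.norm(gap)
    return min(protos, key=lambda c: dist(*protos[c]))
```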

    Evolving interval-based representation for multiple classifier fusion.

    Designing an ensemble of classifiers is a popular research topic in machine learning, since an ensemble can give better results than any constituent member. Furthermore, the performance of an ensemble can be improved using selection or adaptation. In the former, an optimal set of base classifiers, meta-classifier, original features, or meta-data is selected to obtain a better ensemble than using all classifiers and features. In the latter, the base classifiers or the combining algorithms working on their outputs are made to adapt to a particular problem; adaptation here means that the parameters of these algorithms are trained to be optimal for each problem. In this study, we propose a novel evolving combining algorithm using the adaptation approach for ensemble systems. Instead of using a numerical value when computing the representation of each class, we propose an interval-based representation for the class. The optimal values of the representation are found through Particle Swarm Optimization. During classification, a test instance is assigned to the class whose interval-based representation is closest to the base classifiers' prediction. Experiments conducted on a number of popular datasets confirm that the proposed method is better than well-known ensemble systems using Decision Template and Sum Rule as combiners, L2-loss Linear Support Vector Machine, Multiple Layer Neural Network, and the ensemble selection methods based on GA-Meta-data, META-DES, and ACO.
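    A compact, self-contained sketch of the evolving step: a hand-rolled PSO loop (with assumed, untuned hyperparameters) searches for interval bounds per class that minimise training error when observations are assigned to the least-violated interval representation. This illustrates the mechanism only; the paper's encoding and fitness function may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(bounds, meta):
    """bounds: (n_classes, n_outputs, 2) interval representation per class.
    Assign each meta row to the class whose intervals it violates least."""
    lo, hi = bounds[..., 0], bounds[..., 1]
    gap = (np.maximum(lo[None] - meta[:, None], 0)
           + np.maximum(meta[:, None] - hi[None], 0))
    return gap.sum(axis=2).argmin(axis=1)

def fitness(flat, meta, y, n_classes):
    bounds = np.sort(flat.reshape(n_classes, meta.shape[1], 2), axis=-1)
    return np.mean(classify(bounds, meta) != y)     # error rate to minimise

# Toy meta-data: 200 observations, 2 base-classifier outputs, 2 classes.
meta = rng.random((200, 2))
y = (meta[:, 0] > 0.5).astype(int)

dim, n = 2 * 2 * 2, 30                 # n_classes * n_outputs * 2, swarm size
pos, vel = rng.random((n, dim)), np.zeros((n, dim))
pbest = pos.copy()
pcost = np.array([fitness(p, meta, y, 2) for p in pos])
g = pbest[pcost.argmin()]
for _ in range(50):                    # standard PSO velocity/position update
    r1, r2 = rng.random((2, n, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
    pos = pos + vel
    cost = np.array([fitness(p, meta, y, 2) for p in pos])
    better = cost < pcost
    pbest[better], pcost[better] = pos[better], cost[better]
    g = pbest[pcost.argmin()]
print("best training error:", pcost.min())
```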

    Classifiers consensus system approach for credit scoring

    Banks take great care when dealing with customer loans to avoid any improper decisions that can lead to loss of opportunity or financial losses. Accordingly, researchers have developed complex credit scoring models using statistical and artificial intelligence (AI) techniques to help banks and financial institutions support their financial decisions. Various models, from simple to advanced approaches, have been developed in this domain. During the last few years, however, there has been marked attention towards the development of ensemble or multiple classifier systems, which have proved more accurate than single classifier models. Yet among the multiple classifier systems developed in the literature, little consideration has been given to: 1) combining classifiers of different algorithms (most have focused on building classifiers of the same algorithm); or 2) exploring classifier output combination techniques other than traditional ones such as majority voting and weighted average. In this paper, the aim is to present a new combination approach based on classifier consensus to combine multiple classifier systems (MCS) of different classification algorithms. Specifically, six of the main well-known base classifiers in this domain are used, namely, logistic regression (LR), neural networks (NN), support vector machines (SVM), random forests (RF), decision trees (DT) and naïve Bayes (NB). Two benchmark classifiers are considered as reference points for comparison with the proposed method and the other classifiers: LR, which is still considered the industry-standard model for credit scoring, and multivariate adaptive regression splines (MARS), a widely adopted technique in credit scoring studies. The experimental results, analysis and statistical tests demonstrate the ability of the proposed combination method to improve prediction performance over all base classifiers, LR, MARS and seven traditional combination methods, in terms of average accuracy, area under the curve (AUC), the H-measure and Brier score (BS). The model was validated over five real-world credit scoring datasets.
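    The six-member heterogeneous pool named in the abstract is straightforward to assemble with scikit-learn, as sketched below. The combiner here is plain soft voting, a simple stand-in for the proposed consensus method rather than a reproduction of it.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Scale-sensitive learners get a StandardScaler in front of them.
pool = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("nn", make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0))),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("rf", RandomForestClassifier(random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
]
mcs = VotingClassifier(estimators=pool, voting="soft")
# mcs.fit(X_train, y_train); mcs.predict_proba(X_test)[:, 1] feeds AUC/H-measure/BS.
```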

    Machine learning ensemble method for discovering knowledge from big data

    Big data, generated from various business, internet and social media activities, has become a major challenge for researchers in machine learning and data mining, who must develop new methods and techniques for analysing it effectively and efficiently. Ensemble methods represent an attractive approach to mining large datasets because of their accuracy and their ability to exploit the divide-and-conquer mechanism in parallel computing environments. This research proposes a machine learning ensemble framework and implements it in a high-performance computing environment. The research begins by identifying and categorising the effects of partitioned data subset size on ensemble accuracy when dealing with very large training datasets. An algorithm is then developed to ascertain the patterns of the relationship between ensemble accuracy and the size of partitioned data subsets. The research concludes with the development of a selective modelling algorithm, an efficient alternative to static model selection methods for big datasets. The results show that maximising the size of partitioned data subsets does not necessarily improve the performance of an ensemble of classifiers dealing with large datasets. Identifying the patterns exhibited by the relationship between ensemble accuracy and partitioned data subset size facilitates the determination of the best subset size for partitioning huge training datasets. Finally, traditional model selection is inefficient in cases where large datasets are involved.
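    The subset-size experiment can be illustrated in a few lines: partition a training set into disjoint subsets of a given size, fit one model per subset, and record how majority-vote accuracy moves as the subset size grows. This is an assumed toy setup, not the thesis's protocol or datasets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for subset_size in (500, 1000, 2000, 5000):
    n_parts = len(X_tr) // subset_size                 # disjoint partitions
    preds = []
    for part in range(n_parts):
        sl = slice(part * subset_size, (part + 1) * subset_size)
        model = DecisionTreeClassifier(random_state=0).fit(X_tr[sl], y_tr[sl])
        preds.append(model.predict(X_te))
    vote = (np.mean(preds, axis=0) > 0.5).astype(int)  # majority vote (binary)
    print(subset_size, "->", round(accuracy_score(y_te, vote), 3))
```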