285 research outputs found

Linear and Order Statistics Combiners for Pattern Classification

Author: Ghosh Joydeep
Tumer Kagan
Publication date: 01/01/1999

Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that, to a first-order approximation, the error rate obtained over and above the Bayes error rate is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the "added" error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum and in general the ith order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results. Comment: 31 pages
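As an illustration of combining in output space, the following minimal sketch applies the three rules named in the abstract (simple averaging, the median, and the maximum) to per-class scores from several classifiers. The function names, the `RULES` table, and the sample scores are assumptions made for this example; they are not taken from the chapter.

```python
from statistics import mean, median

# Hypothetical helper names; only the combining rules themselves
# (mean, median, max) come from the abstract.
RULES = {"mean": mean, "median": median, "max": max}


def combine(outputs, rule="mean"):
    """Combine per-class scores from several classifiers.

    outputs[k][c] is the score classifier k assigns to class c.
    Returns one combined score per class.
    """
    agg = RULES[rule]
    n_classes = len(outputs[0])
    return [agg([clf[c] for clf in outputs]) for c in range(n_classes)]


def predict(outputs, rule="mean"):
    """Return the index of the class with the highest combined score."""
    scores = combine(outputs, rule)
    return max(range(len(scores)), key=lambda c: scores[c])


# Made-up scores from three classifiers on a two-class problem.
outputs = [
    [0.6, 0.4],
    [0.3, 0.7],
    [0.2, 0.8],
]

for rule in RULES:
    print(rule, combine(outputs, rule), predict(outputs, rule))
```

Here the first classifier alone would pick class 0, while all three combiners pick class 1. The sketch only shows the mechanics of combining; it does not reproduce the chapter's error analysis.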

arXiv.org e-Print Archive

CiteSeerX

Analysis of the Correlation Between Majority Voting Error and the Diversity Measures in Multiple Classifier Systems

Author: Gabrys Bogdan
Ruta Dymitr
Publication venue: ICSC-NAISO Academic Press
Publication date: 01/01/2001

Combining classifiers by majority voting (MV) has recently emerged as an effective way of improving the performance of individual classifiers. However, MV is not always beneficial: its usefulness depends on the distribution of classification outputs in a multiple classifier system (MCS). Evaluating MV errors (MVE) for all combinations of classifiers in an MCS is a process of exponential complexity. This complexity can be reduced if an explicit relationship is found between MVE and some less complex function operating on classifier outputs. Diversity measures operating on binary classification outputs (correct/incorrect) are studied in this paper as potential candidates for such functions. Their correlation with MVE, interpreted as the quality of a measure, is thoroughly investigated using artificial and real-world datasets. Moreover, we propose a new diversity measure that efficiently exploits information coming from the whole MCS, rather than only the part to which it is applied.
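The quantities discussed in this abstract can be sketched on binary correct/incorrect outputs as follows. This is a minimal illustration under stated assumptions: a sample counts as a majority voting error unless strictly more than half of the classifiers are correct on it, and the diversity measure shown is a plain mean pairwise disagreement, not the new measure the paper proposes. All names and the sample matrix are hypothetical.

```python
from itertools import combinations


def majority_voting_error(correct):
    """Fraction of samples on which majority voting fails.

    correct[k][j] is 1 if classifier k is right on sample j, else 0.
    Assumption: the vote succeeds only when strictly more than half
    of the classifiers are correct on that sample.
    """
    n_clf = len(correct)
    n_samples = len(correct[0])
    wrong = 0
    for j in range(n_samples):
        votes = sum(correct[k][j] for k in range(n_clf))
        if 2 * votes <= n_clf:
            wrong += 1
    return wrong / n_samples


def mean_pairwise_disagreement(correct):
    """Average, over classifier pairs, of the fraction of samples on
    which exactly one of the two classifiers is correct."""
    n_samples = len(correct[0])
    pairs = list(combinations(range(len(correct)), 2))
    total = 0.0
    for a, b in pairs:
        differ = sum(1 for j in range(n_samples)
                     if correct[a][j] != correct[b][j])
        total += differ / n_samples
    return total / len(pairs)


# Made-up outputs: 3 classifiers, 4 samples (1 = correct, 0 = incorrect).
correct = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
]

print(majority_voting_error(correct))       # 0.25
print(mean_pairwise_disagreement(correct))  # 0.5
```

Each classifier in this example has an individual error of 0.5, yet the majority vote errs on only one of four samples, which is the kind of relationship between diversity and MVE the paper investigates.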

CiteSeerX

Bournemouth University Research Online

Predicting business failure using artificial intelligence system

Author: Allozi Yaser
Publication venue: Brunel University London
Publication date: 01/01/2021

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Predicting business insolvency is considered one of the main supportive sources of information for decision making for financial institutions, investors, creditors, and other participants in the business market. Financial reporting systems provide relevant information that can be used to assess the financial position of firms. It is crucial to have classification and prediction models that can analyse this financial information and provide accurate assurance for users about business health. Recent studies have explored the use of machine learning tools as a substitute for traditional statistical methods to develop models that classify firm insolvency according to financial statement information. However, none of these models is an ideal classifier, since each produces a certain percentage of wrong outputs, which is a crucial consideration: every percentage point of wrong responses can mean massive financial losses for stakeholders. Therefore, this study proposes new insolvency classification and prediction models based on machine learning modelling techniques to develop an improved classifier. Individual modelling techniques using statistical methods and machine learning were used to develop the classification model of business insolvency. The results showed that machine learning methods outperformed statistical methods. Deep Learning (DPL) achieved the highest performance on all performance measurements used in the study, and it was the best individual classifier, with an average accuracy of 97.2% using the all-years dataset. The Ensemble Boosted Decision Tree classifier ranked second, followed by the Decision Tree classifier. Thus, the DPL modelling approach has been shown to be useful for business insolvency classification.
A key contribution in enhancing individual classifier outputs is the use of traditional combining methods together with two aggregation methods that are new to business insolvency (Fuzzy Logic and the Consensus Approach). The Consensus Approach showed the best improvement over the results of all individual classifiers, with an average accuracy of 97.7%, and it is considered the best classification method not only in comparison with individual classifiers, but also with traditional combiners. This study pioneers the development of a time series business insolvency prediction model with Big Data for UK businesses. The aim of the model is to provide early prediction of a business's health. Three prediction models were developed based on Nonlinear Autoregressive with Exogenous Input models (NARX), Nonlinear Autoregressive Neural Network (NAR), and Deep Learning Time-series model (DPL-SA), and they achieved average accuracy rates of 83.6%, 89.5%, and 91.35%, respectively. The results show relatively high performance in comparison with the best individual classifier (deep learning).

Brunel University Research Archive

Multiple classifiers in biometrics. part 1: Fundamentals and review

Author: Camacho David
Fiérrez Julián
Morales Aythami
Vera-Rodriguez Rubén
Publication venue: Elsevier BV
Publication date: 01/01/2017

We provide an introduction to Multiple Classifier Systems (MCS), including basic nomenclature and a description of key elements: classifier dependencies, type of classifier outputs, aggregation procedures, architecture, and types of methods. This introduction complements other existing overviews of MCS, as here we also review the most prevalent theoretical framework for MCS and discuss theoretical developments related to MCS. The introduction to MCS is then followed by a review of the application of MCS to the particular field of multimodal biometric person authentication in the last 25 years, as a prototypical area in which MCS has resulted in important achievements. This review includes general descriptions of successful MCS methods and architectures in order to facilitate exporting them to other information fusion problems. Based on the theory and framework introduced here, in the companion paper we then develop in more technical detail recent trends and developments in MCS from multimodal biometrics that incorporate context information in an adaptive way. These new MCS architectures exploit input quality measures and pattern-specific particularities that move apart from general population statistics, resulting in robust multimodal biometric systems. As in the present paper, methods in the companion paper are introduced in a general way so they can be applied to other information fusion problems as well. Finally, also in the companion paper, we discuss open challenges in biometrics and the role of MCS in advancing them. This work was funded by projects CogniMetrics (TEC2015-70627-R) from MINECO/FEDER and RiskTrakc (JUST-2015-JCOO-AG-1). Part of this work was conducted during a research visit of J.F. to Prof. Ludmila Kuncheva at Bangor University (UK) with STSM funding from COST CA16101 (MULTI-FORESEE

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

Evolving Ensembles with TPOT

Author: Betancourt Camila Andrea Sarmiento
Publication date: 26/01/2023

Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. Machine learning has become popular in recent years as a solution to various problems such as fraud detection, weather prediction, improving diagnostic accuracy, and more. One of its goals is to find the model that best explains the problem. Among the several ways to accomplish that, significant attention has been paid to improving accuracy with stacking ensembles: the objective is to produce a more accurate prediction by combining the predictions of various estimators. Such a model often performs better than its single-model counterparts. Because the process of choosing the best model for a given problem can be time-consuming, a need to automate the machine learning process has emerged. Different tools allow this, including TPOT, a Python library that uses genetic programming to optimize the machine learning process, evolving randomly created pipelines until the best one is found or a previously fixed maximum number of generations for the given problem is reached. Genetic programming is a field of machine learning that uses evolutionary algorithms to generate new computer programs, and it has proven successful in quite a few applications. TPOT uses several machine learning algorithms from the Sklearn Python library. It also features some ensembles, such as Random Forest or AdaBoost. Stacking ensembles are not yet implemented in TPOT, and, considering its current accuracy rates, the objective of this thesis is to implement them. After we implemented stacking ensembles successfully in TPOT, we performed some experiments with different datasets and noticed that, for almost all of them, TPOT has comparable performance to TPOT with stacking ensembles.
Also, we observed that, when using the light dictionary version of TPOT, the results of the Stacking configuration improved for two datasets, since it used weaker learners.

Repositório da Universidade Nova de Lisboa

Local learning for multi-layer, multi-component predictive system

Author: Al-Jubouri Bassma
Gabrys Bogdan
Publication venue: Elsevier BV
Publication date: 01/10/2016

This study introduces a new multi-layer multi-component ensemble. The components of this ensemble are trained locally on subsets of features for disjoint sets of data. The data instances are assigned to local regions using the similarity of their features' pairwise squared correlation. Many ensemble methods encourage diversity among their base predictors by training them on different subsets of data or different subsets of features. In the proposed architecture, the local regions contain disjoint sets of data, and for these data only the most similar features are selected. The pairwise squared correlations of the features are used to weight the predictions of the ensemble's models. The proposed architecture has been tested on a number of data sets and its performance was compared to five benchmark algorithms. The results showed that the testing accuracy of the developed architecture is comparable to that of the rotation forest and is better than that of the other benchmark algorithms.

Elsevier - Publisher Connector

Bournemouth University Research Online