
    Improving binary classification using filtering based on k-NN proximity graphs

    One way to improve recognition ability in a classification problem is to remove outlier entries, as well as redundant and unnecessary features, from the training set. Filtering and feature selection can have a large impact on classifier accuracy and area under the curve (AUC), since noisy data can confuse a classifier and lead it to learn wrong patterns in the training data. A common approach to data filtering is to use proximity graphs; however, the problem of selecting optimal filtering parameters is still insufficiently researched. In this paper, a filtering procedure based on a k-nearest-neighbours (k-NN) proximity graph is used. Selection of the filtering parameters is posed as an outlier-minimization problem: the k-NN proximity graph, the power of the distance and the threshold parameters are selected so as to minimize the percentage of outliers in the training data. The performance of six commonly used classifiers (Logistic Regression, Naïve Bayes, Neural Network, Random Forest, Support Vector Machine and Decision Tree) and one heterogeneous classifier combiner (DES-LA) is then compared with and without filtering. Dynamic ensemble selection (DES) systems estimate the level of competence of each classifier in a pool and select only the most competent ones to classify a given test sample. This is achieved by defining a criterion to measure the competence of the base classifiers, such as their accuracy in local regions of the feature space around the query instance. In our case, the combiner is based on the local accuracy of single classifiers, and its output is a linear combination of the single-classifier rankings. After filtering, the accuracy of the DES-LA combiner increases substantially on low-accuracy datasets, but filtering has little impact on DES-LA performance on high-accuracy datasets.
The results are discussed, and the classifiers whose performance is most affected by the pre-processing filtering step are identified. The main contributions of the paper are modifications to the DES-LA combiner and a comparative analysis of the impact of filtering on classifiers of various types. Testing the filtering algorithm on a real-world dataset (the Taiwan default credit card dataset) confirmed the efficiency of the automatic filtering approach.
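The filtering idea described above can be illustrated with a minimal sketch: flag a training point as an outlier when most of its k nearest neighbours carry a different label, and drop it if the disagreement fraction exceeds a threshold. The function name `knn_filter` and the fixed disagreement threshold are assumptions for illustration; the paper tunes the graph, the power of the distance and the threshold jointly, which this toy version does not attempt.

```python
from math import dist

def knn_filter(X, y, k=3, threshold=0.5):
    """Keep only training points whose k nearest neighbours mostly agree
    with their own label; 'threshold' is the maximum tolerated fraction
    of disagreeing neighbours (a simplifying assumption)."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Sort all other points by Euclidean distance to xi.
        neigh = sorted((dist(xi, xj), yj)
                       for j, (xj, yj) in enumerate(zip(X, y)) if j != i)
        disagree = sum(1 for _, yj in neigh[:k] if yj != yi)
        if disagree / k <= threshold:
            keep.append(i)
    return keep

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (0.5, 0.5)]
y = [0, 0, 0, 1, 1, 1]          # the last point sits inside class 0: noise
print(knn_filter(X, y, k=3))    # the mislabelled point is filtered out
```

Only the last point, whose three nearest neighbours all carry the other label, is removed; the rest of the training set survives the filter.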

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary using only knowledge of the positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC through a taxonomy of study for OCC problems, based on the availability of training data, the algorithms used and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques and methodologies, with a focus on their significance, limitations and applications. We conclude by discussing some open research problems in the field of OCC and presenting our vision for future research.
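The core OCC setting, learning a boundary from positives only, can be sketched with a toy centroid-and-radius model: fit the centre of the positive class and accept new points that fall within a radius covering a chosen quantile of the training data. This is a deliberately simple stand-in, not any specific algorithm surveyed in the paper; `fit_occ` and its `quantile` parameter are hypothetical names.

```python
from math import dist

def fit_occ(positives, quantile=0.95):
    """Toy one-class model: centroid of the positive class plus a radius
    chosen so that roughly `quantile` of the training points lie inside."""
    n = len(positives)
    centroid = tuple(sum(p[i] for p in positives) / n
                     for i in range(len(positives[0])))
    radii = sorted(dist(p, centroid) for p in positives)
    radius = radii[min(n - 1, int(quantile * n))]
    return centroid, radius

def is_inlier(x, model):
    """Accept x if it falls within the learned radius of the centroid."""
    centroid, radius = model
    return dist(x, centroid) <= radius

positives = [(0, 0), (1, 0), (0, 1), (1, 1)]
model = fit_occ(positives, quantile=1.0)
print(is_inlier((0.5, 0.5), model), is_inlier((10, 10), model))
```

A point near the training cloud is accepted while a distant one is rejected, without any negative example ever being seen, which is exactly the constraint OCC methods operate under.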

    End-to-end neural network architecture for fraud scoring in card payments

    Millions of euros are lost every year due to fraudulent card transactions. The design and implementation of efficient fraud detection methods is essential to minimize such losses. In this paper, we present a neural-network-based system for fraud detection in banking. We use a real-world dataset and describe an end-to-end solution from the practitioner's perspective, focusing on the following crucial aspects: class imbalance, data processing and cost-metric evaluation. Our analysis shows that the proposed solution achieves performance comparable to state-of-the-art proprietary and costly solutions.
    Gomez, J.; Arévalo, J.; Paredes Palacios, R.; Nin, J. (2018). End-to-end neural network architecture for fraud scoring in card payments. Pattern Recognition Letters, 105:175-181. https://doi.org/10.1016/j.patrec.2017.08.024
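The cost-metric evaluation mentioned above reflects a common idea in fraud detection: errors are not symmetric, since flagging a legitimate transaction costs a fixed review fee while missing a fraud costs the transaction amount. The sketch below assumes that cost structure; the specific fee and the function name `fraud_cost` are illustrative assumptions, not the paper's exact metric.

```python
def fraud_cost(y_true, y_pred, amounts, admin_cost=2.0):
    """Example-dependent cost of a set of predictions (1 = fraud).
    Assumed costs: every flagged transaction incurs a fixed manual-review
    fee; every missed fraud costs its full amount."""
    cost = 0.0
    for t, p, a in zip(y_true, y_pred, amounts):
        if p == 1:
            cost += admin_cost   # transaction sent to manual review
        elif t == 1:
            cost += a            # fraud slipped through undetected
    return cost

# One caught fraud, one missed fraud (80.0), one false alarm.
print(fraud_cost([0, 1, 1, 0], [0, 1, 0, 1], [10.0, 50.0, 80.0, 5.0]))
```

Under such a metric, a model that misses a few large frauds can score far worse than one with more false alarms, which is why accuracy alone is misleading on heavily imbalanced payment data.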

    Identifying hidden contexts

    In this study we investigate how to identify hidden contexts from the data in classification tasks. Contexts are artifacts in the data that do not predict the class label directly. For instance, in a speech-recognition task speakers might have different accents, which do not directly discriminate between the spoken words. Identifying hidden contexts is considered a data-preprocessing task that can help to build more accurate classifiers tailored to particular contexts and give insight into the data structure. We present three techniques to identify hidden contexts, which hide class-label information from the input data and partition it using clustering techniques. We form a collection of performance measures to ensure that the resulting contexts are valid. We evaluate the performance of the proposed techniques on thirty real datasets and present a case study illustrating how the identified contexts can be used to build specialized, more accurate classifiers.
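The common thread of the techniques above, partitioning the inputs with the class label hidden, can be sketched with a minimal k-means over features only. This is a generic illustration of the clustering step, not one of the paper's three specific techniques; the data and initialization are assumptions.

```python
from math import dist
from statistics import mean

def kmeans(points, k=2, iters=20):
    """Minimal k-means: partitions the inputs using features alone,
    mirroring context discovery with class labels hidden."""
    dims = len(points[0])
    centers = list(points[:k])              # naive init: first k points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current centre.
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each centre as the mean of its assigned points.
        centers = [tuple(mean(q[d] for q in g) for d in range(dims))
                   if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

# Features drawn from two hidden "contexts" (e.g. two accents); any class
# labels the points carry are deliberately not shown to the algorithm.
points = [(0.1, 0.0), (0.2, 0.1), (0.0, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, groups = kmeans(points, k=2)
print(sorted(len(g) for g in groups))
```

Once the contexts are recovered, a specialized classifier can be trained per group, which is the use case the abstract's case study describes.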

    Improved power transformer condition monitoring under uncertainty through soft computing and probabilistic health index

    Condition monitoring of power transformers is crucial for the reliable and cost-effective operation of the power grid. The health index (HI) formulation is a pragmatic approach to combine multiple information sources and generate a consistent health-state indicator for asset management planning. Generally, existing transformer HI methods are based on expert knowledge or data-driven models of specific transformer subsystems. However, the effect of uncertainty is not considered when integrating expert knowledge and data-driven models for the system-level HI estimation. As engineering problems become more dynamic and non-deterministic, the sources of uncertainty are increasing across power and energy applications, e.g. electric vehicles with new dynamic loads or nuclear power plants with de-energized periods, and transformer health assessment under uncertainty is becoming critical for accurate condition monitoring. In this context, this paper presents a novel soft-computing-driven probabilistic HI framework for transformer health monitoring. The approach encapsulates data analytics and expert knowledge along with different sources of uncertainty, and infers a transformer HI value with confidence intervals for decision-making under uncertainty. Using real data from a nuclear power plant, the proposed framework is compared with traditional HI implementations, and the results confirm the validity of the approach for transformer health assessment.
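The notion of an HI with confidence intervals can be sketched as a weighted average of per-subsystem condition scores with assumed measurement noise propagated by Monte Carlo. The subsystem list, the 3:2:1 weights and the Gaussian noise level are all hypothetical; the paper's actual framework combines expert knowledge and data analytics in a far richer way.

```python
import random

def health_index(scores, weights, noise=0.05, n=2000, seed=0):
    """Toy probabilistic HI: weighted average of subsystem scores in
    [0, 1], with assumed Gaussian measurement noise propagated by Monte
    Carlo to yield a point estimate and a 90% interval."""
    rng = random.Random(seed)
    total_w = sum(weights)
    draws = []
    for _ in range(n):
        # Perturb each score, clamping back into the valid [0, 1] range.
        noisy = [min(1.0, max(0.0, s + rng.gauss(0, noise))) for s in scores]
        draws.append(sum(w * s for w, s in zip(weights, noisy)) / total_w)
    draws.sort()
    point = sum(draws) / n
    return point, (draws[int(0.05 * n)], draws[int(0.95 * n)])

# Three hypothetical subsystems (e.g. insulation, bushings, tap changer)
# scored by separate models, with assumed expert weights 3:2:1.
estimate, (low, high) = health_index([0.9, 0.7, 0.8], [3, 2, 1])
```

Reporting the interval alongside the point estimate is what lets a maintenance planner tell a confidently healthy transformer from one whose apparent health rests on noisy inputs.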

    Financial distress prediction using the hybrid associative memory with translation

    This paper presents an alternative technique for financial distress prediction systems. The method is based on a type of neural network called the hybrid associative memory with translation. While many different neural network architectures have successfully been used to predict credit risk and corporate failure, the power of associative memories for financial decision-making has not yet been explored in any depth. The performance of the hybrid associative memory with translation is compared to four traditional neural networks, a support vector machine and a logistic regression model in terms of their prediction capabilities. The experimental results over nine real-life data sets show that the associative memory proposed here constitutes an appropriate solution for bankruptcy and credit risk prediction, performing significantly better than the rest of the models under class imbalance and data overlapping conditions in terms of the true positive rate and the geometric mean of the true positive and true negative rates. This work has been partially supported by the Mexican CONACYT through the Postdoctoral Fellowship Program [232167], the Spanish Ministry of Economy [TIN2013-46522-P], the Generalitat Valenciana [PROMETEOII/2014/062] and the Mexican PRODEP [DSA/103.5/15/7004]. We would like to thank the reviewers for their valuable comments and suggestions, which have helped to improve the quality of this paper substantially.

    A literature review on the application of evolutionary computing to credit scoring

    Recent years have seen the development of many credit scoring models for assessing the creditworthiness of loan applicants. Traditional credit scoring methodology has involved the use of statistical and mathematical programming techniques such as discriminant analysis, linear and logistic regression, linear and quadratic programming, or decision trees. However, the importance of credit grant decisions for financial institutions has caused growing interest in using a variety of computational intelligence techniques. This paper concentrates on evolutionary computing, which is viewed as one of the most promising paradigms of computational intelligence. Taking into account the synergistic relationship between the communities of Economics and Computer Science, the aim of this paper is to summarize the most recent developments in the application of evolutionary algorithms to credit scoring by means of a thorough review of scientific articles published during the period 2000-2012. This work has been partially supported by the Spanish Ministry of Education and Science under grant TIN2009-14205 and the Generalitat Valenciana under grant PROMETEO/2010/028.