10 research outputs found

    Credit risk evaluation by using nearest subspace method

    In this paper, a classification method called the nearest subspace method is applied to credit risk evaluation. Credit risk evaluation is essentially a typical classification problem: identifying “good” and “bad” creditors. Machine learning techniques such as the support vector machine (SVM) have been widely discussed for credit risk evaluation, but many effective classification methods from pattern recognition and artificial intelligence have not yet been tested for it. This paper proposes using the nearest subspace classification method, a successful face recognition method, for credit evaluation. The method uses the subspaces spanned by the creditors in the same class to extend the training set, takes the Euclidean distance from a test creditor to each subspace as the similarity measure, and assigns the test creditor to the class of the nearest subspace. Experiments on a real-world credit dataset show that the nearest subspace method is competitive for credit risk evaluation.
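The classification rule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each class subspace is the linear span (through the origin) of that class's training vectors, and the helper names and toy three-feature creditors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: return an orthonormal basis for span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are (nearly) linearly dependent
            basis.append([wi / norm for wi in w])
    return basis

def distance_to_subspace(x, basis):
    """Euclidean norm of the part of x that lies outside span(basis)."""
    residual = list(x)
    for b in basis:
        c = dot(residual, b)
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return math.sqrt(dot(residual, residual))

def nearest_subspace_predict(x, train_by_class):
    """Assign x to the class whose training subspace is closest."""
    distances = {
        label: distance_to_subspace(x, orthonormal_basis(samples))
        for label, samples in train_by_class.items()
    }
    return min(distances, key=distances.get), distances

# Hypothetical toy data: three features per creditor.
train_by_class = {
    "good": [[2.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
    "bad": [[0.0, 0.0, 3.0]],
}
label, distances = nearest_subspace_predict([0.9, 0.2, 0.1], train_by_class)
print(label, distances)
```

One design caveat of this sketch: if a class has at least as many linearly independent training vectors as there are features, its span is the whole feature space and every distance to it is zero, so the rule is only informative when the feature dimension exceeds the number of samples per class.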

    Predictive Modelling of Retail Banking Transactions for Credit Scoring, Cross-Selling and Payment Pattern Discovery

    Evaluating transactional payment behaviour offers a competitive advantage in the modern payment ecosystem, not only for confirming the presence of good credit applicants or unlocking the cross-selling potential between the respective product and service portfolios of financial institutions, but also for precisely ruling out bad credit applicants in transactional payment streams. In a diagnostic test for analysing payment behaviour, I used a hybrid approach that combines supervised and unsupervised learning algorithms to discover behavioural patterns. Supervised learning algorithms can compute a range of credit scores and cross-sell candidates, although the applied methods discover only limited behavioural patterns across the payment streams. Moreover, the performance of the applied supervised learning algorithms varies across the different data models, and their optimisation is inversely related to the pre-processed dataset. The research experiments conducted suggest that the Two-Class Decision Forest is an effective algorithm for determining both the cross-sell candidates and the creditworthiness of customers. In addition, a deep-learning model using a neural network has been considered, giving a meaningful interpretation of future payment behaviour through categorised payment transactions, in particular by providing additional insights through graph-based visualisations. However, the research shows that unsupervised learning algorithms play a central role in evaluating the transactional payment behaviour of customers: they discover associations using market basket analysis based on previous payment transactions, find the frequent transaction categories, and develop interesting rules when each transaction category is performed on the same payment stream.
The research also reveals that transactional payment behaviour analysis is multifaceted in the financial industry, both for assessing the diagnostic ability of promotion candidates and for classifying bad credit applicants from among the entire customer base. The developed predictive models can also be used more generally to estimate the credit risk of any credit applicant based on their transactional payment behaviour profile, combined with insights from the analysis of categorised payment transactions. The study provides a full review of the performance characteristics of the different data models developed. The demonstrated data science approach is thus a possible proof of how machine learning models can be turned into cost-sensitive data models.
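The market basket analysis mentioned in this abstract can be illustrated with a small sketch. It assumes the standard support and confidence definitions for pairwise association rules; the function name, thresholds, and transaction categories are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules over pairs of categories."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both categories
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical payment streams, each reduced to its set of transaction categories.
baskets = [
    {"groceries", "fuel"},
    {"groceries", "fuel", "dining"},
    {"groceries", "dining"},
    {"fuel", "utilities"},
    {"groceries", "fuel"},
]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data, the rule from "dining" to "groceries" passes both thresholds while the reverse rule does not, which shows why confidence is evaluated separately for each direction.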

    Literature Review of Credit Card Fraud Detection with Machine Learning

    This thesis presents a comprehensive examination of the field of credit card fraud detection, aiming to offer a thorough understanding of its evolution and nuances. Through a synthesis of various studies, methodologies, and technologies, this research strives to provide a holistic perspective on the subject, shedding light on both its strengths and limitations. In the realm of credit card fraud detection, a range of methods and combinations have been explored to enhance effectiveness. This research reviews several noteworthy approaches, including Genetic Algorithms (GA) coupled with Random Forest (GA-RF), Decision Trees (GA-DT), and Artificial Neural Networks (GA-ANN). Additionally, the study delves into outlier score definitions, considering different levels of granularity, and their integration into a supervised framework. Moreover, it discusses the utilization of Artificial Neural Networks (ANNs) in federated learning and the incorporation of Generative Adversarial Networks (GANs) with Modified Focal Loss and Random Forest as the base machine learning algorithm. These methods, either independently or in combination, represent some of the most recent developments in credit card fraud detection, showcasing their potential to address the evolving landscape of digital financial threats. The scope of this literature review encompasses a wide range of sources, including research articles, academic papers, and industry reports, spanning multiple disciplines such as computer science, data science, artificial intelligence, and cybersecurity. The review is organized to guide readers through the progression of credit card fraud detection, commencing with foundational concepts and advancing toward the most recent developments. In today's digital financial landscape, the need for robust defense mechanisms against credit card fraud is undeniable. 
By critically assessing the existing literature, recognizing emerging trends, and evaluating the effectiveness of various detection methods, this thesis aims to contribute to the knowledge pool within the credit card fraud detection domain. The insights gleaned from this comprehensive review will not only benefit researchers and practitioners but also serve as a roadmap for the enhancement of more adaptive and resilient fraud detection systems. As the ongoing battle between fraudsters and defenders in the financial realm continues to evolve, a deep understanding of the current landscape becomes an asset. This literature review aspires to equip readers with the insights needed to address the dynamic challenges associated with credit card fraud detection, fostering innovation and resilience in the pursuit of secure and trustworthy financial transactions

    An insight into the experimental design for credit risk and corporate bankruptcy prediction systems

    In recent years, the finance and business communities have shown increasing interest in application tools for predicting credit and bankruptcy risk, probably because of the need for more robust decision-making systems capable of managing and analyzing complex data. As a result, many techniques have been developed with the aim of producing accurate prediction models that are able to tackle these issues. However, the design of experiments to assess and compare these models has attracted little attention so far, even though it plays an important role in validating and supporting the theoretical evidence of performance. The experimental design should be done carefully for the results to hold significance; otherwise, it might be a source of misleading and contradictory conclusions about the benefits of using a particular prediction system. In this work, we review more than 140 papers published in refereed journals within the period 2000–2013, putting the emphasis on the bases of the experimental design in credit scoring and bankruptcy prediction applications. We provide some caveats and guidelines for the usage of databases, data splitting methods, performance evaluation metrics and hypothesis testing procedures in order to converge on a systematic, consistent validation standard. This work has partially been supported by the Mexican Science and Technology Council (CONACYT-Mexico) through a Postdoctoral Fellowship [223351], the Spanish Ministry of Economy under grant TIN2013-46522-P and the Generalitat Valenciana under grant PROMETEOII/2014/062.

    Novel GIS based machine learning algorithms for shallow landslide susceptibility mapping

    © 2018 by the authors. Licensee MDPI, Basel, Switzerland. The main objective of this research was to introduce a novel machine learning algorithm, the alternating decision tree (ADTree), based on the multiboost (MB), bagging (BA), rotation forest (RF) and random subspace (RS) ensemble algorithms, under two scenarios of different sample sizes and raster resolutions, for spatial prediction of shallow landslides around Bijar City, Kurdistan Province, Iran. The modeling process was evaluated using several statistical measures and the area under the receiver operating characteristic curve (AUROC). Results show that the RS model obtained a high goodness-of-fit and prediction accuracy for sample sizes of 60%/40% and 70%/30% with a raster resolution of 10 m, while the MB model did so for sample sizes of 80%/20% and 90%/10% with a raster resolution of 20 m. The RS-ADTree and MB-ADTree ensemble models outperformed the ADTree model in both scenarios. Overall, MB-ADTree with a sample size of 80%/20% and a resolution of 20 m (area under the curve (AUC) = 0.942) had the highest prediction accuracy, and with a sample size of 60%/40% and a resolution of 10 m (AUC = 0.845) the lowest. The findings confirm that the newly proposed models are very promising alternative tools to assist planners and decision makers in the task of managing landslide-prone areas.

    Differential evolution technique on weighted voting stacking ensemble method for credit card fraud detection

    Differential Evolution is a population-based stochastic search technique for optimization, which is powerful and efficient over a continuous space for solving differentiable and non-linear optimization problems. The weighted voting stacking ensemble method is an important technique that combines various classifier models. However, selecting appropriate weights for the classifier models so that transactions are classified correctly is a problem. This research study therefore explores whether the Differential Evolution optimization method is a good approach for defining the weighting function. Manual and random selection of weights for voting on credit card transactions has previously been carried out, but a large number of fraudulent transactions were not detected by the classifier models, which means that a technique to overcome the weaknesses of the classifier models is required. The problem of selecting appropriate weights was thus viewed in this study as a weight optimization problem. The dataset was downloaded from the Kaggle competition data repository. Various machine learning algorithms were used to cast weighted votes on the class of a transaction, and the differential evolution optimization technique was used as the weighting function. In addition, the Synthetic Minority Oversampling Technique (SMOTE) and Safe Level Synthetic Minority Oversampling Technique (SL-SMOTE) oversampling algorithms were modified to preserve the definition of SMOTE while improving performance. The results of this study show that the Differential Evolution optimization method is a good weighting function, which can be adopted as a systematic weight function for the weighted voting stacking ensemble method over various classification methods. School of Computing. M. Sc. (Computing)
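The idea of treating weight selection as an optimization problem can be sketched as below. This is an illustration under stated assumptions rather than the thesis's method: it uses a basic DE/rand/1/bin scheme written in plain Python, takes the F1 score on a validation set as the objective, and runs on hypothetical probabilities from three unnamed base classifiers.

```python
import random

def weighted_vote(probs, weights, threshold=0.5):
    """Combine per-classifier fraud probabilities into one 0/1 decision."""
    total = sum(weights)
    if total == 0:
        return 0
    score = sum(w * p for w, p in zip(weights, probs)) / total
    return 1 if score >= threshold else 0

def f1_score(weights, prob_rows, labels):
    preds = [weighted_vote(row, weights) for row in prob_rows]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def differential_evolution(fitness, dim, pop_size=12, generations=40,
                           F=0.8, CR=0.9, seed=0):
    """Maximise fitness over [0, 1]^dim with DE/rand/1/bin."""
    rng = random.Random(seed)
    # Seed the population with equal weights so the result is never worse than that baseline.
    pop = [[1.0] * dim] + [[rng.random() for _ in range(dim)]
                           for _ in range(pop_size - 1)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip to bounds
                else:
                    trial.append(pop[i][j])
            s = fitness(trial)
            if s >= scores[i]:  # greedy selection keeps the better vector
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Hypothetical validation outputs: one row per transaction, one column per classifier.
prob_rows = [[0.9, 0.8, 0.2], [0.7, 0.6, 0.1], [0.2, 0.3, 0.9],
             [0.1, 0.2, 0.8], [0.6, 0.4, 0.3], [0.3, 0.1, 0.7]]
labels = [1, 1, 0, 0, 1, 0]

baseline = f1_score([1.0, 1.0, 1.0], prob_rows, labels)
best_weights, best_score = differential_evolution(
    lambda w: f1_score(w, prob_rows, labels), dim=3)
print(baseline, best_score, best_weights)
```

Because the equal-weight vector is part of the initial population and selection is greedy, the returned score cannot fall below the equal-weight baseline. On real data the weights would need to be tuned on a held-out split to avoid overfitting them to the validation set.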

    MevaL: A Visual Machine Learning Model Evaluation Tool for Financial Crime Detection

    Data Science and Machine Learning are two valuable allies in the fight against financial crime, the domain where Feedzai seeks to leverage its value proposition in support of its mission: to make banking and commerce safe. Data is at the core of both fields and of this domain, so structuring instances for visual consumption provides an effective way of understanding the data and communicating insights. The development of a solution for each project and use case requires a careful and effective Machine Learning Model Evaluation stage, as it is the major source of feedback before deployment. The tooling for this stage available at Feedzai can be improved, accelerated, visually supported, and diversified to enable data scientists to boost their daily work and the quality of the models. In this work, I propose to collect and compile internal and external input, in terms of workflow and Model Evaluation, in a proposal hierarchically segmented by well-defined objectives and tasks, to instantiate the proposal in a Python package, and to iteratively validate the package with Feedzai’s data scientists. The first contribution is therefore MevaL, a Python package for Model Evaluation with visual support, integrated into Feedzai’s Data Science environment by design. In fact, MevaL is already being leveraged as a visualization package in two internal reporting projects that are serving some of Feedzai’s major clients. In addition to MevaL, the second contribution of this work is the Model Evaluation Topology developed to ensure clear communication and design of features.

    AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model

    © 2020, The Author(s). The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods, such as the Bayesian-based and genetic-based optimisation implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. Pipeline composition and optimisation with these methods therefore requires a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid, and that it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). AVATAR accelerates automatic ML pipeline composition and optimisation by quickly ignoring invalid pipelines. Our experiments show that AVATAR is more efficient in evaluating complex pipelines than the traditional evaluation approaches that require their execution.

    Bioelectrical User Authentication

    The number of mobile devices, including mobile phones and tablets, has grown tremendously in recent years. Mobile phones have become more prevalent because of their increasing functionality and capacity. Most mobile phones now available are smartphones with better processing capability, hence their deployment for processing large volumes of information. The information held on these smartphones needs to be protected so that unauthorised persons cannot get hold of personal data. To verify a legitimate user before the phone's information is accessed, the user authentication mechanism should be robust enough to meet present security challenges. The present approach to user authentication is cumbersome and fails to consider the human factor. The point-of-entry mechanism is intrusive, forcing users to authenticate every time irrespective of the time interval. Biometrics have been identified as a more reliable means of implementing transparent and non-intrusive user authentication. Transparent authentication using biometrics offers more convenient and secure authentication than secret-knowledge or token-based approaches. The ability to apply biometrics in a transparent manner improves authentication security by providing a reliable way to authenticate smartphone users. As such, research is required to investigate new modalities that would operate easily within the constraints of a continuous and transparent authentication system. This thesis explores the use of bioelectrical signals and contextual information as a non-intrusive approach to authenticating the user of a mobile device. From the fusion of bioelectrical signals and context awareness information, three algorithms were created to discriminate subjects, with overall Equal Error Rates (EER) of 3.4%, 2.04% and 0.27%, respectively.
Based on the analysis from the multi-algorithm implementation, a novel architecture is proposed that uses a multi-algorithm biometric authentication system for authenticating the user of a smartphone. The framework is designed to be continuous and transparent, with the application of advanced intelligence to further improve the authentication result. The proposed framework removes the inconvenience of remembering passwords or passphrases, carrying a token, or capturing a biometric sample in an intrusive manner. The framework is evaluated through simulation with the application of a voting scheme. Simulation of the voting scheme using majority voting improved the performance of the combined algorithm (security level 2) to an FRR of 22% and an FAR of 0%, the Active algorithm (security level 2) to an FRR of 14.33% and an FAR of 0%, and the Non-active algorithm (security level 3) to an FRR of 10.33% and an FAR of 0%.
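How a majority voting scheme can change the false rejection rate (FRR) and false acceptance rate (FAR) is sketched below. The abstract does not describe the thesis's actual voting scheme, so this example assumes non-overlapping windows of consecutive accept/reject decisions; the window size and decision sequences are hypothetical.

```python
def majority_vote(decisions):
    """Accept only if strictly more than half of the decisions accept."""
    return sum(decisions) * 2 > len(decisions)

def windowed_votes(decisions, window):
    """Apply majority voting to consecutive, non-overlapping windows."""
    return [
        majority_vote(decisions[i:i + window])
        for i in range(0, len(decisions) - window + 1, window)
    ]

def error_rates(genuine_decisions, impostor_decisions):
    """FRR: share of genuine attempts rejected. FAR: share of impostor attempts accepted."""
    frr = genuine_decisions.count(False) / len(genuine_decisions)
    far = impostor_decisions.count(True) / len(impostor_decisions)
    return frr, far

# Hypothetical per-sample decisions (True = accepted as the legitimate user).
genuine = [True, True, False, True, False, True, True, True, False]
impostor = [False, False, True, False, False, False, True, False, False]

raw_frr, raw_far = error_rates(genuine, impostor)
voted_frr, voted_far = error_rates(
    windowed_votes(genuine, window=3),
    windowed_votes(impostor, window=3),
)
print(f"raw:   FRR={raw_frr:.2%}, FAR={raw_far:.2%}")
print(f"voted: FRR={voted_frr:.2%}, FAR={voted_far:.2%}")
```

In this toy case, isolated errors are outvoted within each window, so both rates drop to zero. With real signals, errors tend to cluster, and voting can lower one rate while leaving the other unchanged.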