134 research outputs found

    An Empirical Study of AML Approach for Credit Card Fraud Detection—Financial Transactions

    Credit card fraud is one of the flip sides of the digital world: transactions are made without the knowledge of the genuine cardholder. Based on a study of papers on credit card fraud published between 1994 and 2018, the following objectives are achieved. The various types of credit card fraud are identified. Adaptive machine learning techniques (AMLTs) for detecting these frauds automatically are studied, and their pros and cons are summarized. The datasets used in the literature are studied and categorized into real and synthesized datasets. The performance metrics and evaluation criteria used to evaluate fraud detection systems are also summarized. This study further covers an in-depth analysis and comparison of the performance (i.e., sensitivity, specificity, and accuracy) of existing machine learning techniques for credit card fraud detection. The findings clearly show that supervised learning, card-not-present fraud, skimming fraud, and the website cloning method have featured more frequently than the alternatives. This study helps new researchers by discussing the limitations of existing fraud detection techniques and by providing useful research directions in the credit card fraud detection field.
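    The three performance measures named in this abstract all come from the confusion matrix of a fraud classifier. The sketch below shows how they are computed, assuming "fraud" is the positive class. The function name and the transaction counts in the usage lines are hypothetical and illustrative, not taken from the surveyed papers.

```python
def detection_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    Assumed convention: the positive class is "fraud".
    tp/fn = fraudulent transactions flagged/missed,
    tn/fp = genuine transactions passed/wrongly flagged.
    """
    sensitivity = tp / (tp + fn)                # share of frauds that were caught
    specificity = tn / (tn + fp)                # share of genuine transactions left alone
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all transactions classified correctly
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}


# Hypothetical counts: 100 frauds among 10,000 transactions.
print(detection_metrics(tp=80, fn=20, tn=9850, fp=50))
# sensitivity 0.8, specificity ~0.995, accuracy 0.993

# A classifier that never flags anything still reaches 0.99 accuracy here,
# but its sensitivity is 0.0.
print(detection_metrics(tp=0, fn=100, tn=9900, fp=0))
```

    The second call shows why the literature reports sensitivity and specificity alongside accuracy. Fraud data are heavily imbalanced, so accuracy alone can hide a detector that misses every fraud.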

    Bank credit risk : evidence from Tunisia using Bayesian networks

    In this article, the problem of measuring credit risk in banks is studied. The proposed approach uses Bayesian networks. First, data characterizing the customers who apply for loans are gathered and the collected samples are processed. The approach then implements various network architectures and combinations of activation and training functions, and finally compares the results obtained with those of the methods currently in use. To address this problem, we build a graph from which our credit scoring model is developed, using Bayesian networks as the method. We then identify the variables that affect the creditworthiness of credit beneficiaries. The article is divided into two parts: the first covers the theory of the key variables that affect the repayment rate, and the second describes the variables, the research methodology, and the main results. The findings provide banks with an effective decision support system for detecting bad borrowers and reducing their rate through a Bayesian network model. This paper contributes to the existing literature on customers' payment default and the risk associated with allocating loans. Peer-reviewed.
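    The core use of a Bayesian network in credit scoring is to infer a default probability from observed borrower variables. The sketch below shows that inference under stated assumptions. It uses a hypothetical three-node network in which a `Default` node is the only parent of two observed variables, `LatePayments` and `LowIncome`. The structure, variable names and probabilities are illustrative inventions. The paper's actual graph and variables are not given in this abstract.

```python
# Hypothetical conditional probability tables (illustrative numbers only).
P_DEFAULT = {True: 0.1, False: 0.9}                    # prior P(Default)
P_LATE_GIVEN_DEFAULT = {True: 0.7, False: 0.2}         # P(LatePayments=True | Default)
P_LOW_INCOME_GIVEN_DEFAULT = {True: 0.6, False: 0.3}   # P(LowIncome=True | Default)


def bernoulli(p_true: float, value: bool) -> float:
    """Probability of observing `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def posterior_default(late: bool, low_income: bool) -> float:
    """P(Default=True | evidence) by exact enumeration over the Default node.

    In this assumed structure the two observed variables are conditionally
    independent given Default, so the joint factorizes as
    P(D) * P(Late | D) * P(LowIncome | D).
    """
    joint = {
        d: P_DEFAULT[d]
        * bernoulli(P_LATE_GIVEN_DEFAULT[d], late)
        * bernoulli(P_LOW_INCOME_GIVEN_DEFAULT[d], low_income)
        for d in (True, False)
    }
    return joint[True] / (joint[True] + joint[False])  # Bayes' rule (normalize)


print(posterior_default(late=True, low_income=True))    # ~0.4375, up from the 0.1 prior
print(posterior_default(late=False, low_income=False))  # ~0.023
```

    A decision support system built this way could flag applicants whose posterior exceeds a bank-chosen cutoff. The cutoff, like the tables above, would be an assumption to calibrate on real loan data.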

    Predicting Material Weaknesses In Internal Control Systems After The Sarbanes-Oxley Act Using Multiple Criteria Linear Programming And Other Data Mining Approaches

    Our study proposes multiple criteria linear programming (MCLP) and other data mining methods to predict material weaknesses in a firm's internal control system after the Sarbanes-Oxley Act (SOX), using 2003-2004 U.S. data. Using financial and other data from the Form 10-K report, our prediction results show that the MCLP method performs better overall than the other data mining approaches. Consistent with prior research, firms that disclosed material weaknesses in their SOX Section 302 disclosures had three distinguishing features compared with the non-material-weakness control firms. They were more complex (based on the existence of foreign currency translations), more often used Big 4 auditors, and had lower ratios of operating cash flows to total assets. Results on several profitability measures were mixed, and MCLP and the other methods showed only marginal predictive ability. More research is therefore needed to identify firm characteristics that help investors, auditors, and others predict material weaknesses.

    Parallel Regularized Multiple-criteria Linear Programming

    In this paper, we propose a new parallel algorithm, Parallel Regularized Multiple-Criteria Linear Programming (PRMCLP). It addresses the computing and storage requirements of RMCLP, which increase rapidly with the number of training samples. First, we convert the RMCLP model into an unconstrained optimization problem and split it into several parts, each computed by a single processor. We then analyze the result of each part to prepare the next cycle. In this way, we obtain the final optimization solution of the whole classification problem. All experiments on public datasets show that our method greatly increases the training speed of RMCLP with the help of multiple processors. This work has been partially supported by the China Postdoctoral Science Foundation under Grant No. 2013M530702, and by grants from the National Natural Science Foundation of China (No. 11271361), the key project of the National Natural Science Foundation of China (No. 71331005), the Major International (Regional) Joint Research Project (No. 71110107026), and the Ministry of Water Resources' special funds for scientific research on public causes (No. 201301094). Peer Reviewed. Postprint (published version).

    Machine Learning Techniques for Credit Card Fraud Detection

    When we hear the term "fraud", we usually think of credit card fraud. With the significant increase in credit card transactions, credit card fraud has risen sharply in recent years. Fraud detection should therefore include monitoring the spending behavior of each customer in order to determine, avoid, and detect unwanted behavior. Because the credit card is the predominant payment method for both online and regular purchases, credit card fraud is rising steeply. Fraud detection is concerned not only with capturing fraudulent practices but also with discovering them as quickly as possible. Fraud causes millions of dollars in business losses, is rising over time, and greatly affects the worldwide economy. In this paper we present 14 different techniques showing how data mining techniques can be successfully combined to obtain high fraud coverage with a high or low false rate. We also describe the advantages and disadvantages of each technique and the datasets used in the research.

    MCDM approach to evaluating bank loan default models

    Banks and financial institutions rely on loan default prediction models in credit risk management. An important yet challenging task in developing and applying default classification models is model evaluation and selection. This study proposes an evaluation approach for bank loan default classification models based on multiple criteria decision making (MCDM) methods. A large real-life Chinese bank loan dataset is used to validate the proposed approach. Specifically, a set of performance metrics is used to measure a selection of statistical and machine-learning default models. The technique for order preference by similarity to ideal solution (TOPSIS), an MCDM method, takes the performance of the default classification models on multiple metrics as input and generates a ranking of the default risk models. In addition, feature selection and sampling techniques are applied in the data pre-processing step to handle the high dimensionality and class imbalance of bank loan default data. The results show that the K-Nearest Neighbor algorithm has good potential for bank loan default prediction.
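    The TOPSIS ranking step can be sketched as follows. The sketch assumes the standard formulation: vector normalization of each metric column, criterion weights, Euclidean distances to the ideal and anti-ideal solutions, and a relative closeness score. The model names, metric values and equal weights in the usage example are hypothetical. They do not reproduce the paper's dataset, metric set or results.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives (rows) on criteria (columns) with TOPSIS.

    matrix  : one row per alternative, e.g. one row per default model
    weights : importance weight of each criterion
    benefit : True if larger is better for that criterion, False if smaller is better
    Returns a closeness coefficient in [0, 1] per row; higher means closer to the ideal.
    """
    n_crit = len(weights)
    # 1. Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # 2. Ideal (best) and anti-ideal (worst) value of every criterion.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Distance to both reference points, then relative closeness to the ideal.
    scores = []
    for row in v:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical models scored on accuracy and AUC (benefit criteria)
# and Type II error rate (cost criterion). Numbers are made up.
names = ["model_A", "model_B", "model_C"]
metrics = [
    [0.90, 0.85, 0.10],
    [0.85, 0.80, 0.20],
    [0.80, 0.75, 0.30],
]
scores = topsis(metrics, weights=[1 / 3, 1 / 3, 1 / 3], benefit=[True, True, False])
ranking = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
print(ranking)  # model_A first (closeness 1.0), model_C last (closeness 0.0)
```

    Here model_A is best on every metric, so it coincides with the ideal solution and scores exactly 1.0. On real data the metrics usually conflict, and the weights then determine the ranking.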

    Predictive Modelling of Retail Banking Transactions for Credit Scoring, Cross-Selling and Payment Pattern Discovery

    Evaluating transactional payment behaviour offers a competitive advantage in the modern payment ecosystem. It helps confirm good credit applicants and unlock the cross-selling potential between the respective product and service portfolios of financial institutions. It also helps to rule out bad credit applicants precisely within transactional payment streams. In a diagnostic test for analysing payment behaviour, I used a hybrid approach combining supervised and unsupervised learning algorithms to discover behavioural patterns. Supervised learning algorithms can compute a range of credit scores and cross-sell candidates, although the applied methods discover only limited behavioural patterns across the payment streams. Moreover, the performance of the applied supervised learning algorithms varies across the different data models, and their optimisation is inversely related to the pre-processed dataset. The experiments conducted suggest that the Two-Class Decision Forest is an effective algorithm for determining both cross-sell candidates and customers' creditworthiness. In addition, a deep-learning model using a neural network was considered. It offers a meaningful interpretation of future payment behaviour through categorised payment transactions, in particular by providing additional insights through graph-based visualisations. However, the research shows that unsupervised learning algorithms play a central role in evaluating customers' transactional payment behaviour. They discover associations through market basket analysis of previous payment transactions, find the frequent transaction categories, and develop interesting rules when each transaction category is performed on the same payment stream.
Current research also reveals that transactional payment behaviour analysis is multifaceted in the financial industry, both for assessing the diagnostic ability of promotion candidates and for classifying bad credit applicants within the entire customer base. The developed predictive models can also be used to estimate the credit risk of any credit applicant. The estimate draws on the applicant's transactional payment behaviour profile, combined with deep insights from the analysis of categorised payment transactions. The study provides a full review of the performance characteristics of the different data models developed. Thus, the demonstrated data science approach offers a possible proof of how machine learning models can be turned into cost-sensitive data models.
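    The market basket analysis mentioned above rests on two quantities for a rule "category X implies category Y". Support is the share of payment streams containing both X and Y. Confidence is support(X and Y) divided by support(X). The sketch below computes them, assuming each transaction is the set of payment categories observed in one payment stream. It enumerates only two-item rules by brute force. It is not the Apriori algorithm or whatever tool the author used. The categories, thresholds and data are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sets of categories) that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Brute-force search for rules lhs -> rhs between two single categories.

    Returns (lhs, rhs, support, confidence) tuples that pass both thresholds.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue  # pair is too rare to yield a useful rule
        for lhs, rhs in ((a, b), (b, a)):
            conf = s_ab / support({lhs}, transactions)  # estimate of P(rhs | lhs)
            if conf >= min_confidence:
                rules.append((lhs, rhs, s_ab, conf))
    return rules


# Hypothetical payment streams, each reduced to its set of spending categories.
streams = [
    {"groceries", "fuel", "utilities"},
    {"groceries", "fuel"},
    {"groceries", "restaurants"},
    {"fuel", "utilities"},
    {"groceries", "fuel", "restaurants"},
]
for lhs, rhs, s, c in pair_rules(streams):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

    With these toy streams, four rules pass the thresholds: fuel -> groceries, groceries -> fuel, utilities -> fuel, and restaurants -> groceries. Real transaction data would need a scalable frequent-itemset algorithm rather than this pairwise enumeration.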

    Cluster Analysis of Categorical and Sequence Data: Study and Application to Personal Bankruptcy Prediction

    Cluster analysis is one of the most important and useful data mining techniques, with many applications in pattern extraction, information retrieval, summarization, compression, and other areas. This thesis focuses on clustering categorical and sequence data. Clustering categorical and sequence data is much more challenging than clustering numeric data because there is no inherently meaningful measure of similarity between categorical objects or between sequences. In this thesis, we design novel, efficient, and effective algorithms for clustering categorical data and sequence data, respectively, and we perform extensive experiments to demonstrate the superior performance of the proposed algorithms. We also explore the extent to which the proposed clustering algorithms can help solve the personal bankruptcy prediction problem. Clustering categorical data poses two challenges: defining an inherently meaningful similarity measure, and effectively dealing with clusters that are often embedded in different subspaces. In this thesis, we view the task of clustering categorical data from an optimization perspective and propose a novel objective function. Based on this new formulation, we design a divisive hierarchical clustering algorithm for categorical data, named DHCC. In the bisection procedure of DHCC, the split is initialized using multiple correspondence analysis (MCA). We devise a strategy for the key issue in the divisive approach, namely when to terminate the splitting process. The proposed algorithm is parameter-free, independent of the order in which the data are processed, scalable to large data sets, and capable of seamlessly discovering clusters embedded in subspaces. Prior knowledge about the data can also be incorporated into the clustering process to considerably improve learning accuracy; this is known as semi-supervised clustering.
In this thesis, we view semi-supervised clustering of categorical data as an optimization problem with extra instance-level constraints. We propose a systematic, fully automated approach that guides the optimization process to a better solution in terms of satisfying the constraints, which also benefits the unconstrained objects. The proposed semi-supervised divisive hierarchical clustering algorithm for categorical data, named SDHCC, is parameter-free, fully automatic, and effective in exploiting instance-level constraints as background knowledge to improve the quality of the resulting dendrogram. Many existing sequence clustering algorithms rely on a pairwise measure of similarity between sequences. Such a measure is usually effective if the sequences contain significantly informative patterns. However, it is difficult to define a meaningful pairwise similarity measure if sequences are short and contain noise. In this thesis, we circumvent the obstacle of defining pairwise similarity by defining the similarity between an individual sequence and a set of sequences. This new similarity measure relies on a conditional probability distribution (CPD) model. Based on it, we design a novel model-based K-means algorithm for sequence clustering, which works similarly to traditional K-means on vectorial data. Finally, we develop a personal bankruptcy prediction system whose predictors are mainly the bankruptcy features discovered by the clustering techniques proposed in this thesis. The mined bankruptcy features are represented in a low-dimensional vector space. This feature space can be extended with existing prediction-capable features (e.g., credit score). A support vector machine (SVM) classifier is then built on it to combine the mined and the already existing features. Our system is readily comprehensible and demonstrates promising prediction performance.
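The sequence-clustering idea can be sketched as follows. A sequence is scored against a set of sequences through a conditional probability model, and sequences are then reassigned K-means style. This is a simplified sketch under explicit assumptions. The CPD model is taken to be a first-order Markov chain P(next symbol | current symbol) with add-one smoothing. Sequences are strings of single-character symbols of length at least 2. Initialization is a deterministic round-robin. The thesis's actual CPD model and initialization may differ, and all function names here are hypothetical.

```python
import math
from collections import Counter


def fit_markov(seqs, alphabet):
    """Assumed CPD: first-order Markov model P(next | current) fitted to a set of sequences."""
    pair_counts, ctx_counts = Counter(), Counter()
    for s in seqs:
        for cur, nxt in zip(s, s[1:]):
            pair_counts[(cur, nxt)] += 1
            ctx_counts[cur] += 1
    k = len(alphabet)
    # Add-one smoothing keeps unseen transitions at a small non-zero probability.
    return {(a, b): (pair_counts[(a, b)] + 1) / (ctx_counts[a] + k)
            for a in alphabet for b in alphabet}


def similarity(seq, model):
    """Sequence-to-set similarity: mean log-probability of seq's transitions under the set's CPD."""
    logs = [math.log(model[(a, b)]) for a, b in zip(seq, seq[1:])]
    return sum(logs) / len(logs)


def model_kmeans(seqs, k, max_iter=20):
    """K-means-style loop: refit one CPD per cluster, then move each sequence to its most similar cluster."""
    alphabet = sorted(set("".join(seqs)))
    labels = [i % k for i in range(len(seqs))]  # hypothetical round-robin initialization
    for _ in range(max_iter):
        models = [fit_markov([s for s, lab in zip(seqs, labels) if lab == c], alphabet)
                  for c in range(k)]
        new_labels = [max(range(k), key=lambda c: similarity(s, models[c])) for s in seqs]
        if new_labels == labels:  # converged: no sequence changed cluster
            break
        labels = new_labels
    return labels


seqs = ["ababab", "bababa", "abababab", "ccdccd", "cdccdc", "dccdcc"]
print(model_kmeans(seqs, k=2))  # [0, 0, 0, 1, 1, 1]
```

The "centroid" of a cluster is a probability model rather than a mean vector. This is what lets short, noisy sequences be compared to a whole group without any pairwise sequence-to-sequence distance.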