16,275 research outputs found

    Mining Bad Credit Card Accounts from OLAP and OLTP

    Credit card companies classify accounts as good or bad based on historical data, where a bad account is one that may default on payments in the near future. If an account is classified as bad, further action can be taken to investigate the actual nature of the account and take preventive measures. Marking an account as "good" when it is actually bad could lead to loss of revenue, and marking an account as "bad" when it is actually good could lead to loss of business. However, detecting bad credit card accounts in real time from Online Transaction Processing (OLTP) data is challenging due to the volume of data that must be processed to compute the risk factor. We propose an approach that precomputes and maintains the risk probability of an account based on historical transaction data from offline data or a data warehouse. Then, using the most recent OLTP transactional data, a risk probability is calculated for the latest transaction and combined with the previously computed risk probability from the data warehouse. If the accumulated risk probability crosses a predefined threshold, the account is treated as a bad account and flagged for manual verification. Comment: Conference proceedings of ICCDA, 201
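
    The two-stage scoring described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not say how the two probabilities are combined, so the weighted average, the `weight` and `threshold` defaults, and the names `Account`, `combine_risk` and `is_flagged` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Account:
    account_id: str
    historical_risk: float  # precomputed offline from warehouse data, in [0, 1]


def combine_risk(historical_risk: float, transaction_risk: float,
                 weight: float = 0.3) -> float:
    """Blend the two probabilities with a convex combination.

    The weighting scheme is an assumption made for illustration;
    the abstract does not specify the combination rule.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * historical_risk + weight * transaction_risk


def is_flagged(account: Account, transaction_risk: float,
               threshold: float = 0.7, weight: float = 0.3) -> bool:
    """Return True when the accumulated risk crosses the threshold."""
    accumulated = combine_risk(account.historical_risk, transaction_risk, weight)
    return accumulated > threshold


# Hypothetical account and latest-transaction risk, for illustration only.
account = Account("hypothetical-001", historical_risk=0.65)
print(is_flagged(account, transaction_risk=0.95))
```

    Only the cheap per-transaction step runs online; the historical probability is read from storage, which is the point of precomputing it.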

    Role and Effects of Credit Information Sharing

    Information sharing about borrowers’ characteristics and their indebtedness can have important effects on credit market activity. First, it improves the banks’ knowledge of applicants’ characteristics and permits a more accurate prediction of their repayment probabilities. Second, it reduces the informational rents that banks could otherwise extract from their customers. Third, it can operate as a borrower discipline device. Finally, it eliminates borrowers’ incentive to become over-indebted by drawing credit simultaneously from many banks without any of them realizing. This chapter provides a brief account of models that capture these four effects of information sharing on credit market performance, as well as of the growing body of empirical studies that have attempted to investigate the various dimensions and effects of credit reporting activity. Understanding the effects of information sharing also helps to shed light on some key issues in the design of a credit information system, such as the relationship between public and private mechanisms, the dosage between black and white information sharing, and the “memory” of the system. By merging the insights from theoretical models with the lessons of experience, one can avoid serious pitfalls in the design of credit information systems. Keywords: information sharing, credit markets

    Information Sharing in Credit Markets: A Survey

    Information sharing about borrowers' characteristics and their indebtedness can have important effects on credit market activity. First, it improves the banks' knowledge of applicants' characteristics and permits a more accurate prediction of their repayment probabilities. Second, it reduces the informational rents that banks could otherwise extract from their customers. Third, it can operate as a borrower discipline device. Finally, it eliminates borrowers' incentive to become over-indebted by drawing credit simultaneously from many banks without any of them realizing. Understanding the effects of information sharing also helps to shed light on some key issues in the design of a credit information system, such as the relationship between public and private mechanisms, the dosage between black and white information sharing, and the "memory" of the system. By merging the insights from theoretical models with the lessons of experience, one can avoid serious pitfalls in the design of credit information systems. Keywords: information sharing, credit markets

    Ensemble of Example-Dependent Cost-Sensitive Decision Trees

    Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples and not only between classes. However, standard classification methods do not take these costs into account and assume a constant cost of misclassification errors. In previous works, some methods that incorporate the financial costs into the training of different algorithms have been proposed, with the example-dependent cost-sensitive decision tree algorithm being the one that gives the highest savings. In this paper we propose a new framework of ensembles of example-dependent cost-sensitive decision trees. The framework consists of creating different example-dependent cost-sensitive decision trees on random subsamples of the training set, and then combining them using three different combination approaches. Moreover, we propose two new cost-sensitive combination approaches: cost-sensitive weighted voting and cost-sensitive stacking, the latter being based on the cost-sensitive logistic regression method. Finally, using five different databases from four real-world applications (credit card fraud detection, churn modeling, credit scoring and direct marketing), we evaluate the proposed method against state-of-the-art example-dependent cost-sensitive techniques, namely cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision trees. The results show that the proposed algorithms give better results for all databases, in the sense of higher savings. Comment: 13 pages, 6 figures, Submitted for possible publicatio
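
    The "savings" criterion mentioned in this abstract is not defined in the listing. The sketch below assumes one common definition: the relative cost reduction against the cheaper of the two trivial classifiers (predict all negative or predict all positive). It also assumes correct predictions cost nothing. The function names and the per-example cost lists are hypothetical.

```python
def total_cost(y_true, y_pred, fp_cost, fn_cost):
    """Sum example-dependent misclassification costs.

    fp_cost[i] and fn_cost[i] are the costs of a false positive and a
    false negative on example i. Correct predictions are assumed free.
    """
    cost = 0.0
    for truth, pred, c_fp, c_fn in zip(y_true, y_pred, fp_cost, fn_cost):
        if pred == 1 and truth == 0:
            cost += c_fp
        elif pred == 0 and truth == 1:
            cost += c_fn
    return cost


def savings(y_true, y_pred, fp_cost, fn_cost):
    """Relative cost reduction versus the best trivial classifier (assumed definition)."""
    n = len(y_true)
    base = min(
        total_cost(y_true, [0] * n, fp_cost, fn_cost),
        total_cost(y_true, [1] * n, fp_cost, fn_cost),
    )
    if base == 0:
        raise ValueError("baseline cost is zero; savings is undefined")
    return (base - total_cost(y_true, y_pred, fp_cost, fn_cost)) / base
```

    Because the costs are per example, two classifiers with the same error count can have very different savings, which is why plain accuracy is not used as the criterion here.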

    Credit Scoring for Vietnam’s Retail Banking Market: Implementation and Implications for Transactional versus Relationship Lending

    As banking markets in developing countries mature, banks face competition not only from other domestic banks but also from sophisticated foreign banks. Combined with a dramatic growth of consumer credit and increased regulatory attention to risk management, this makes the development of a well-functioning credit assessment framework essential. As part of such a framework, we propose a credit scoring model for Vietnamese retail loans. First, we show how to identify those borrower characteristics that should be part of a credit scoring model. Second, we illustrate how such a model can be calibrated to achieve the strategic objectives of the bank. Finally, we assess the use of credit scoring models in the context of transactional versus relationship lending. Keywords: financial economics and financial management

    Identifying Optimal Parameters And Their Impact For Predicting Credit Card Defaulters Using Machine-Learning Algorithms

    Data mining and machine learning are emerging technologies that are spreading rapidly into every field because of their benefits, and the financial sector also makes use of them. Many studies of banking data analysis have used machine learning techniques. These studies have several problems: their main focus was achieving high accuracy, and some of them only compare the performance of different classifiers. Another major drawback is that they do not identify optimal parameters or their impact. In this research, we identify optimal parameters, which are valuable for the credit scoring process and might also be used to predict credit card defaulters, and we determine their impact on the results. We use feature selection and classification techniques to identify the optimal parameters and their impact on the identification of credit card defaulters. We use three classifiers, Kstar, SMO and multilayer perceptron, and repeat the process of feature selection and classification for each classifier. First, we apply feature selection techniques to our dataset with each classifier to find possible optimal parameters. In the next phase, we use classification to find the impact of the possible optimal parameters and confirm our findings. In each round of classification, we include and exclude different parameters available in the dataset and note the results of each run with each classifier; in this way, we identify the optimal parameters and their impact on the results, and we also analyze the performance of the classifiers. For this study, we use the “credit card defaults” dataset, which we obtained from the UCI Machine Learning online repository. We use two feature selection techniques, a ranker approach and an evolutionary search method, and then apply classification techniques to the dataset. This research can help reduce the complexity of the credit scoring process. Through this study, we identify up to six optimal parameters and determine their impact on the performance of the classifiers. We also find that the multilayer perceptron was the best-performing of the three classifiers. This work can be extended to other fields in the future, where the same mechanism for finding optimal parameters and their impact can help predict results.

    Using Machine Learning Techniques to Predict a Risk Score for New Members of a Chit Fund Group

    Predicting the risk score of new and potential customers is common practice across the financial industry. By predicting risk scores for its customers, a chit fund company can improve its knowledge and understanding of customers without relying on human knowledge. Data is collected on each customer before they take out credit and during the time they contribute to a chit fund. Having collected the necessary data, the company can then decide whether modelling customer risk would benefit it. Because historical data is available, this thesis focuses on one aspect of risk score prediction: supervised machine learning. Supervised machine learning techniques use historic data to ‘learn a model of the relationship between a set of descriptive features and a target feature’ (Kelleher, Mac Namee, & D’Arcy, 2015). There are many supervised machine learning techniques; support vector machines (SVM), logistic regression and decision trees are the focal point of this thesis. The main objective of this project is to predict a risk score for new or potential subscribers of a chit fund company. The models generated would be suitable for use before a customer joins a chit fund group as well as while the customer is taking part in the group, measuring risk before becoming a subscriber and behavioural risk while with the company. The objective is to extend research already carried out to predict a score from zero to one identifying the probability of default. Default, for the purpose of this project, is defined as being more than 90 days late with a payment. The data of real chit fund subscribers was used to train and test the models built for the project. A factor reduction technique was used to identify key variables, and multiple models were tested to determine which gives the best results. The second objective of this project is to examine the subscriber network. This section of the project checks for links between subscribers and investigates a possible link between subscribers and their chance of default. Variables such as address and nominee are the focus of this section. The most successful supervised machine learning model was the random forest model, with precision of 59% and recall of 92%. Accuracy for this model was the highest of all the models in the experiment at 85%. However, accuracy is not the most trustworthy evaluation measure for this project because the dataset is unbalanced. A combination of 300 decision trees was applied in this model. Using the classification method, the class predicted by the majority of trees was selected as the final prediction. This achieved high accuracy on the dataset from the chit fund company, Kyepot. Social network analysis found no unusual relationship between subscribers that went into default with regard to the area in which they live or their nominees. Supervised machine learning techniques have been shown to be a useful tool in the financial industry. This project suggests that these techniques may also be useful tools for chit fund companies. This project evaluates four different techniques and suggests that the random forest technique is the most useful for this chit fund company.
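
    The caveat in this abstract about accuracy on an unbalanced dataset can be illustrated with a short sketch. The function names and the example data are made up for illustration and are not taken from the thesis; returning 0.0 for precision or recall when the denominator is zero is a convention assumed here.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = default)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Return accuracy, precision and recall as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Assumed convention: report 0.0 when nothing was predicted positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up unbalanced data: 10 defaulters among 100 subscribers.
y_true = [1] * 10 + [0] * 90
# A useless model that never predicts default still looks accurate.
print(evaluate(y_true, [0] * 100))
```

    A model that never predicts default scores high on accuracy here while catching no defaulters, which is why precision and recall are reported alongside accuracy.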