355 research outputs found

    A case study of applying boosting naive bayes to claim fraud diagnosis

    Bagging and boosting classification trees to predict churn.

    In this paper, bagging and boosting techniques are proposed as effective tools for churn prediction. These methods consist of sequentially applying a classification algorithm to resampled or reweighted versions of the data set. We apply these algorithms to a customer database of an anonymous U.S. wireless telecom company. Bagging is easy to put into practice and, like boosting, leads to a significant increase in classification performance when applied to the customer database. Furthermore, we compare bagged and boosted classifiers computed, respectively, from a balanced versus a proportional sample to predict a rare event (here, churn), and propose a simple correction method for classifiers constructed from balanced training samples.
    Keywords: Algorithms; Bagging; Boosting; Churn; Classification; Classifiers; Companies; Data; Gini coefficient; Methods; Performance; Rare events; Sampling; Top decile; Training
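
    The abstract does not spell out the proposed correction. Purely as an illustration, the sketch below applies a standard prior-shift adjustment that rescales a score from a classifier trained on a balanced sample back to the population churn rate. The function name, its parameters, and the 2% churn rate in the usage line are assumptions, not details from the paper.

```python
def correct_for_balanced_sampling(p_balanced, true_rate, sample_rate=0.5):
    """Rescale a probability learned on a rebalanced sample to the true base rate.

    p_balanced  : score from a classifier trained on the rebalanced sample
    true_rate   : share of the rare class in the population (assumed known)
    sample_rate : share of the rare class in the training sample (0.5 if balanced)
    """
    if not 0.0 <= p_balanced <= 1.0:
        raise ValueError("p_balanced must lie in [0, 1]")
    if not (0.0 < true_rate < 1.0 and 0.0 < sample_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    # Reweight each class by the ratio of its population share to its sample share.
    pos = p_balanced * true_rate / sample_rate
    neg = (1.0 - p_balanced) * (1.0 - true_rate) / (1.0 - sample_rate)
    return pos / (pos + neg)


# A score of 0.5 on a balanced sample maps back to the assumed 2% base rate.
adjusted = correct_for_balanced_sampling(0.5, true_rate=0.02)
```

    The adjustment is monotone, so it changes the predicted probabilities but not the ranking of customers.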

    A Comprehensive Survey of Data Mining-based Fraud Detection Research

    This survey paper categorises, compares, and summarises almost all published technical and review articles in automated fraud detection within the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of data evidence collected within affected industries. Within the business context of mining the data to achieve higher cost savings, this research presents methods and techniques together with their problems. Compared to all related reviews on fraud detection, this survey covers many more technical articles and is the only one, to the best of our knowledge, which proposes alternative data and solutions from related domains.
    Comment: 14 page

    Machine Learning Methods for the Detection of Fraudulent Insurance Claims

    This thesis focuses on automotive fraudulent claims detection, a particular Property and Casualty (P&C) insurance product. By analyzing the customer's information, we try to define a model to determine if one customer has filed a fraudulent claim. Two datasets are used in this thesis. One of them is very imbalanced, as 6.1% of policyholders file fraudulent claims (coded as 1) and 93.9% of policyholders file normal claims (coded as 0). So, we need to deal with the imbalanced classes by using rebalancing methods such as SMOTE and under-sampling. Then we use classical methods (naïve Bayes and logistic regression) and new data science methods (random forest and gradient boosting) to detect the fraudulent claims. During the process, we compare these methods to find which one performs better for this application. In addition, the combination of SMOTE and clustering is also applied to these two datasets, which is unusual in fraud detection, yet it improves the results considerably for all four classification models. Moreover, the link analysis method is mentioned in the conclusion. These methods have also been applied to another dataset, which is not that imbalanced, with 24.7% of fraudulent claims and 75.3% of normal claims. The reason for using two datasets is to see if the degree of imbalance affects the performance of the oversampling, undersampling and different models. If so, then these methodologies will be more convincing. If not, we can dig deeper to find the reason.
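
    The under-sampling step mentioned above can be sketched in a few lines. This is a minimal illustration of random under-sampling of the majority class, not the thesis's own code. The function name and the synthetic 1,000-row data set are assumptions; only the 6.1% fraud share is taken from the abstract.

```python
import random


def undersample_majority(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row and an equally sized random subset of the majority.
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Synthetic stand-in data: 61 fraudulent claims (1) among 1,000 policyholders.
labels = [1] * 61 + [0] * 939
rows = list(range(1000))
balanced_rows, balanced_labels = undersample_majority(rows, labels)
```

    Under-sampling discards information from the majority class, which is one reason it is usually compared against over-sampling methods such as SMOTE.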

    Cost-Sensitive Selective Classification and its Applications to Online Fraud Management

    Fraud is defined as the utilization of deception for illegal gain by hiding the true nature of the activity. Organizations lose around $3.7 trillion in revenue worldwide due to financial crimes and fraud, which can significantly affect all levels of society. In this dissertation, I focus on credit card fraud in online transactions. Every online transaction comes with a fraud risk and it is the merchant's liability to detect and stop fraudulent transactions. Merchants utilize various mechanisms to prevent and manage fraud, such as automated fraud detection systems and manual transaction reviews by expert fraud analysts. Many proposed solutions focus mostly on fraud detection accuracy and ignore financial considerations. Also, the highly effective manual review process is overlooked. First, I propose Profit Optimizing Neural Risk Manager (PONRM), a selective classifier that (a) constitutes optimal collaboration between machine learning models and human expertise under industrial constraints, and (b) is cost and profit sensitive. I suggest directions on how to characterize fraudulent behavior and assess the risk of a transaction. I show that my framework outperforms cost-sensitive and cost-insensitive baselines on three real-world merchant datasets. While PONRM is able to work with many supervised learners and obtain convincing results, utilizing probability outputs directly from the trained model itself can pose problems, especially in deep learning, as softmax output is not a true uncertainty measure. This phenomenon, together with the wide and rapid adoption of deep learning by practitioners, brought unintended consequences in many situations, such as the infamous case of Google Photos' racist image recognition algorithm, and thus necessitated the use of quantified uncertainty for each prediction.
    There have been recent efforts towards quantifying uncertainty in conventional deep learning methods (e.g., dropout as Bayesian approximation); however, their optimal use in decision making is often overlooked and understudied. Thus, I present a mixed-integer programming framework for selective classification called MIPSC, which investigates and combines model uncertainty and predictive mean to identify optimal classification and rejection regions. I also extend this framework to cost-sensitive settings (MIPCSC), focus on a critical real-world problem, online fraud management, and show that my approach significantly outperforms industry-standard methods in real-world settings.
    Dissertation/Thesis. Doctoral Dissertation, Computer Science 201
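
    To make the idea of a cost-sensitive selective classifier concrete, the sketch below picks, for each transaction, whichever of three actions has the lowest expected cost. It is a generic reject-option rule and not PONRM, MIPSC, or MIPCSC. The function name, the cost model, the 10% margin, and the assumption that a manual review always decides correctly for a fixed fee are all hypothetical.

```python
def choose_action(p_fraud, amount, review_cost, margin=0.1):
    """Return the cheapest of 'approve', 'decline', or 'review' for one transaction.

    p_fraud     : estimated probability that the transaction is fraudulent
    amount      : transaction value
    review_cost : fixed cost of a manual review (hypothetical)
    margin      : merchant's profit share on a legitimate sale (hypothetical)
    """
    expected_cost = {
        # Approving a fraudulent transaction loses the full amount.
        "approve": p_fraud * amount,
        # Declining a legitimate transaction loses the profit margin.
        "decline": (1.0 - p_fraud) * margin * amount,
        # Manual review is assumed to reach the right decision for a fixed fee.
        "review": review_cost,
    }
    return min(expected_cost, key=expected_cost.get)


low_risk = choose_action(0.01, amount=100, review_cost=5)
high_risk = choose_action(0.95, amount=100, review_cost=5)
uncertain = choose_action(0.30, amount=1000, review_cost=5)
```

    Under these assumed costs, manual review is chosen only when the model is uncertain and the transaction is large enough to justify the fee.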

    A comparative analysis of decision trees vis-a-vis other computational data mining techniques in automotive insurance fraud detection

    The development and application of computational data mining techniques in financial fraud detection and business failure prediction has recently become a popular cross-disciplinary research area involving financial economists, forensic accountants and computational modellers. Some of the computational techniques popularly used in the context of financial fraud detection and business failure prediction can also be effectively applied in the detection of fraudulent insurance claims and can therefore be of immense practical value to the insurance industry. We provide a comparative analysis of the prediction performance of a battery of data mining techniques using real-life automotive insurance fraud data. While the data we have used in our paper is US-based, the computational techniques we have tested can be adapted and generally applied to detect similar insurance frauds in other countries where an organized automotive insurance industry exists.