
A case study of applying boosting naive Bayes to claim fraud diagnosis

Authors: G. Dedene, R.A. Derrig, S. Viaene
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Bagging and boosting classification trees to predict churn.

Authors: Croux Christophe, Lemmens Aurélie

In this paper, bagging and boosting techniques are proposed as high-performing tools for churn prediction. These methods consist of repeatedly applying a classification algorithm to resampled (bagging) or reweighted (boosting) versions of the data set. We apply these algorithms to a customer database of an anonymous U.S. wireless telecom company. Bagging is easy to put into practice and, like boosting, significantly increases classification performance on the customer database. Furthermore, we compare bagged and boosted classifiers computed from a balanced sample and from a proportional sample, respectively, to predict a rare event (here, churn), and propose a simple correction method for classifiers constructed from balanced training samples.
Keywords: Algorithms; Bagging; Boosting; Churn; Classification; Classifiers; Companies; Data; Gini coefficient; Methods; Performance; Rare events; Sampling; Top decile; Training
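The abstract does not spell out the authors' correction for balanced training samples. A widely used adjustment of this kind (not necessarily the one in the paper) rescales the predicted posterior odds by the ratio of population class priors to training class priors, via Bayes' rule. The sketch below assumes the model outputs a churn probability, and the function name and rates are hypothetical.

```python
def correct_balanced_probability(p_balanced: float, train_rate: float, true_rate: float) -> float:
    """Map a churn probability from a model trained on a resampled (e.g. 50/50) sample
    back to the population scale. Hypothetical helper, a standard prior correction:
    posterior odds are multiplied by the ratio of true to training class priors."""
    if not 0.0 < train_rate < 1.0 or not 0.0 < true_rate < 1.0:
        raise ValueError("class rates must lie strictly between 0 and 1")
    # Weight for the positive (churn) class and for the negative (non-churn) class
    w_pos = true_rate / train_rate
    w_neg = (1.0 - true_rate) / (1.0 - train_rate)
    num = p_balanced * w_pos
    return num / (num + (1.0 - p_balanced) * w_neg)
```

For instance, with a 50/50 training sample and a hypothetical 2% population churn rate, a balanced-model score of 0.5 maps back to 0.02. Because the mapping is monotonic, it leaves the ranking of customers (and hence rank-based measures such as the Gini coefficient or top-decile lift) unchanged; it only affects the calibration of the probabilities.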

Research Papers in Economics

A Comprehensive Survey of Data Mining-based Fraud Detection Research

Authors: Agrawal, Au, Berry, Brentnall, Chen, Chiang Wang, David C. Yen, Feelders, Han, Hayhoe, Kirkosa, Ku, Leonard, Mitchell, Ngai, Quah, Rothman, Shaw, Shing-Han Li, Song, Sudjianto, Titus, Wen-Hui Lu, White
Publication venue: 'Elsevier BV'
Publication date: 30/09/2010

This survey paper categorises, compares, and summarises almost all published technical and review articles on automated fraud detection from the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of the data evidence collected within affected industries. Within the business context of mining the data to achieve higher cost savings, this research presents methods and techniques together with their problems. Compared with all related reviews on fraud detection, this survey covers many more technical articles and is the only one, to the best of our knowledge, that proposes alternative data and solutions from related domains.
Comment: 14 pages

arXiv.org e-Print Archive

Crossref

Machine Learning Methods for the Detection of Fraudulent Insurance Claims

Author: Zhao Sisheng
Publication date: 20/03/2020

This thesis focuses on detecting fraudulent claims in automotive insurance, a particular Property and Casualty (P&C) insurance product. By analyzing customer information, we try to build a model that determines whether a customer has filed a fraudulent claim. Two datasets are used in this thesis. The first is highly imbalanced: 6.1% of policyholders file fraudulent claims (coded as 1) and 93.9% file normal claims (coded as 0). We therefore handle the class imbalance with rebalancing methods such as SMOTE and undersampling. We then use classical methods (naïve Bayes and logistic regression) and newer data science methods (random forest and gradient boosting) to detect the fraudulent claims, and compare them to find which performs best for this application. In addition, a combination of SMOTE and clustering, which is unusual in fraud detection, is applied to both datasets and substantially improves the results of all four classification models. A link analysis method is also discussed in the conclusion. The same methods are applied to the second dataset, which is less imbalanced, with 24.7% fraudulent claims and 75.3% normal claims. Using both datasets shows whether the degree of imbalance affects the performance of oversampling, undersampling, and the different models. If it does, these methodologies will be more convincing; if not, we can dig deeper to find the reason.
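The two rebalancing methods named above can be sketched in plain Python. SMOTE creates synthetic minority (fraud) samples by interpolating between a minority sample and one of its k nearest minority neighbours; random undersampling discards a random subset of the majority (normal) class. This is a minimal sketch under stated assumptions (numeric feature vectors, Euclidean distance, hypothetical function names), not the thesis's implementation, and it omits the clustering step the thesis combines with SMOTE.

```python
import math
import random


def _k_nearest(points, idx, k):
    """Indices of the k nearest neighbours of points[idx] (Euclidean), excluding itself."""
    dists = sorted((math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx)
    return [j for _, j in dists[:k]]


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (hypothetical helper).

    Each sample lies on the segment between a random minority point and one of
    its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(_k_nearest(minority, i, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of n_keep majority samples, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)
```

With a fraud rate like the 6.1% described above, one would typically oversample the fraud class, undersample the normal class, or both, until the classes are closer to balance. Resampling is usually applied only to the training split, so that the evaluation data keep the real class proportions.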

Concordia University Research Repository

Cost-Sensitive Selective Classification and its Applications to Online Fraud Management

Publication date: 01/01/2019

Fraud is defined as the use of deception for illegal gain by hiding the true nature of an activity. Organizations worldwide lose around $3.7 trillion in revenue to financial crimes and fraud, which can significantly affect all levels of society. In this dissertation, I focus on credit card fraud in online transactions. Every online transaction carries a fraud risk, and it is the merchant's liability to detect and stop fraudulent transactions. Merchants use various mechanisms to prevent and manage fraud, such as automated fraud detection systems and manual transaction reviews by expert fraud analysts. Most proposed solutions focus on fraud detection accuracy and ignore financial considerations, and they also overlook the highly effective manual review process. First, I propose the Profit Optimizing Neural Risk Manager (PONRM), a selective classifier that (a) establishes optimal collaboration between machine learning models and human expertise under industrial constraints and (b) is cost and profit sensitive. I suggest directions for characterizing fraudulent behavior and assessing the risk of a transaction, and I show that my framework outperforms cost-sensitive and cost-insensitive baselines on three real-world merchant datasets. While PONRM works with many supervised learners and obtains convincing results, using probability outputs directly from the trained model can pose problems, especially in deep learning, where the softmax output is not a true uncertainty measure. This phenomenon, together with the wide and rapid adoption of deep learning by practitioners, has had unintended consequences in many situations, such as the infamous case of Google Photos' racist image recognition algorithm, and has made it necessary to quantify the uncertainty of each prediction.
There have been recent efforts towards quantifying uncertainty in conventional deep learning methods (e.g., dropout as a Bayesian approximation); however, their optimal use in decision making is often overlooked and understudied. I therefore present MIPSC, a mixed-integer programming framework for selective classification that combines model uncertainty and predictive mean to identify optimal classification and rejection regions. I also extend this framework to cost-sensitive settings (MIPCSC), apply it to the critical real-world problem of online fraud management, and show that my approach significantly outperforms industry-standard methods for online fraud management in real-world settings.
Dissertation/Thesis: Doctoral Dissertation, Computer Science 201
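The core idea of cost-sensitive selective classification, choosing between automated decisions and a manual review by comparing financial costs, can be illustrated with a simple expected-cost rule. This is neither PONRM nor MIPSC/MIPCSC: it uses a plain fraud probability instead of quantified uncertainty, and the cost parameters, the function name, and the assumption that an analyst review always catches fraud are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    review: float         # hypothetical fixed cost of one manual analyst review
    false_decline: float  # hypothetical lost profit from declining a legitimate order


def decide(p_fraud: float, amount: float, costs: Costs) -> str:
    """Return the action ("approve", "decline" or "review") with the lowest expected cost.

    Assumes an approved fraudulent order loses its full amount (chargeback) and
    that a manual review always catches fraud, so reviewing costs only its fee.
    """
    if not 0.0 <= p_fraud <= 1.0:
        raise ValueError("p_fraud must be a probability")
    expected = {
        "approve": p_fraud * amount,
        "decline": (1.0 - p_fraud) * costs.false_decline,
        "review": costs.review,
    }
    return min(expected, key=expected.get)
```

With `Costs(review=5.0, false_decline=20.0)` and a 100.0 order, a fraud probability of 0.01 leads to approval, 0.2 to a manual review, and 0.9 to a decline. The "review" branch plays the role of the rejection region: the model defers to a human exactly where neither automated action is cheap. As the abstract notes, feeding raw softmax scores into such a rule can be misleading, which motivates using quantified uncertainty instead.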

ASU Digital Repository

A comparative analysis of decision trees vis-a-vis other computational data mining techniques in automotive insurance fraud detection

Authors: Bhattacharya Sukanto, Gepp Adrian, Kumar Kuldeep, Wilson J. Holton
Publication date: 01/07/2012

The development and application of computational data mining techniques in financial fraud detection and business failure prediction has recently become a popular cross-disciplinary research area involving financial economists, forensic accountants and computational modellers. Some of the computational techniques popularly used for financial fraud detection and business failure prediction can also be applied effectively to detect fraudulent insurance claims, and can therefore be of immense practical value to the insurance industry. We provide a comparative analysis of the prediction performance of a battery of data mining techniques using real-life automotive insurance fraud data. While the data used in our paper are US-based, the computational techniques we have tested can be adapted and generally applied to detect similar insurance frauds in other countries where an organized automotive insurance industry exists.

Bond University Research Portal

Deakin Research Online

A machine learning approach for predicting antibody properties

Authors: Alberts B., Cai L., Chai X., Cutler A., Divya K., Durmuş S., Fletcher T., Gonzalez M. W., Hamanaka M., Jaafar H., Kösesoy D., Maitra S., Pérez-Ortiz M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 28/06/2020

Crossref

University of South Wales Research Explorer