
    Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection

    Dissertation presented as partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Although data production today generates enormous volumes of data every second, there are situations where the target classes are represented very unequally, giving rise to imbalanced datasets; analyzing these correctly can lead to relevant decisions that produce appropriate business strategies. Fraud modeling is one example: fraudulent transactions are expected to be far fewer than legitimate ones, and predicting them can be crucial for improving a company's decisions and processes. However, class imbalance degrades the performance of traditional techniques. Many techniques have been proposed to deal with this problem, and oversampling is one of them. This work analyzes the behavior of different oversampling techniques, such as random oversampling, SOMO and SMOTE, across different classifiers and evaluation metrics. The exercise uses real data from an insurance company in Colombia to predict fraudulent claims for its compulsory auto insurance product. The conclusions demonstrate the advantages of oversampling under class imbalance, but also the importance of comparing different evaluation metrics and classifiers to obtain accurate and comparable results.
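
As a rough illustration of two of the techniques named above, the sketch below implements random oversampling (duplicating randomly chosen minority rows until the classes are balanced) and the core SMOTE step (creating a synthetic point on the line segment between a minority point and one of its k nearest minority neighbours) in plain Python. It is a simplified sketch under stated assumptions, not the dissertation's code: the function names, the binary-label setup and the toy data are hypothetical, SOMO is not shown, and in practice a library such as imbalanced-learn is typically used instead.

```python
import math
import random


def random_oversample(X, y, minority_label, rng):
    """Duplicate randomly chosen minority rows until both classes have equal size.
    Assumes a binary problem in which the minority class is the smaller one."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority_count = sum(1 for label in y if label != minority_label)
    extra = [list(rng.choice(minority)) for _ in range(majority_count - len(minority))]
    return X + extra, y + [minority_label] * len(extra)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority points (simplified SMOTE).
    Each one interpolates between a randomly sampled minority point and one of
    its k nearest minority neighbours (Euclidean distance).
    Requires at least two minority points."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: label 1 is the rare (e.g. fraudulent) class.
rng = random.Random(42)
X = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9], [5.0, 5.0], [5.2, 4.9]]
y = [0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y, minority_label=1, rng=rng)
minority = [x for x, label in zip(X, y) if label == 1]
X_syn = smote(minority, n_new=2, k=1, rng=rng)  # these would be labelled 1
```

Random oversampling only repeats existing rows, whereas SMOTE places new points between minority neighbours, which is one reason the two can behave differently across classifiers.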

    FIGHTING INSURANCE FRAUD USING TECHNOLOGY

    The risk of insurance fraud, and of fraud in general, has been studied for a very long time, but it has recently intensified. Insurance fraud has many aspects related to traditional claims (for example, death, disability, income protection and hospital claims). However, there are other areas of concern that increase the potential for parties to commit fraud. The impact of COVID-19 on global business is undeniable, and, unfortunately, as accounts receivable grow, so does the risk of fraud. Technology-based solutions can identify red flags of fraud while minimizing disruption to the claims process, keeping customer response times short and ensuring that legitimate claims are not unnecessarily delayed. COVID-19 has driven remote work in most fields and companies, while increasing the use of technology. Identifying and responding to risks must be done through consistent and effective processes, which can be enhanced by the correct use of technology. Combining technology with effective resources and experienced staff is a priority step to maximize robustness in deterring, preventing, detecting and responding to suspected fraud incidents. If financial losses are one consequence of fraud, then investing in technology and in skilled, experienced staff is imperative. Brand damage and loss of investor confidence are additional critical considerations when assessing fraud risk and a firm's response to fraud.

    Implementation of SMOTE and Under Sampling on Imbalanced Datasets for Company Bankruptcy Prediction

    Company bankruptcy is a serious problem because it can cause economic damage and other social consequences. It is very important to predict bankruptcy as early as possible, because such predictions support evaluation and planning to avoid bankruptcy. Bankruptcy prediction is an imbalanced classification problem, because there are far fewer instances of the bankrupt class than of the non-bankrupt class. This study aims to produce a good classification model for predicting bankruptcy. Resampling, using a combination of SMOTE and undersampling, is applied to the training data to produce a better classification model. The classification methods used for prediction are multilayer perceptron and complement naïve Bayes. Predictive performance was measured using recall, ROC AUC, and PR AUC. In testing, SMOTE combined with undersampling considerably improved the multilayer perceptron model. With complement naïve Bayes, resampling also increased the recall and PR AUC scores. The best recall obtained was 95.45%, with complement naïve Bayes. The highest ROC AUC with resampling was also obtained with complement naïve Bayes, at 87.80%. It is therefore concluded that bankruptcy prediction using SMOTE and undersampling can perform well at detecting the bankrupt class.
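
The three evaluation metrics named above can be illustrated with the plain-Python sketch below. It assumes binary labels with 1 as the positive (bankrupt) class and continuous model scores. ROC AUC is computed as the probability that a random positive is scored above a random negative, and PR AUC is approximated by average precision, one common step-wise estimator; the abstract does not state which PR AUC estimator the authors used, and the function names and data are hypothetical. As in the study, resampling would be applied to the training data only, with these metrics computed on untouched test data.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model labels as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve, computed as the probability that a random
    positive receives a higher score than a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores, positive=1):
    """Step-wise PR AUC estimate: the mean of the precision values at each rank
    where a positive is retrieved (equal scores keep their input order)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for t in y_true if t == positive)
    hits, total = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == positive:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical test-set labels and model scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

# recall = 2/3, ROC AUC = 5/9, average precision = 13/18 (up to float rounding)
r = recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
```

Recall depends on the chosen decision threshold (0.5 here), while ROC AUC and PR AUC summarize the ranking of scores across all thresholds, which is why studies on imbalanced data often report both kinds.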

    Using Feature Selection with Machine Learning for Generation of Insurance Insights

    Insurance is a data-rich sector, hosting large volumes of customer data that are analysed to evaluate risk. Machine learning techniques are increasingly used in the effective management of insurance risk. Insurance datasets, however, are by their nature often of poor quality, with noisy subsets of data (or features). Choosing the right features is a significant pre-processing step in building machine learning models, since the inclusion of irrelevant and redundant features has been shown to degrade model performance. In this article, we propose a framework for improving predictive machine learning techniques in the insurance sector via the selection of relevant features. Experimental results on five publicly available real insurance datasets show the importance of applying feature selection to remove noisy features before applying machine learning techniques, allowing the algorithm to focus on influential features. An additional business benefit is the identification of the most and least important features in the datasets. These insights can support decision making and strategy development in areas and business problems beyond the direct target of the downstream algorithms. In our experiments, machine learning techniques using the features chosen by feature selection algorithms outperformed those using the full feature set on these real insurance datasets. Specifically, retaining 20% and 50% of the features in our five datasets improved downstream clustering and classification performance compared with the whole datasets. This indicates the potential of feature selection in the insurance sector both to improve model performance and to highlight influential features for business insights.
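
To make the idea of keeping only a fraction of the features concrete, the sketch below shows one simple filter-style feature selection method: rank features by the absolute Pearson correlation with the target and keep the top 20% or 50%. This is an assumed, swapped-in technique for illustration only; the article's framework may use different feature selection algorithms, and the function names, feature names and data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_top_fraction(columns, target, fraction):
    """Rank features by |correlation with target| and keep the top `fraction`
    of them (at least one). `columns` maps feature name -> list of values."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical toy data: four features, binary target.
target = [0, 0, 0, 1, 1, 1]
columns = {
    "x1": [0, 0, 1, 1, 1, 1],  # moderately correlated (about 0.71)
    "x2": [3, 2, 3, 0, 1, 0],  # strongly negatively correlated (about -0.93)
    "x3": [1, 2, 1, 2, 1, 2],  # weakly correlated (about 0.33)
    "x4": [4, 4, 4, 4, 4, 4],  # constant, carries no information
}

top_half = select_top_fraction(columns, target, 0.5)   # ["x2", "x1"]
top_fifth = select_top_fraction(columns, target, 0.2)  # ["x2"]
```

Using the absolute value keeps strongly negative predictors such as `x2`. A correlation filter only captures linear, one-feature-at-a-time relevance, so it will not detect redundancy between features the way some multivariate selection methods do.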

    Prediction of Maternity Recovery Rate of Group Long-Term Disability Insurance Using XGBoost

    To help insurers set insurance rates that incorporate maternity factors, it is crucial to understand the maternity recovery rate, a metric insurance companies use to understand how much of the expenses associated with maternity care and related medical services are covered by their policies. This paper applied Extreme Gradient Boosting (XGBoost), a powerful method for handling complex data relationships and preventing overfitting, to the North American Group Long-Term Disability dataset obtained from the Society of Actuaries, which lists maternity as one of its categories, to predict the maternity recovery rate. For comparison, other machine learning methods, namely Gradient Boosting Machine (GBM) and Bayesian Additive Regression Trees (BART), were also used, with Root Mean Squared Error (RMSE) measuring the difference between predicted and observed maternity recovery rates. Four datasets, three imbalanced and one fairly balanced, were created from the original dataset to test each method's predictive performance. The study found that XGBoost performed exceptionally well on the imbalanced datasets, while BART was slightly better on the fairly balanced data. Furthermore, the model identified duration, exposures, and participant age as important factors both in predicting maternity recovery rates and in the underwriting process.
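
For reference, the comparison metric named above has the standard definition below, where $y_i$ is the observed maternity recovery rate for record $i$, $\hat{y}_i$ is a model's prediction and $n$ is the number of records evaluated. The notation is ours, not the paper's; a lower RMSE means predictions lie closer to the observed rates, and the squaring makes large errors weigh more than small ones.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```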

    The moderating effect of capability fraud element on fraud prevention in the Domicile/Local Saudi Arabian retail banking sector

    The rapid increase in fraud has affected the economies of both developed and developing countries. Fraud has led to large losses, affecting many banks around the world, including those in Saudi Arabia. Hence, this study investigates the moderating effect of the capability element of fraud on the relationship between industry factors and fraud prevention in the Saudi Arabian banking sector. The framework of this study was developed from the Fraud Diamond Theory and the knowledge management model. In this study, customer knowledge, internal control, insider involvement, information sharing, and legal and regulatory factors (the independent variables), together with the capability element of fraud, are the factors used to explain fraud prevention. A proportional sampling technique was used to select respondents, and questionnaires were distributed to employees of 12 Saudi banks, yielding 328 completed questionnaires, a 77.3% response rate. Hypotheses were tested using PLS-SEM (version 3.2.7). The results confirm a significant positive relationship between fraud prevention and customer knowledge, information sharing, insider involvement, internal control, and legal and regulatory factors. However, the capability element of fraud does not significantly strengthen, as a moderator, the relationships of customer knowledge, information sharing, and insider involvement with fraud prevention. The findings also confirm that the capability element of fraud plays a significant role in moderating the relationship between two factors, namely internal control and legal and regulatory factors, and fraud prevention, while the other variables were found to be insignificant in explaining fraud prevention. The study also offers important insights for managers, CEOs, policy makers, banking regulatory authorities, financial institutions, and researchers considering fraud prevention measures to improve the banking sector in Saudi Arabia.