7 research outputs found
Fine-Tuning a k-Nearest Neighbors Machine Learning Model for the Detection of Insurance Fraud
Insurance companies lose billions of dollars to fraud. These losses force insurers to raise premiums and/or restrict policies, which hurts a company's loyal customers. Although fraud is a prevalent problem, companies are not working urgently to improve their machine learning algorithms. Underskilled workers paired with inefficient computer algorithms make it difficult to detect fraud accurately and reliably.
The goal of this study is to understand the idea of k-Nearest Neighbors (k-NN) and to use this classification technique to accurately detect fraudulent auto insurance claims. Using k-NN requires choosing a k value and a distance metric, and the best choice of both is unique to every dataset. This study aims to break down the process of determining an accurate k value and distance metric for a sample auto insurance claims dataset. Odd k values 1 through 19 and the Euclidean, Manhattan, Chebyshev, and Hassanat metrics are analyzed using Excel and R.
Results support the idea that the best k value and distance metric depend on the dataset being analyzed.
Keywords: machine learning, insurance, fraud, detection, k-NN, distance metrics
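The abstract's core procedure (k-NN classification under several distance metrics, with odd k from 1 to 19) can be sketched as follows. The study itself used Excel and R; this is a minimal Python illustration, and the function names are my own. The Hassanat distance is implemented per its standard per-dimension bounded definition:

```python
import numpy as np
from collections import Counter

# The four distance metrics compared in the study.
def euclidean(a, b): return np.sqrt(np.sum((a - b) ** 2))
def manhattan(a, b): return np.sum(np.abs(a - b))
def chebyshev(a, b): return np.max(np.abs(a - b))

def hassanat(a, b):
    # Hassanat distance: each dimension contributes a value in [0, 1),
    # making the metric insensitive to feature scale.
    d = 0.0
    for ai, bi in zip(a, b):
        lo, hi = min(ai, bi), max(ai, bi)
        if lo >= 0:
            d += 1 - (1 + lo) / (1 + hi)
        else:
            d += 1 - (1 + lo + abs(lo)) / (1 + hi + abs(lo))
    return d

def knn_predict(X_train, y_train, x, k, dist):
    # Majority vote among the k training points nearest to x.
    order = np.argsort([dist(x, t) for t in X_train])[:k]
    return Counter(y_train[i] for i in order).most_common(1)[0][0]

# Tuning then amounts to a grid search over odd k in 1..19 and the four
# metrics, scoring each pair by (e.g.) cross-validated accuracy.
```

A full replication would loop `for k in range(1, 20, 2)` and over the four metric functions, keeping the pair with the best validation accuracy on the claims data.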
A Calibrated Data-Driven Approach for Small Area Estimation using Big Data
Where the response variable in a big data set is consistent with the variable of interest for small area estimation, the big data by itself can provide the estimates for small areas. These estimates, however, are often subject to the coverage and measurement error bias inherited from the big data. If a probability survey of the same variable of interest is available, the survey data can be used as a training data set to develop an algorithm that imputes for the data missed by the big data and adjusts for measurement errors. In this paper, we outline a methodology for such imputations based on a kNN algorithm calibrated to an asymptotically design-unbiased estimate of the national total, and we illustrate the use of a training data set to estimate the imputation bias and of the fixed-k asymptotic bootstrap to estimate the variance of the small area hybrid estimator. We illustrate the methodology using a public use data set and compare the accuracy and precision of our hybrid estimator with the Fay-Herriot (FH) estimator. Finally, we also examine numerically the accuracy and precision of the FH estimator when the auxiliary variables used in the linking models are subject to under-coverage errors.
Comment: 26 pages, 2 figures, 2 tables and 2 appendices
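The two moving parts described above, kNN imputation trained on the probability survey and calibration to a design-unbiased total, can be sketched roughly as below. This is an illustrative simplification in Python with hypothetical function names; in particular, the simple ratio calibration shown here is only one possible calibration scheme and may differ from the paper's actual construction:

```python
import numpy as np

def knn_impute(X_survey, y_survey, X_missing, k=5):
    # Impute y for units the big data missed, using the mean response of the
    # k nearest survey (training) units in covariate space.
    preds = []
    for x in X_missing:
        idx = np.argsort(np.linalg.norm(X_survey - x, axis=1))[:k]
        preds.append(y_survey[idx].mean())
    return np.array(preds)

def calibrate_to_total(values, design_unbiased_total):
    # Ratio-calibrate the combined (observed + imputed) values so that their
    # sum matches the asymptotically design-unbiased survey estimate of the
    # national total. (Assumed scheme, for illustration only.)
    return values * (design_unbiased_total / values.sum())
```

The small area hybrid estimate for an area would then aggregate the calibrated observed and imputed values within that area.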
On Identifying Terrorists Using Their Victory Signs
In certain cases, the only evidence available to identify terrorists seen in digital images or videos is the shape of their hands, particularly the victory sign, which many of them perform while intentionally hiding their faces and/or distorting their voices. This paper proposes new methods to identify such persons, for the first time, from their victory sign. These methods are based on features extracted from the finger areas using shape moments, in addition to other features related to finger contours. To evaluate the proposed methods and show the feasibility of this study, we created a victory-sign database of 400 volunteers using a mobile phone camera. Experimental results using different classifiers show encouraging identification results; the best precision/recall was achieved by merging normalized features from both methods with a linear discriminant analysis classifier, at 96.6% precision and 96.3% recall. Such high performance shows the proposed methods' great potential for identifying terrorists from their victory sign.
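The feature-extraction step described above (shape moments over finger regions) can be sketched as follows. The abstract does not specify which moment set or normalization the authors used, so this is a generic scale-normalized central-moment extractor in Python, with assumed names; the contour-based features and the LDA classification stage are omitted:

```python
import numpy as np

def central_moments(mask, max_order=3):
    # Shape-moment features from a binary region mask (e.g. a segmented
    # finger area). Returns the scale-normalized central moments mu_pq
    # with 2 <= p + q <= max_order. Illustrative choice of moment set.
    ys, xs = np.nonzero(mask)
    m00 = len(xs)                      # area of the region
    cx, cy = xs.mean(), ys.mean()      # centroid
    feats = []
    for p in range(max_order + 1):
        for q in range(max_order + 1):
            if 2 <= p + q <= max_order:
                mu = np.sum((xs - cx) ** p * (ys - cy) ** q)
                # normalize for scale invariance
                feats.append(mu / m00 ** (1 + (p + q) / 2))
    return np.array(feats)
```

Feature vectors like these, concatenated with contour descriptors and normalized, would then be fed to a classifier such as linear discriminant analysis, as the abstract reports.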
Exploring a Generalizable Machine Learned Solution for Early Prediction of Student At-Risk Status
Determining which students are at risk of poorer outcomes -- such as dropping out, failing classes, or decreasing standardized examination scores -- has become an important area of both research and practice in K-12 education. The models produced by this type of predictive modeling research are increasingly used by high schools in Early Warning Systems to identify which students are at risk and to intervene in support of better outcomes. It has become common practice to re-build and validate these detectors district by district, due to differing data semantics and varying risk factors for students across districts. As these detectors become more widely used, however, a new challenge emerges in applying them across a broad spectrum of school districts with varying availability of past student data. Some districts have insufficient high-quality past data for building an effective detector. Novel approaches that can address the complex data challenges a new district presents are critical for advancing the field.
Using an ensemble-based algorithm, I develop a modeling approach that can generate a useful model for a previously unseen district. During the ensembling process, my approach, District Similarity Ensemble Extrapolation (DSEE), weights districts that are more similar to the target district more strongly than less similar districts. Using this approach, I can predict student at-risk status effectively for unseen districts across a range of grade levels and achieve good prediction performance, though the approach ultimately fails to outperform the previously published Knowles (2015) and Bowers (2012) EWS models proposed for use across districts.
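The DSEE idea, a similarity-weighted ensemble over per-district models, can be sketched as below. The abstract does not state how district similarity is measured or how district profiles are built, so the inverse-distance kernel and the `district_profiles` representation here are assumptions, and the function names are hypothetical:

```python
import numpy as np

def dsee_predict(district_models, district_profiles, target_profile, X_target):
    # District Similarity Ensemble Extrapolation (sketch): combine the
    # predictions of models trained on other districts, weighting each model
    # by its district's similarity to the unseen target district.
    sims = np.array([1.0 / (1.0 + np.linalg.norm(p - target_profile))
                     for p in district_profiles])   # assumed similarity kernel
    weights = sims / sims.sum()                     # normalize to sum to 1
    # Each model maps student features to an at-risk score in [0, 1].
    preds = np.array([m(X_target) for m in district_models])
    return weights @ preds                          # weighted score per student
```

Here each element of `district_models` is any callable trained on one source district's data; a district identical to the target dominates the ensemble, while dissimilar districts contribute little.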