7 research outputs found
Fine-Tuning a k-Nearest Neighbors Machine Learning Model for the Detection of Insurance Fraud
Insurance companies lose billions of dollars to fraud. These losses force insurers to raise premiums and/or restrict policies, which hurts a company's loyal customers. Although fraud is a prevalent problem, companies are not working urgently to improve their machine learning algorithms. Underskilled workers paired with inefficient computer algorithms make it difficult to detect fraud accurately and reliably.
The goal of this study is to understand the idea of k-Nearest Neighbors (k-NN) and to use this classification technique to accurately detect fraudulent auto insurance claims. Using k-NN requires choosing a k value and a distance metric, and the best choice of both is unique to every dataset. This study aims to break down the process of determining an accurate k value and distance metric for a sample auto insurance claims dataset. Odd k values 1 through 19 and the Euclidean, Manhattan, Chebyshev, and Hassanat metrics are analyzed using Excel and R.
Results support the idea that the best k value and distance metric depend on the dataset being analyzed.
Keywords: machine learning, insurance, fraud, detection, k-NN, distance metrics
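The abstract's core procedure (k-NN classification under several distance metrics, with odd k from 1 to 19) can be sketched as follows. The study itself used Excel and R; this is a minimal Python illustration, and the function names are my own. The Hassanat distance is implemented per its standard per-dimension bounded definition:

```python
import numpy as np
from collections import Counter

# The four distance metrics compared in the study.
def euclidean(a, b): return np.sqrt(np.sum((a - b) ** 2))
def manhattan(a, b): return np.sum(np.abs(a - b))
def chebyshev(a, b): return np.max(np.abs(a - b))

def hassanat(a, b):
    # Hassanat distance: each dimension contributes a value in [0, 1),
    # making the metric insensitive to feature scale.
    d = 0.0
    for ai, bi in zip(a, b):
        lo, hi = min(ai, bi), max(ai, bi)
        if lo >= 0:
            d += 1 - (1 + lo) / (1 + hi)
        else:
            d += 1 - (1 + lo + abs(lo)) / (1 + hi + abs(lo))
    return d

def knn_predict(X_train, y_train, x, k, dist):
    # Majority vote among the k training points nearest to x.
    order = np.argsort([dist(x, t) for t in X_train])[:k]
    return Counter(y_train[i] for i in order).most_common(1)[0][0]

# Tuning then amounts to a grid search over odd k in 1..19 and the four
# metrics, scoring each pair by (e.g.) cross-validated accuracy.
```

A full replication would loop `for k in range(1, 20, 2)` and over the four metric functions, keeping the pair with the best validation accuracy on the claims data.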
A Calibrated Data-Driven Approach for Small Area Estimation using Big Data
Where the response variable in a big data set is consistent with the variable of interest for small area estimation, the big data by itself can provide the estimates for small areas. These estimates, however, are often subject to the coverage and measurement error bias inherited from the big data. If a probability survey of the same variable of interest is available, the survey data can be used as a training data set to develop an algorithm that imputes for the data missed by the big data and adjusts for measurement errors. In this paper, we outline a methodology for such imputations based on a kNN algorithm calibrated to an asymptotically design-unbiased estimate of the national total, and we illustrate the use of a training data set to estimate the imputation bias and of the fixed-k asymptotic bootstrap to estimate the variance of the small area hybrid estimator. We illustrate the methodology using a public use data set and compare the accuracy and precision of our hybrid estimator with the Fay-Herriot (FH) estimator. Finally, we also examine numerically the accuracy and precision of the FH estimator when the auxiliary variables used in the linking models are subject to under-coverage errors.
Comment: 26 pages, 2 figures, 2 tables and 2 appendices
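The two moving parts described above, kNN imputation trained on the probability survey and calibration to a design-unbiased total, can be sketched roughly as below. This is an illustrative simplification in Python with hypothetical function names; in particular, the simple ratio calibration shown here is only one possible calibration scheme and may differ from the paper's actual construction:

```python
import numpy as np

def knn_impute(X_survey, y_survey, X_missing, k=5):
    # Impute y for units the big data missed, using the mean response of the
    # k nearest survey (training) units in covariate space.
    preds = []
    for x in X_missing:
        idx = np.argsort(np.linalg.norm(X_survey - x, axis=1))[:k]
        preds.append(y_survey[idx].mean())
    return np.array(preds)

def calibrate_to_total(values, design_unbiased_total):
    # Ratio-calibrate the combined (observed + imputed) values so that their
    # sum matches the asymptotically design-unbiased survey estimate of the
    # national total. (Assumed scheme, for illustration only.)
    return values * (design_unbiased_total / values.sum())
```

The small area hybrid estimate for an area would then aggregate the calibrated observed and imputed values within that area.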
On Identifying Terrorists Using Their Victory Signs
In certain cases, the only evidence available to identify terrorists seen in digital images or videos is the shape of their hands, particularly the victory sign, which many of them perform while intentionally hiding their faces and/or distorting their voices. This paper proposes new methods to identify such persons, for the first time, from their victory sign. These methods are based on features extracted from the finger areas using shape moments, in addition to other features related to finger contours. To evaluate the proposed methods and show the feasibility of this study, we created a victory-sign database of 400 volunteers using a mobile phone camera. Experimental results using different classifiers show encouraging identification results; the best precision/recall was achieved by merging normalized features from both methods with a linear discriminant analysis classifier, at 96.6% precision and 96.3% recall. Such high performance shows the proposed methods' great potential for identifying terrorists from their victory sign.
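The feature-extraction step described above (shape moments over finger regions) can be sketched as follows. The abstract does not specify which moment set or normalization the authors used, so this is a generic scale-normalized central-moment extractor in Python, with assumed names; the contour-based features and the LDA classification stage are omitted:

```python
import numpy as np

def central_moments(mask, max_order=3):
    # Shape-moment features from a binary region mask (e.g. a segmented
    # finger area). Returns the scale-normalized central moments mu_pq
    # with 2 <= p + q <= max_order. Illustrative choice of moment set.
    ys, xs = np.nonzero(mask)
    m00 = len(xs)                      # area of the region
    cx, cy = xs.mean(), ys.mean()      # centroid
    feats = []
    for p in range(max_order + 1):
        for q in range(max_order + 1):
            if 2 <= p + q <= max_order:
                mu = np.sum((xs - cx) ** p * (ys - cy) ** q)
                # normalize for scale invariance
                feats.append(mu / m00 ** (1 + (p + q) / 2))
    return np.array(feats)
```

Feature vectors like these, concatenated with contour descriptors and normalized, would then be fed to a classifier such as linear discriminant analysis, as the abstract reports.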
Exploring a Generalizable Machine Learned Solution for Early Prediction of Student At-Risk Status
Determining which students are at risk of poorer outcomes -- such as dropping out, failing classes, or decreasing standardized examination scores -- has become an important area of both research and practice in K-12 education. The models produced by this type of predictive modeling research are increasingly used by high schools in Early Warning Systems to identify which students are at risk and to intervene in support of better outcomes. It has become common practice to re-build and validate these detectors district by district, due to differing data semantics and varying risk factors for students across districts. As these detectors become more widely used, however, a new challenge emerges in applying them across a broad spectrum of school districts with varying availability of past student data. Some districts have insufficient high-quality past data for building an effective detector. Novel approaches that can address the complex data challenges a new district presents are critical for advancing the field.
Using an ensemble-based algorithm, I develop a modeling approach that can generate a useful model for a previously unseen district. During the ensembling process, my approach, District Similarity Ensemble Extrapolation (DSEE), weights districts that are more similar to the target district more strongly than less similar districts. Using this approach, I can predict student at-risk status effectively for unseen districts across a range of grade levels and achieve good prediction performance, though the approach ultimately fails to outperform the previously published Knowles (2015) and Bowers (2012) EWS models proposed for use across districts.
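The DSEE idea, a similarity-weighted ensemble over per-district models, can be sketched as below. The abstract does not state how district similarity is measured or how district profiles are built, so the inverse-distance kernel and the `district_profiles` representation here are assumptions, and the function names are hypothetical:

```python
import numpy as np

def dsee_predict(district_models, district_profiles, target_profile, X_target):
    # District Similarity Ensemble Extrapolation (sketch): combine the
    # predictions of models trained on other districts, weighting each model
    # by its district's similarity to the unseen target district.
    sims = np.array([1.0 / (1.0 + np.linalg.norm(p - target_profile))
                     for p in district_profiles])   # assumed similarity kernel
    weights = sims / sims.sum()                     # normalize to sum to 1
    # Each model maps student features to an at-risk score in [0, 1].
    preds = np.array([m(X_target) for m in district_models])
    return weights @ preds                          # weighted score per student
```

Here each element of `district_models` is any callable trained on one source district's data; a district identical to the target dominates the ensemble, while dissimilar districts contribute little.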