
    Using data mining to predict automobile insurance fraud

    This thesis presents a study of automobile insurance fraud. Its purpose is to increase knowledge of fraudulent claims in the Portuguese market while raising awareness of the use of data mining techniques for this and similar problems. We apply data mining techniques to the problem of predicting automobile insurance fraud, a problem of interest to insurance companies around the world. We present fraud definitions and give an overview of the existing literature on the subject. Live policy and claim data from the Portuguese insurance market in 2005 are used to train a Logit Regression Model and a CHAID Classification and Regression Tree. The use of data mining tools and techniques enabled the identification of underlying fraud patterns specific to the raw data used to build the models. The list of potential fraud indicators includes variables such as the policy’s tenure, the number of policy holders, not admitting fault in the accident, and fractioning premium payments semiannually. Other variables, such as the number of days between the accident and the patient filing the claim, the client’s age, and the geographical location of the accident, were also found to be relevant in specific sub-populations of the dataset. Model variables and coefficients are interpreted comparatively, and key performance results are presented, including PCC, sensitivity, specificity and AUROC. Both the Logit Model and the CHAID C&R Tree achieve fair results in predicting automobile insurance fraud in the dataset used.
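
    The performance measures named in the abstract above can be illustrated with a minimal sketch. It is not the thesis's code: the function name `classification_metrics`, the 0.5 threshold, and the toy labels and scores are assumptions made for illustration, and PCC is read here as "percentage correctly classified", which the abstract does not spell out.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the thesis
) -> Dict[str, float]:
    """Compute PCC, sensitivity, specificity and AUROC for binary labels.

    Labels are 1 (fraud) or 0 (legitimate). Both classes must be present,
    otherwise the ratios below divide by zero.
    """
    preds = [1 if s >= threshold else 0 for s in scores]

    # Confusion-matrix counts at the chosen threshold.
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)

    # AUROC as the probability that a randomly chosen fraud case scores
    # higher than a randomly chosen legitimate case (ties count one half).
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )

    return {
        "pcc": (tp + tn) / len(y_true),   # share of correct predictions
        "sensitivity": tp / (tp + fn),    # true positive rate
        "specificity": tn / (tn + fp),    # true negative rate
        "auroc": wins / (len(pos) * len(neg)),
    }


# Toy data, invented purely for illustration.
labels = [1, 1, 0, 0, 1, 0]
model_scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
metrics = classification_metrics(labels, model_scores)
```

    Unlike the other three measures, AUROC does not depend on the threshold, which is one reason it is often reported alongside them when classes are imbalanced, as fraud data typically are.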

    Empirical Evidence on the Use of Credit Scoring for Predicting Insurance Losses with Psycho-social and Biochemical Explanations

    An important development in personal lines of insurance in the United States is the use of credit history data for insurance risk classification to predict losses. This research presents the results of a collaboration with industry conducted by a university at the request of its state legislature. The purpose was to assess the viability and validity of using credit scoring to predict insurance losses, given its controversial nature and the criticism that it is redundant with other predictive variables currently in use. Working with industry and government, this study analyzed information on more than 175,000 policyholders for the relationship between credit score and claims. Credit scores were significantly related to incurred losses, showing both statistical and practical significance. We investigate whether the relationship between credit score and incurred losses can be explained by overlap with existing underwriting variables, or whether the credit score adds new information about losses not contained in those variables. The results show that credit scores contain significant information not already incorporated into other traditional rating variables (e.g., age, sex, driving history). We discuss how sensation seeking and self-control theory provide a partial explanation of why credit scoring works (the psycho-social perspective). This article also presents an overview of biological and chemical correlates of risk taking that help explain why risk-taking behavior in one realm (e.g., risky financial behavior and poor credit history) carries over to predicting risk-taking behavior in other realms (e.g., automobile insurance incurred losses). Additional research is needed to advance new nontraditional loss prediction variables, ranging from social media consumer information to information provided by technological advances. The evolving and dynamic nature of the insurance marketplace makes it imperative that professionals continue to develop predictive variables and that academics help explain the reasons behind these relationships through theory development. IC2 Institut

    Special Libraries, December 1928

    Volume 19, Issue 10

    Automated data pre-processing via meta-learning

    The final publication is available at link.springer.com. A data mining algorithm may perform differently on datasets with different characteristics; e.g., it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. In practice, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there is a staggeringly large number of alternatives, and inexperienced users become overwhelmed. We show that this problem can be addressed by an automated approach that leverages ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on that dataset. Our approach will help non-expert users identify more effectively the transformations appropriate to their applications, and hence achieve improved results. Peer Reviewed. Postprint (published version)