Decision Trees: Predicting Future Losses for Insurance Data

Abstract

Big data is a term that has come into the spotlight for companies in recent years. Data analysis and business intelligence have become prominent functions within companies and agencies. But what is big data? How has it impacted large companies and agencies? Why must it be embraced? The best way to approach a big data set is to establish a question to answer. For this data set, the question is “What variables cause a loss to occur?” To answer it, we must first understand what is meant by a “loss” and look at the kind of data we are working with. The data for this project is live, or active, insurance data from National Interstate Insurance, which offered the data set for this project as a way to get a head start on statistical analysis. To date, the data set has been analyzed only for this project; data analysts will revisit it in the future for further assessment. The program used for this project is SPSS, one of many tools companies use to build decision tree models that display data in an easy-to-navigate form. In SPSS, decision trees are built with a feature that offers a choice of algorithms, among them CHAID (Chi-squared Automatic Interaction Detection) and CART (Classification and Regression Trees). Both algorithms produce a decision tree that displays how variables affect the outcome.
