272 research outputs found

    Using Supervised Machine Learning Methods for RFM Segmentation: A Casino Direct Marketing Communication Case

    Purpose – This paper explores various supervised machine learning algorithms as an additional classification method to RFM (recency, frequency, and monetary) models with the aim of improving the accuracy in predicting target groups of customers for direct marketing response campaigns conducted by a casino. The purpose of this paper is twofold – first, to test how the addition of demographic variables increases the accuracy of the basic RFM model and second, to assess if and how machine learning algorithms improve the initial model. Ultimately, we propose a model for direct marketing response at individual level using RFM scores and customer demographic and behavioral data as endogenous variables to be used by the company. The findings can be used as an alternative to the simpler RFM model when approaching customer response modeling for large datasets and can be generalized to other industries.
    Design/Methodology/Approach – Our research employed supervised machine learning methods tuned on historical responses to a casino’s direct marketing activities to improve the company’s RFM segmentation model. Demographic variables were also included with the aim of improving the power of the models employed. Finally, we attempted to improve the best-performing model by hypertuning its algorithm parameters.
    Findings and Implications – The best and most intuitive model was found to be that using decision trees with Recency (from RFM) together with age and the awarded amount (from the demographic element) as independent variables. Surprisingly, the company’s own RFM segmentation was also found to perform well.
    Limitations – Not all machine learning methods used for classification were included in our research nor did we use ensemble methods to improve the models’ power. While all models developed are applicable to similar data, they could lose their accuracy when applied to data from a different industry. The company’s own RFM model was not analyzed but was included in the model as is. Further insight could be gained by determining its optimal parameters.
    Originality – This study contributes to the existing literature by showing how direct marketing efficiency modeling using standard RFM could be improved with the addition of a company’s customer property. It also provides insight into how classification algorithms perform on a casino database of direct marketing activities.
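    As a rough illustration of the RFM scoring that the abstract above builds on, the sketch below assigns 1-to-5 recency, frequency, and monetary scores by rank-based binning. The customer records, the field names (`last_visit`, `visits`, `spend`), and the choice of five bins are assumptions made for illustration; they are not the casino's segmentation model or the authors' code.

```python
from datetime import date, timedelta


def quantile_scores(values, bins=5, higher_is_better=True):
    """Rank-based scores from 1 (worst) to `bins` (best).

    Ties are broken by input order, which is a simplification.
    """
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not higher_is_better,
    )
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = rank * bins // len(values) + 1
    return scores


def rfm_scores(customers, today):
    """Return one {"id", "R", "F", "M"} record per customer."""
    recency_days = [(today - c["last_visit"]).days for c in customers]
    frequency = [c["visits"] for c in customers]
    monetary = [c["spend"] for c in customers]

    # Fewer days since the last visit is better, so recency is inverted.
    r = quantile_scores(recency_days, higher_is_better=False)
    f = quantile_scores(frequency)
    m = quantile_scores(monetary)
    return [
        {"id": c["id"], "R": ri, "F": fi, "M": mi}
        for c, ri, fi, mi in zip(customers, r, f, m)
    ]


# Hypothetical customers, ordered from most to least engaged.
today = date(2024, 6, 30)
customers = [
    {"id": 1, "last_visit": today - timedelta(days=5), "visits": 20, "spend": 5000},
    {"id": 2, "last_visit": today - timedelta(days=30), "visits": 10, "spend": 1200},
    {"id": 3, "last_visit": today - timedelta(days=60), "visits": 5, "spend": 800},
    {"id": 4, "last_visit": today - timedelta(days=120), "visits": 2, "spend": 150},
    {"id": 5, "last_visit": today - timedelta(days=300), "visits": 1, "spend": 40},
]
scores = rfm_scores(customers, today)
for row in scores:
    print(row)
```

    Scores like these could then serve as inputs to a classifier alongside demographic variables, which is the kind of extension the abstract describes.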

    Optimization of Post-Scoring Classification and Impact on Regulatory Capital for Low Default Portfolios

    After the crisis of 2008, new regulatory requirements emerged, with supervisors strengthening their position on the requirements for meeting IRBA standards. Low Default Portfolios (LDPs) present specific characteristics that raise challenges for banks when building and implementing credit risk models. In this context, where banks are looking to improve their Return On Equity and supervisors are strengthening their positions, this paper aims to provide clues for optimizing Post-Scoring classification and to analyze the relationship between the number of classes in a rating scale and the impact on regulatory capital for LDPs.

    Development and Validation of Credit-Scoring Models

    Accurate credit-granting decisions are crucial to the efficiency of the decentralized capital allocation mechanisms in modern market economies. Credit bureaus and many financial institutions have developed and used credit-scoring models to standardize and automate, to the extent possible, credit decisions. We build credit scoring models for bankcard markets using the Office of the Comptroller of the Currency, Risk Analysis Division (OCC/RAD) consumer credit database (CCDB). This unusually rich data set allows us to evaluate a number of methods in common practice. We introduce, estimate, and validate our models, using both out-of-sample contemporaneous and future validation data sets. Model performance is compared using both separation and accuracy measures. A vendor-developed generic bureau-based score is also included in the model performance comparisons. Our results indicate that current industry practices, when carefully applied, can produce models that robustly rank-order potential borrowers both at the time of development and through the near future. However, these same methodologies are likely to fail when the objective is to accurately estimate future rates of delinquency or probabilities of default for individual borrowers or groups of borrowers.

    High Expression of Caspase-8 Associated with Improved Survival in Diffuse Large B-Cell Lymphoma: Machine Learning and Artificial Neural Networks Analyses

    High expression of the anti-apoptotic TNFAIP8 is associated with poor survival of the patients with diffuse large B-cell lymphoma (DLBCL), and one of the functions of TNFAIP8 is to inhibit the pro-apoptosis Caspase-8. We aimed to analyze the immunohistochemical expression of Caspase-8 (active subunit p18; CASP8) in a series of 97 cases of DLBCL from Tokai University Hospital, and to correlate with other Caspase-8 pathway-related markers, including cleaved Caspase-3, cleaved PARP, BCL2, TP53, MDM2, MYC, Ki67, E2F1, CDK6, MYB and LMO2. After digital image quantification, the correlation with several clinicopathological characteristics of the patients showed that high protein expression of Caspase-8 was associated with a favorable overall and progression-free survival (Hazard Risks = 0.3; p = 0.005 and 0.03, respectively). Caspase-8 also positively correlated with cCASP3, MDM2, E2F1, TNFAIP8, BCL2 and Ki67. Next, the Caspase-8 protein expression was modeled using predictive analytics, and a high overall predictive accuracy (>80%) was obtained with CHAID decision tree, Bayesian network, discriminant analysis, C5 tree, logistic regression, and Artificial Intelligence Neural Network methods (both Multilayer perceptron and Radial basis function); the most relevant markers were cCASP3, E2F1, TP53, cPARP, MDM2, BCL2 and TNFAIP8. Finally, the CASP8 gene expression was also successfully modeled in an independent DLBCL series of 414 cases from the Lymphoma/Leukemia Molecular Profiling Project (LLMPP). In conclusion, high protein expression of Caspase-8 is associated with a favorable prognosis of DLBCL. Predictive modeling is a feasible analytic strategy that results in a solution that can be understood (i.e., explainable artificial intelligence, “white-box” algorithms)

    CUSTOMER SEGMENTATION APPROACHES: A COMPARISON OF METHODS WITH DATA FROM THE MEDICARE HEALTH OUTCOMES SURVEY

    Model-based segmentation approaches are particularly useful in healthcare consumer research, where the primary goal is to identify groups of individuals who share similar attitudinal and behavioral characteristics, in order to develop engagement strategies, create products, and allocate resources tailored to the specific needs of each segment group. Despite the growing research and literature on segmentation models, many healthcare researchers continue to use only demographic variables to classify consumers into groups, failing to uncover unique patterns, relationships, and latent traits. The primary aims of this study were to 1) examine the differences in outcomes when classification methods (K-Means and LCA) for segmentation were used in conjunction with continuous and dichotomous scales; and 2) examine the differences in outcomes when prediction methods (CHAID and Neural Networks) for segmentation were used in conjunction with binary and continuous dependent variables and a variation of the classification algorithm. For the purpose of comparison across methods, data from the Medicare Health Outcomes Survey were used in all conditions. Results indicated that the best segment class solution was dependent upon both the method and the treatment of the inputs and dependent variable for both classification and prediction problems. When the input depression scale was dichotomized, the K-Means model yielded a 6-segment best-class solution, whereas the LCA model yielded 9 distinct segment classes. On the other hand, LCA models yielded the same segment solution (9 classes), irrespective of the treatment of the depression scale. Similarly, differences in outcomes were identified when the dependent variable was continuous vs. binary when prediction models were used to segment survey respondents. When the outcome was dichotomous, CHAID models resulted in a 5-segment solution, compared to a 6-segment solution for Neural Networks. On the other hand, the binary dependent variable produced a 4-segment solution for both CHAID and Neural Network models. In addition, the interpretation of the segment class profiles is dependent upon both method and condition (input and treatment of dependent variable).

    Is Investing in Companies Manufacturing Solar Components a Lucrative Business? A Decision Tree Based Analysis

    In an era of increasing energy production from renewable sources, the demand for components for renewable energy systems has dramatically increased. Consequently, managers and investors are interested in knowing whether a company associated with the semiconductor and related device manufacturing sector, especially a manufacturer of photovoltaic (PV) systems, is a money-making business. We extend prior research by applying decision trees (DTs) to identify ratios (i.e., indicators) that discriminate between companies within the sector that do (designated as “green”) and do not (“red”) produce elements of PV systems. Our results indicate that on the basis of selected ratios, green companies can be distinguished from the red companies without an in-depth analysis of the product portfolio. We also find that green companies, especially those operating in China, are characterized by lower financial performance, thus providing a negative (and unexpected) answer to the question posed in the title.

    Data mining Twitter for cancer, diabetes, and asthma insights

    Twitter may be a data resource to support healthcare research, but the literature on the potential of Twitter data in healthcare is still limited. The purpose of this study was to contrast the processes by which a large collection of unstructured disease-related tweets could be converted into structured data for further analysis. This was done with the objective of gaining insights into the content and behavioral patterns associated with disease-specific communications on Twitter. Twelve months of Twitter data related to cancer, diabetes, and asthma were collected to form a baseline dataset containing over 34 million tweets. As Twitter data in its raw form would have been difficult to manage, three separate data reduction methods were contrasted to identify a method to generate analysis files, maximizing classification precision and data retention. Each of the disease files was then run through a CHAID (chi-square automatic interaction detector) analysis to demonstrate how user behavior insights vary by disease. CHAID is a technique created by Gordon V. Kass in 1980 and is used to discover the relationship between variables. This study followed the standard CRISP-DM data mining approach and demonstrates how the practice of mining Twitter data fits into this six-stage iterative framework. The study produced insights that provide a new lens into the potential Twitter data has as a valuable healthcare data source as well as the nuances involved in working with the data.
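    To make the CHAID step mentioned above more concrete, the sketch below computes Pearson's chi-square statistic for a predictor-by-outcome contingency table and picks the predictor with the largest statistic. This is a deliberate simplification: full CHAID also merges similar categories and compares Bonferroni-adjusted p-values, and comparing raw statistics is only meaningful when the tables have the same degrees of freedom. The feature names (`has_link`, `is_reply`) and the data are hypothetical and are not taken from the study.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of predictor and outcome.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def crosstab(predictor, outcome):
    """Count co-occurrences of predictor categories and outcome categories."""
    rows = sorted(set(predictor))
    cols = sorted(set(outcome))
    table = [[0] * len(cols) for _ in rows]
    for p, o in zip(predictor, outcome):
        table[rows.index(p)][cols.index(o)] += 1
    return table


def best_split(predictors, outcome):
    """Name of the predictor most strongly associated with the outcome."""
    return max(
        predictors,
        key=lambda name: chi_square(crosstab(predictors[name], outcome)),
    )


# Hypothetical tweet-level features and a binary outcome (e.g. retweeted or not).
outcome = [1, 1, 0, 0]
predictors = {
    "has_link": ["y", "y", "n", "n"],
    "is_reply": ["y", "n", "y", "n"],
}
print(best_split(predictors, outcome))
```

    In a tree-building loop, the chosen predictor would define the split at the current node, and the same selection would be repeated within each resulting subset.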