LEVERAGING MACHINE LEARNING TO IDENTIFY QUALITY ISSUES IN THE MEDICAID CLAIM ADJUDICATION PROCESS

Abstract

Medicaid is the largest health insurance in the U.S. It provides health coverage to over 68 million individuals, costs the nation over $600 billion a year, and subject to improper payments (fraud, waste, and abuse) or inaccurate payments (claim processed erroneously). Medicaid programs partially use Fee-For-Services (FFS) to provide coverage to beneficiaries by adjudicating claims and leveraging traditional inferential statistics to verify the quality of adjudicated claims. These quality methods only provide an interval estimate of the quality errors and are incapable of detecting most claim adjudication errors, potentially millions of dollar opportunity costs. This dissertation studied a method of applying supervised learning to detect erroneous payment in the entire population of adjudicated claims in each Medicaid Management Information System (MMIS), focusing on two specific claim types: inpatient and outpatient. A synthesized source of adjudicated claims generated by the Centers for Medicare & Medicaid Services (CMS) was used to create the original dataset. Quality reports from California FFS Medicaid were used to extract the underlying statistical pattern of claim adjudication errors in each Medicaid FFS and data labeling utilizing the goodness of fit and Anderson-Darling tests. Principle Component Analysis (PCA) and business knowledge were applied for dimensionality reduction resulting in the selection of sixteen (16) features for the outpatient and nineteen (19) features for the inpatient claims models. Ten (10) supervised learning algorithms were trained and tested on the labeled data: Decision tree with two configurations - Entropy and Gini, Random forests with two configurations - Entropy and Gini, Naïve Bayes, K Nearest Neighbor, Logistic Regression, Neural Network, Discriminant Analysis, and Gradient Boosting. Five (5) cross-validation and event-based sampling were applied during the training process (with oversampling using SMOTE method and stratification within oversampling). The prediction power (Gini importance) for the selected features were measured using the Mean Decrease in Impurity (MDI) method across three algorithms. A one-way ANOVA and Tukey and Fisher LSD pairwise comparisons were conducted. Results show that the Claim Payment Amount significantly outperforms the rest of the prediction power (highest Mean F-value for Gini importance at the α = 0.05 significance) for both claim types. Finally, all algorithms' recall and F1-score were measured for both claim types (inpatient and outpatient) and with and without oversampling. A one-way ANOVA and Tukey and Fisher LSD pairwise comparisons were conducted. The results show a statistically significant difference in the algorithm's performance in detecting quality issues in the outpatient and inpatient claims. Gradient Boosting, Decision Tree (with various configurations and sampling strategies) outperform the rest of the algorithms in recall and F1-measure on both datasets. Logistic Regression showing better recall on the outpatient than inpatient data, and Naïve Bays performs considerably better from recall and F1- score on outpatient data. Medicaid FFS programs and consultants, Medicaid administrators, and researchers could use this study to develop machine learning models to detect quality issues in the Medicaid FFS claim datasets at scale, saving potentially millions of dollars

    Similar works