thesis

ESSAYS ON ASSESSING METHODS FOR MODELLING THE DISTRIBUTION OF HEALTHCARE COSTS

Abstract

This thesis comprises three essays on assessing methods for modelling the distribution of healthcare costs. Chapter 2 extends the literature on modelling healthcare cost data by applying the generalised beta of the second kind (GB2) distribution to English hospital inpatient cost data. A quasi-experimental design, estimating models on a sub-population of the data and evaluating performance on another sub-population, is used to compare this distribution with its nested and limiting cases. While, for these data, the beta of the second kind (B2) distribution and generalised gamma (GG) distribution outperform the GB2, our results illustrate that the GB2 can be used as a device for choosing among competing parametric distributions for healthcare cost data. In Chapter 3, we conduct a quasi-Monte Carlo comparison of the recent developments in parametric and semi-parametric regression methods for healthcare costs, both against each other and against standard practice. The population of English NHS hospital inpatient episodes for the financial year 2007-2008 (summed for each patient: 6,164,114 observations in total) is randomly divided into two equally sized sub-populations to form an estimation set and a validation set. Evaluating out-of-sample using the validation set, a conditional density approximation estimator shows considerable promise in forecasting conditional means, performing best for accuracy of forecasting and amongst the best four (of sixteen compared) for bias and goodness-of-fit. The best performing model for bias is linear regression with a square root transformed dependent variable, while a generalised linear model with square root link function and Poisson distribution performs best in terms of goodness-of-fit. Commonly used models utilising a log link are shown to perform badly relative to the other models considered in our comparison.
Chapter 4 examines methods for estimating the full conditional distribution of healthcare costs. Understanding the data generating process behind healthcare costs remains a key empirical issue. Although much research to date has focused on the prediction of the conditional mean cost, this can potentially miss important features of the full conditional distribution such as tail probabilities. We conduct a quasi-Monte Carlo experiment using English NHS inpatient data to compare 14 approaches to modelling the distribution of healthcare costs: nine are parametric and have commonly been used to fit healthcare costs, and five are designed specifically to construct a counterfactual distribution. Our results indicate that no one method is clearly dominant and that there is a trade-off between bias and precision of tail probability forecasts. We find that distributional methods demonstrate significant potential, particularly with larger sample sizes where the variability of predictions is reduced. Parametric distributions such as log-normal, generalised gamma and generalised beta of the second kind are found to estimate tail probabilities with high precision, but with varying bias depending upon the cost threshold being considered.
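The split-sample evaluation of tail probability forecasts described in the abstract can be illustrated with a minimal sketch. It is not the thesis's procedure: it assumes synthetic log-normal costs in place of the NHS data, fits an unconditional log-normal (no covariates), and uses hypothetical function names and an arbitrary cost threshold of 10,000. It fits the distribution on an estimation half and compares the predicted probability of exceeding the threshold with the proportion observed in the validation half.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(costs):
    """Maximum-likelihood log-normal fit: mean and SD of the log costs."""
    logs = [math.log(c) for c in costs]
    return fmean(logs), pstdev(logs)


def lognormal_tail_prob(mu, sigma, threshold):
    """P(cost > threshold) under a log-normal(mu, sigma) distribution."""
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(threshold))


def empirical_tail_prob(costs, threshold):
    """Share of observed costs that exceed the threshold."""
    return sum(c > threshold for c in costs) / len(costs)


def split_sample_tail_check(costs, threshold, seed=0):
    """Fit on a random half, evaluate the tail forecast on the other half."""
    shuffled = costs[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    estimation, validation = shuffled[:half], shuffled[half:]
    mu, sigma = fit_lognormal(estimation)
    predicted = lognormal_tail_prob(mu, sigma, threshold)
    observed = empirical_tail_prob(validation, threshold)
    return predicted, observed


# Synthetic costs (an assumption for illustration, not the NHS data).
rng = random.Random(42)
synthetic_costs = [rng.lognormvariate(7.0, 1.2) for _ in range(20000)]

predicted, observed = split_sample_tail_check(synthetic_costs, threshold=10000.0)
print(f"predicted={predicted:.4f} observed={observed:.4f} "
      f"bias={predicted - observed:+.4f}")
```

Because the synthetic data are log-normal by construction, the forecast bias here is small. With real cost data the fitted distribution may be misspecified, and the bias can then vary with the threshold chosen.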
