Approaches to Modelling Heterogeneity in Longitudinal Studies


This thesis studies estimation bias in longitudinal data models when the explanatory variables are correlated with the individual effect. We first define longitudinal data and introduce the two estimation methods most commonly used for the general linear model: least squares and maximum likelihood. We then apply these methods to three simple models that are commonly used to analyse longitudinal data. Second, we use frequentist and Bayesian analysis to examine estimation bias theoretically and empirically, with an emphasis on heterogeneity bias. This bias arises when random effect estimation is applied to data in which the explanatory variables and the individual effect have nonzero correlation. By comparing the estimated values with the true values, we demonstrate and verify the theoretical expression that determines the size of this bias [Mundlak, 1978]. To avoid the bias when this correlation is nonzero, fixed effect estimation should be used instead, and the Hausman test confirms this choice. However, the bias is not confined to frequentist analysis: it also appears when the random effect model is estimated in a Bayesian framework. Finally, following the idea of Mundlak [1978], we define a special Bayesian model that can serve both as a Hausman-type test and as a model for comparison. We also prove that, when the explanatory variables are correlated with the individual effect, it is the best-fitting model among the random effect, fixed effect and pooled models. Throughout this thesis, we illustrate these ideas with examples based on real and simulated data.
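A small simulation makes the heterogeneity bias concrete. The sketch below is not the thesis's own code. It assumes a hypothetical balanced panel generated as x_it = rho*alpha_i + e_it and y_it = beta*x_it + alpha_i + u_it, with independent unit-variance normal components. It also treats the random effect variance components as known rather than estimated. The sketch compares pooled OLS, random effect GLS (via quasi-demeaning), the fixed effect (within) estimator, and a frequentist Mundlak regression that adds each individual's mean of x as an extra regressor. The Bayesian version of the Mundlak model developed in the thesis is not reproduced here, and all function names are illustrative.

```python
import numpy as np


def simulate_panel(n=2000, t=5, beta=1.0, rho=0.8, seed=0):
    """Hypothetical balanced panel where x_it is correlated with alpha_i via rho."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(0.0, 1.0, size=n)                      # individual effects, variance 1
    x = rho * alpha[:, None] + rng.normal(0.0, 1.0, size=(n, t))
    u = rng.normal(0.0, 1.0, size=(n, t))                     # idiosyncratic errors, variance 1
    y = beta * x + alpha[:, None] + u
    return x, y


def ols_slope(x, y):
    """Slope from an OLS regression of y on an intercept and x (arrays flattened)."""
    X = np.column_stack([np.ones(x.size), x.ravel()])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef[1]


def pooled(x, y):
    # Ignores the individual effect entirely
    return ols_slope(x, y)


def random_effects(x, y, sigma_a2=1.0, sigma_u2=1.0):
    # GLS by quasi-demeaning; variance components assumed known (simplification)
    t = x.shape[1]
    theta = 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + t * sigma_a2))
    xq = x - theta * x.mean(axis=1, keepdims=True)
    yq = y - theta * y.mean(axis=1, keepdims=True)
    return ols_slope(xq, yq)


def fixed_effects(x, y):
    # Within transformation: subtract each individual's time mean, removing alpha_i
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return np.sum(xd * yd) / np.sum(xd * xd)


def mundlak(x, y):
    # Pooled OLS of y on an intercept, x_it and the individual mean of x
    xbar = np.repeat(x.mean(axis=1), x.shape[1])              # matches row-major ravel order
    X = np.column_stack([np.ones(x.size), x.ravel(), xbar])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef[1]


if __name__ == "__main__":
    x, y = simulate_panel()
    for name, est in [("pooled", pooled), ("random effects", random_effects),
                      ("fixed effects", fixed_effects), ("Mundlak", mundlak)]:
        print(f"{name:15s} {est(x, y):.3f}")
```

With rho = 0.8, the pooled and random effect slopes should sit noticeably above the true beta = 1, while the fixed effect and Mundlak slopes should be close to it. In a balanced panel, the Mundlak coefficient on x coincides algebraically with the within estimator. Setting rho = 0 removes the correlation, and the random effect estimate should then also be close to 1.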
