Robust and Efficient Statistical Inference for Clustered Observational Data in Comparative Effectiveness Research

Abstract

Treatment allocation in observational studies is nonrandom, which leads to confounding and potentially biased treatment effect estimates. Propensity score (PS) methods are commonly used in practice to address the confounding problem. Among the different PS methods, PS regression is frequently used in clinical research. Even though the treatment effect estimate from the PS regression model is unbiased under the strongly ignorable treatment assignment assumption, the default variance estimate is biased. In the first topic of this dissertation, an improved variance estimator for the treatment effect estimate is proposed. Many observational datasets are clustered, for example by physician, and the observations are therefore not independent. A few PS methods handle correlated or clustered samples using mixed effects models with a strong normality assumption on the cluster effects. In the second topic of this dissertation, a robust semi-nonparametric propensity score (SNP-PS) regression model is proposed. We relax the normality assumption and model the complex heterogeneity structure in the treatment allocation process nonparametrically. The proposed SNP-PS model is robust and provides unbiased treatment effect estimates, whereas parametric mixed effects PS models fail to do so when the cluster effects are non-normally distributed. We establish the asymptotic result for the treatment effect estimate and propose an unbiased variance estimator for it. Computationally, we propose an adaptive quadrature integration EM (expectation-maximization) algorithm to avoid the potentially large Monte Carlo errors of existing Monte Carlo EM algorithms. Many real-world medical record datasets are not only clustered but multilevel clustered, with millions of samples and hundreds of thousands of clusters. The SNP-PS framework is in theory applicable to these large datasets. In practice, however, it is computationally prohibitive.
In the third topic of this dissertation, we propose a flexible mixed effects PS model (FM-PS) that is computationally efficient for large multilevel clustered data. The FM-PS model relaxes a critical assumption made in standard mixed effects PS (SM-PS) models, namely that the random effects are independent of the fixed effect covariates. The FM-PS model provides an unbiased treatment effect estimate regardless of whether the independence assumption holds. The treatment effect estimate from the SM-PS model is biased when the independence assumption does not hold, but when the assumption holds it is unbiased and more efficient than the estimate from the FM-PS model. We propose a likelihood ratio statistic for testing the independence assumption, which allows us to choose between the FM-PS and SM-PS models. We also propose a cluster bootstrapping procedure to estimate the variance of the treatment effect estimate. The FM-PS model is robust to various model misspecifications, as demonstrated by our extensive simulations.

Doctor of Philosophy
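The abstract names a cluster bootstrapping procedure but does not spell it out. The following is only a minimal sketch of the general idea, not the dissertation's procedure: whole clusters are resampled with replacement and the estimator is recomputed on each resample. The function names, the toy data generator, and the use of a simple difference in means in place of the FM-PS treatment effect estimator are all assumptions made for illustration.

```python
import random
import statistics


def make_toy_data(n_clusters=20, cluster_size=10, seed=1):
    """Hypothetical toy data: rows of (cluster_id, treatment, outcome)."""
    rng = random.Random(seed)
    rows = []
    for cid in range(n_clusters):
        cluster_effect = rng.gauss(0.0, 1.0)  # shared by all units in the cluster
        for _ in range(cluster_size):
            t = 1 if rng.random() < 0.5 else 0
            y = 1.0 * t + cluster_effect + rng.gauss(0.0, 1.0)
            rows.append((cid, t, y))
    return rows


def difference_in_means(rows):
    """Placeholder estimator (assumption): treated mean minus control mean.

    A real analysis would plug in the PS-based treatment effect estimator here.
    """
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def cluster_bootstrap_se(rows, estimator, n_boot=500, seed=0):
    """Bootstrap standard error obtained by resampling whole clusters."""
    rng = random.Random(seed)

    # Group the rows by cluster so that clusters are resampled as units.
    clusters = {}
    for row in rows:
        clusters.setdefault(row[0], []).append(row)
    ids = list(clusters)

    estimates = []
    for _ in range(n_boot):
        # Draw as many clusters as in the original data, with replacement.
        sample = []
        for cid in rng.choices(ids, k=len(ids)):
            sample.extend(clusters[cid])
        try:
            estimates.append(estimator(sample))
        except statistics.StatisticsError:
            continue  # resample had no treated or no control units; skip it

    # Standard deviation of the bootstrap estimates approximates the SE.
    return statistics.stdev(estimates)


rows = make_toy_data()
print("estimate:", difference_in_means(rows))
print("cluster bootstrap SE:", cluster_bootstrap_se(rows, difference_in_means))
```

Resampling clusters rather than individual rows keeps the within-cluster correlation intact in every bootstrap sample, which is the reason a cluster-level bootstrap is preferred over a row-level one for data of this kind.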
