    Robust Likelihood-based Analysis of Multivariate Data with Missing Values

    The model-based approach to inference from multivariate data with missing values is reviewed. Regression prediction is most useful when the covariates are predictive of both the missing values and the probability of being missing, and in these circumstances predictions are particularly sensitive to model misspecification. The use of penalized splines of the propensity score is proposed to yield robust model-based inference under the missing at random (MAR) assumption, assuming monotone missing data. Simulation comparisons with other methods suggest that the method works well in a wide range of populations, with little loss of efficiency relative to parametric models when the latter are correct. Extensions to more general patterns are outlined.

    Calibrated Bayes, for Statistics in General, and Missing Data in Particular

    It is argued that the Calibrated Bayesian (CB) approach to statistical inference capitalizes on the strengths of the Bayesian and frequentist approaches. In the CB approach, inferences under a particular model are Bayesian, but frequentist methods are useful for model development and model checking. In this article the CB approach is outlined. Bayesian methods for missing data are then reviewed from a CB perspective. The basic theory of the Bayesian approach, and the closely related technique of multiple imputation, is described. Then applications of the Bayesian approach to normal models are described, both for monotone and nonmonotone missing data patterns. Sequential Regression Multivariate Imputation and Penalized Spline of Propensity Models are presented as two useful approaches for relaxing distributional assumptions. Comment: Published at http://dx.doi.org/10.1214/10-STS318 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Extensions of the Penalized Spline of Propensity Prediction Method of Imputation

    Little and An (2004, Statistica Sinica 14, 949–968) proposed a penalized spline of propensity prediction (PSPP) method of imputation of missing values that yields robust model-based inference under the missing at random assumption. The propensity score for a missing variable is estimated and a regression model is fitted that includes the spline of the estimated logit propensity score as a covariate. The predicted unconditional mean of the missing variable has a double robustness (DR) property under misspecification of the imputation model. We show that a simplified version of PSPP, which does not center other regressors prior to including them in the prediction model, also has the DR property. We also propose two extensions of PSPP, namely, stratified PSPP and bivariate PSPP, that extend the DR property to inferences about conditional means. These extended PSPP methods are compared with the PSPP method and simple alternatives in a simulation study and applied to an online weight loss study conducted by Kaiser Permanente. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/65192/1/j.1541-0420.2008.01155.x.pd
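
The two-step structure described in this abstract (estimate the propensity score, then regress the incomplete variable on a spline of its logit) can be written compactly as follows. This is only a sketch under assumed notation that does not appear in the abstract: $M$ is a missingness indicator with $M = 0$ when $Y$ is observed, $X$ denotes the fully observed covariates, $s(\cdot)$ is a penalized spline, and $g(\cdot;\beta)$ is a parametric function of the remaining regressors. The exact parameterization used by the authors may differ.

```latex
\begin{align*}
P^{*} &= \operatorname{logit}\,\Pr(M = 0 \mid X), \\
(Y \mid P^{*}, X) &\sim N\bigl(s(P^{*}) + g(P^{*}, X; \beta),\ \sigma^{2}\bigr).
\end{align*}
```

Under this reading, the spline term $s(P^{*})$ is the component that is modeled flexibly, while $g$ carries the parametric part that may be misspecified.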

    Robust Methods for Estimating the Mean with Missing Data.

    Missing data are common in many empirical studies. In this dissertation, we explore robust methods to estimate the mean of an outcome variable subject to nonresponse in the presence of fully observed covariates. In Chapter II, we consider data on a continuous outcome that are missing at random and a fully observed set of covariates. Doubly-robust (DR) estimators are consistent when either the regression model for the mean function or the model for the propensity to respond (the “propensity model”) is correctly specified. We compare by simulation a variety of DR estimators for estimating the mean of the outcome. Penalized spline of propensity prediction (Zhang and Little 2009) and the augmented estimating equation method proposed in Cao et al. (2009) tended to outperform the other DR methods. In Chapter III, we consider estimating the mean of a continuous outcome that may be missing not at random (MNAR). Bivariate normal pattern-mixture models (BNPM; Little 1994) and proxy-pattern mixture models (PPMA; Andridge and Little 2011) have been proposed to estimate the mean of the outcome under varying assumptions about the missing data mechanism given one or more fully observed covariates. Both BNPM and PPMA assume normality. We propose a spline-proxy pattern mixture model (S-PPMA), which relaxes the normality assumption using a penalized spline. Properties of S-PPMA are assessed by simulation. Results show that S-PPMA yields estimates that are more robust than PPMA to deviations from normality, at some cost in precision when the normality assumption is met. In Chapter IV, we extend S-PPMA to a binary outcome (binS-PPMA), by assuming an underlying continuous latent variable that generates the binary outcome. We apply binS-PPMA to the latent variable to impute the binary outcome given information from observed covariates and assumptions about the nonresponse mechanism. As with continuous missing variables, binS-PPMA shows improved robustness to deviations from normality compared to bin-PPMA, with no important differences between the methods when all variables are normally distributed. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113402/1/yeya_1.pd
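
The double robustness definition in this abstract can be made concrete with one widely used DR form, the augmented inverse-probability-weighted (AIPW) estimator of a mean. The sketch below is illustrative only and is not claimed to be any of the specific estimators compared in the dissertation. The function name `aipw_mean` and its arguments are hypothetical, and the propensities and regression predictions are assumed to have been estimated beforehand.

```python
from typing import Optional, Sequence


def aipw_mean(
    y: Sequence[Optional[float]],
    propensity: Sequence[float],
    prediction: Sequence[float],
) -> float:
    """Augmented inverse-probability-weighted estimate of the mean of y.

    y[i] is None when the outcome of unit i is missing.
    propensity[i] is the (assumed, pre-estimated) probability that unit i responds.
    prediction[i] is the (assumed, pre-estimated) regression prediction of y[i].
    """
    if not (len(y) == len(propensity) == len(prediction)):
        raise ValueError("all inputs must have the same length")
    if len(y) == 0:
        raise ValueError("need at least one unit")

    total = 0.0
    for y_i, p_i, m_i in zip(y, propensity, prediction):
        if not 0.0 < p_i <= 1.0:
            raise ValueError("propensities must lie in (0, 1]")
        # Every unit contributes its regression prediction.
        total += m_i
        if y_i is not None:
            # Respondents also contribute their residual, weighted by the
            # inverse of their response propensity.
            total += (y_i - m_i) / p_i
    return total / len(y)
```

If the predictions are right on average, the weighted residual term has mean zero whatever the weights; if the propensities are right, the weighted residuals correct the bias of the predictions. That is the intuition behind consistency when either model is correctly specified.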

    Robust Methods for Causal Inference Using Penalized Splines

    Observational studies are important for evaluating treatment effects, especially when randomization of treatments is unethical or expensive. Without randomization, valid inferences about treatment effects can only be drawn by controlling for confounders. Propensity scores (PS) -- the probability of treatment assignment as a function of covariates -- are often used to control for confounders. PS-based methods are vulnerable to bias and inefficiency when outcome or propensity score models are misspecified or there is limited overlap in the propensity score distributions between treatment groups. In this dissertation, we develop new robust methods for estimating causal effects from observational studies and address two closely related topics on causal inference -- the problem of limited overlap and variable selection for the propensity score model. In Chapter 2, we propose a robust multiple imputation based approach to causal inference called Penalized Spline of Propensity Methods for Treatment Comparison (PENCOMP). PENCOMP estimates causal effects by imputing missing potential outcomes with flexible spline models, and draws inference based on imputed and observed outcomes. Under the standard causal inference assumptions, PENCOMP is doubly robust, that is, it yields consistent estimates of causal effects if either the propensity or the outcome model is correctly specified. Simulations suggest that it tends to outperform doubly-robust marginal structural modeling, especially when the weights are highly variable. We apply our method to the Multicenter AIDS Cohort Study (MACS) to estimate the short-term effect of antiretroviral treatment on CD4 counts in HIV+ patients. In Chapter 3, we address the issue of limited overlap in the propensity score distributions across treatment groups. We investigate appropriate restrictions of the causal estimand, and compare alternative estimation methods, including various simple and augmented inverse propensity weighting approaches, matching and PENCOMP. We demonstrate the flexibility of PENCOMP for estimating different estimands. We apply these methods to the MACS dataset to estimate the effects of antiretroviral treatment on CD4 counts in HIV+ patients. In Chapter 4, we consider variable selection techniques that seek to restrict predictors in the propensity model to true confounders, thus improving overlap in the propensity distributions and increasing efficiency. We also propose a new version of PENCOMP via bagging that incorporates the variability of model selection, which can be advantageous when the data are noisy. We examine by simulation studies the impact of various variable selection techniques, including an extension of the adaptive lasso, on inferences from PENCOMP and weighting approaches. We demonstrate our methods and variable selection techniques using the MACS dataset. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147507/1/tkzhou_1.pd

    Extensions of the Penalized Spline Propensity Prediction Method of Imputation.

    Little and An (2004) proposed a penalized spline propensity prediction (PSPP) method of imputation of missing values that yields robust model-based inference under the missing at random assumption. The propensity score for a missing variable is estimated and a regression model is fit with the spline of the propensity score as a covariate. The predicted marginal mean of the missing variable is doubly robust (DR) under misspecification of the imputation model. In the first part of the thesis, we study properties of a simplified version of the PSPP that does not center the regressors prior to including them in the prediction model. We then extend the PSPP to multivariate data so as to yield consistent estimates of both marginal and conditional means. The extended PSPP method is compared with the PSPP method and simple alternatives in a simulation study. For the second part of the thesis, we compare the PSPP method with several other DR estimators. The PSPP method uses a spline of the propensity score to impute the missing values, and the resulting estimates have a double robustness property. The DR property can also be achieved by modeling the relationship parametrically, as in the linear-in-the-weight method and the calibration method (Firth and Bennett, 1998; Robins, Rotnitzky and Zhao, 1994). We compare the root mean square error (RMSE), confidence interval width and non-coverage rate of these methods under different mean functions and propensity score functions. We also study the effects of sample size and misspecification of the propensity scores. The PSPP method yields estimates with smaller RMSE and narrower confidence intervals than the other methods in most situations. It yields estimates with non-coverage rates close to the 5% nominal level. For the third part of the thesis, we extend the PSPP method to monotone missing data. We propose to impute the missing values based on a stepwise PSPP procedure. We illustrate the proposed method by applying it to an online weight loss study conducted by Kaiser Permanente. We finish the thesis with a short discussion and future work. Ph.D., Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/57686/2/guangyuz_1.pd
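
The three comparison criteria named in this abstract (RMSE, confidence interval width and non-coverage rate) are simple summaries over simulation replicates. A minimal sketch of how they might be computed is given below. The function name `summarize_replicates` and its input layout are assumptions made for illustration, not taken from the thesis.

```python
import math
from typing import Dict, Sequence, Tuple


def summarize_replicates(
    true_value: float,
    estimates: Sequence[float],
    intervals: Sequence[Tuple[float, float]],
) -> Dict[str, float]:
    """Summarize simulation replicates of one estimator.

    estimates[k] is the point estimate from replicate k and intervals[k]
    is its (lower, upper) confidence interval.
    """
    n = len(estimates)
    if n == 0 or n != len(intervals):
        raise ValueError("need one interval per estimate, and at least one replicate")

    # Root mean square error around the known true value.
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    # Average width of the confidence intervals.
    mean_width = sum(hi - lo for lo, hi in intervals) / n
    # Share of intervals that fail to contain the true value.
    misses = sum(1 for lo, hi in intervals if not (lo <= true_value <= hi))

    return {"rmse": rmse, "mean_ci_width": mean_width, "noncoverage": misses / n}
```

For nominal 95% intervals, a well-calibrated method should give a `noncoverage` value near 0.05, which is the benchmark the abstract refers to.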

    Novel Applications and Extensions for Bayesian Additive Regression Trees (BART) in Prediction, Imputation, and Causal Inference

    Bayesian additive regression trees (BART) is a method proposed by Chipman et al. (2010) that can handle non-linear main and multiple-way interaction effects for independent continuous or binary outcomes. It has enjoyed much success in areas like causal inference, economics, environmental sciences, and genomics. However, extensions of BART and applications of these extensions are limited. This thesis discusses three novel applications and extensions for BART. We first discuss how BART can be extended to clustered outcomes by adding a random intercept. This work was motivated by the need to accurately predict driver behavior using observable speed and location information, with application to communication of key human-driver intention to nearby vehicles in traffic. Although our extension can be considered a special case of the spatial BART (Zhang et al., 2007), our approach differs by providing a relatively simple algorithm that allows application to clustered binary outcomes. We next focus on the use of BART in missing data settings. Doubly robust (DR) methods allow consistent estimation of population means when either the non-response propensity model or the model for the mean of the outcome is correctly specified. Kang and Schafer (2007) showed that DR methods produce biased and inefficient estimates when both propensity and mean models are misspecified. We consider the use of BART for modeling means and/or propensities to provide a "robust-squared" estimator that reduces bias and improves efficiency. We demonstrate this result, using simulations, for two commonly used DR methods: Augmented Inverse Probability Weighting (AIPWT, Robins et al., 1994) and penalized splines of propensity prediction (PSPP, Zhang and Little, 2009). We successfully applied our proposed model to two national crash datasets to impute missing change in deceleration values (delta-v) and missing Blood Alcohol Concentration (BAC) levels respectively. Our final effort considers how a negative wealth shock (sudden large decline in wealth) affects the cognitive outcome of late-middle-aged US adults using the Health and Retirement Study, a longitudinal study of US adults, enrolled at age 50 and older and surveyed biennially since 1992. Our analysis faced three issues: lack of randomization, confounding by indication, and censoring of the cognitive outcome by a substantial number of deaths in our subjects. Marginal structural models (MSM), a commonly used method to deal with censoring by death, are arguably inappropriate because they upweight subjects who are more likely to die, creating a pseudo-population which resembles one where death is absent. We propose to compare the negative wealth shock effect only among subjects who survived under both sets of treatment regimens - a special case of principal stratification (Frangakis and Rubin, 2002). Because the counterfactual survival status would be unobserved, we imputed their survival status and restricted analysis to subjects who were observed and predicted to survive under both treatment regimes. We used a modified version of penalized spline of propensity methods in treatment comparisons (PENCOMP, Zhou et al., 2018) to obtain a robust imputation of the counterfactual cognitive outcomes. Finally, we consider several possible extensions of these efforts for future work. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147594/1/vincetan_1.pd