    The statistical analysis of index variables containing missing data.

    Consider a data set with several polytomous variables that measure the same underlying trait, some of which contain missing data. Suppose interest lies in the regression of a response on the sum, or index, of these variables. The data cannot be analyzed directly because the index is missing whenever any of its components is missing. Simple methods of handling the missing data problem include complete-case analysis, which uses only cases with complete data, and conditional mean imputation, which fills in the missing values with best linear predictions. These methods can lead to biased estimates of regression coefficients and underestimated standard errors. I addressed this missing data problem using item response theory (IRT) from the educational testing literature, which models the probability that a subject answers a test item correctly given the subject's latent ability. A particular IRT model, the partial credit model (PCM), was used to model the ordinal rating-scale data. The PCM gives the probability of each category of a rating variable given the latent trait. It contains a separate difficulty parameter for each variable and threshold parameters common to all variables; the thresholds separate adjacent response categories on the latent continuum. The response in the regression model and the latent trait were assumed to follow a bivariate normal distribution, with the marginal distribution of the latent trait assumed to be standard normal. The PCM was used to develop a multiple imputation procedure for the missing data. Multiple imputation obtains several draws for each missing value from the predictive distribution of the missing values.
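A minimal sketch of the PCM as described above, assuming the parameterization in which the step for category h of variable j is located at δ_j + τ_h (a difficulty δ_j per variable, thresholds τ_h shared by all variables), so that P(X_j = k | θ) ∝ exp(Σ_{h=1}^{k} (θ − δ_j − τ_h)) with category 0 carrying an empty sum. The helper names `pcm_probs`, `draw_item` and `impute_index` are hypothetical. The sketch takes θ as given; in the procedure described here, θ would itself be drawn from its posterior given the observed items and the regression response, which this sketch does not attempt.

```python
import math
import random


def pcm_probs(theta, delta, tau):
    """Category probabilities P(X = k | theta) for k = 0..len(tau).

    Assumed parameterization: item difficulty `delta` for this variable and
    thresholds `tau` shared by all variables (step h located at delta + tau[h-1]).
    """
    # Cumulative sums of (theta - delta - tau_h); category 0 has an empty sum.
    logits = [0.0]
    for t in tau:
        logits.append(logits[-1] + (theta - delta - t))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    weights = [math.exp(v - m) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def draw_item(theta, delta, tau, rng):
    """Draw one imputed category for a missing item at latent trait theta."""
    probs = pcm_probs(theta, delta, tau)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def impute_index(responses, theta, deltas, tau, rng):
    """Fill None entries with PCM draws at theta, then return the summed index."""
    filled = [draw_item(theta, d, tau, rng) if x is None else x
              for x, d in zip(responses, deltas)]
    return sum(filled)
```

For example, `impute_index([2, None, 1], 0.0, [0.0, 0.3, -0.2], [-1.0, 0.0, 1.0], random.Random(0))` keeps the observed 2 and 1 and draws a category between 0 and 3 for the missing item, so the returned index lies between 3 and 6. Repeating the draw with fresh values of θ would give the multiple imputations.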
Estimates of the regression coefficient from each of the filled-in data sets are then combined in a way that yields a consistent estimate of the regression coefficient, propagates the imputation error, and improves the precision of the resulting estimate. A maximum likelihood method using an ECM algorithm with the PCM was developed first. A Bayesian method was then developed, using a Gibbs sampling algorithm that incorporated griddy Gibbs sampling and rejection sampling; multiple imputation was used with the Bayesian method. A simulation study compared the two model-based methods with each other and with existing methods for this type of missing data problem. The results indicate that the multiple imputation Gibbs sampling algorithm based on the PCM was superior to the other methods in terms of bias, RMSE, and percent coverage of confidence intervals. Conditional mean imputation was nearly as good; however, its regression coefficients were biased when correlations were high, and it does not account for the uncertainty in the imputation process.
Ph.D.
Biological Sciences; Biostatistics; Pure Sciences; Statistics
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/131911/2/9938455.pd
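The abstract does not spell out how the estimates from the filled-in data sets are combined. A common choice is Rubin's rules, sketched below as an assumption rather than as the dissertation's exact procedure (the name `rubin_combine` is hypothetical). The pooled estimate is the mean of the m estimates, and the total variance adds the average within-imputation variance to (1 + 1/m) times the between-imputation variance; that second term is how imputation uncertainty enters the standard error, and it is exactly what a single conditional mean imputation leaves out.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Pool m completed-data analyses with Rubin's rules (assumed combining rule).

    estimates: regression-coefficient estimates, one per imputed data set
    variances: their squared standard errors
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 divisor)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    # Degrees of freedom for the t reference distribution (Rubin's formula).
    if b == 0:
        df = math.inf
    else:
        r = (1 + 1 / m) * b / u_bar  # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, total, df
```

For instance, three imputed analyses with estimates 1.0, 1.2, 0.8 and squared standard errors 0.04, 0.05, 0.06 pool to an estimate of 1.0 with total variance 0.05 + (4/3)(0.04) ≈ 0.103, larger than any single analysis would report.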