Inference and applications in hierarchical linear models with missing data.

Abstract

The development of model-based methods for missing data has been a seminal contribution to statistical inference and data analysis (Orchard and Woodbury 1972; Rubin 1976; Dempster, Laird and Rubin 1977; Rubin 1987; Schafer 1997; Little and Rubin 2002). These methods apply when observations are independently distributed. This paper extends the model-based methods to two-level data where the observations within each cluster are dependent. When such data are complete, analysis using a hierarchical linear model (also known as a multilevel linear model or a random coefficient model) proceeds using maximum likelihood (Dempster, Laird and Rubin 1977; Dempster, Rubin, and Tsutakawa 1981; Laird and Ware 1982; Longford 1993; Goldstein 1995; Schafer 1997; Pinheiro and Bates 2000; Little and Rubin 2002; Raudenbush and Bryk 2002) or Bayes methods (Lindley and Smith 1972; Carlin and Louis 1996; Gelman, Carlin, Stern and Rubin 1997; Schafer 1997; Little and Rubin 2002). The key assumptions are that the data at the within-cluster or cluster level, or both, are missing at random (MAR); that parameter spaces for the complete data model and missing data mechanism are distinct (Rubin 1976); and that the data subject to missingness are multivariate normal conditional on all observed data. We maximize the observed data likelihood via the EM algorithm (Dempster, Laird and Rubin 1977; Wu 1993) or a combination of the EM algorithm and Fisher scoring (Laird and Ware 1982; Longford 1987) to obtain the maximum likelihood (ML) estimates of the parameters of interest using all available data. We consider a general missing data pattern via an observed-value indicator matrix. Applications include regression of a subset of complete data on a disjoint subset, a random-coefficients model, multiple model-based imputation, a simultaneous-equations model, a contextual-effects model, and a level-2 response model.
We illustrate these applications using national survey data on US high schools and simulated data sets.

Ph.D.
Pure Sciences
Statistics
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/123703/2/3096200.pd
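
To make the estimation idea in the abstract concrete, the following is a minimal sketch of EM for a multivariate normal model with a general missing data pattern. It is deliberately simplified and is not the estimator developed in this work: it assumes a single-level model with independent rows, so it ignores the within-cluster dependence that the hierarchical model handles. The function name `em_mvn`, the use of NaN to mark missing values, and the starting values are illustrative assumptions. The boolean matrix `R` plays the role of the observed-value indicator matrix.

```python
import numpy as np

def em_mvn(Y, n_iter=200, tol=1e-8):
    """EM estimates of the mean and covariance of a multivariate normal.

    Missing entries of Y are encoded as NaN and assumed missing at random.
    Rows are treated as independent (no clustering): an illustrative
    simplification, not a hierarchical model.
    """
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    R = ~np.isnan(Y)                       # observed-value indicator matrix
    mu = np.nanmean(Y, axis=0)             # start from available-case moments
    sigma = np.diag(np.nanvar(Y, axis=0))
    for _ in range(n_iter):
        sum_y = np.zeros(p)
        sum_yy = np.zeros((p, p))
        for i in range(n):
            o, m = R[i], ~R[i]
            y_hat = Y[i].copy()
            c = np.zeros((p, p))           # conditional covariance of the missing part
            if m.any() and o.any():
                # E-step: regress the missing components on the observed ones
                B = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
                y_hat[m] = mu[m] + B @ (Y[i, o] - mu[o])
                c[np.ix_(m, m)] = sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
            elif m.any():
                # Row with nothing observed: fall back to the current marginal
                y_hat[m] = mu[m]
                c[np.ix_(m, m)] = sigma[np.ix_(m, m)]
            sum_y += y_hat
            sum_yy += np.outer(y_hat, y_hat) + c
        # M-step: update the parameters from the expected sufficient statistics
        mu_new = sum_y / n
        sigma_new = sum_yy / n - np.outer(mu_new, mu_new)
        done = (np.max(np.abs(mu_new - mu)) < tol
                and np.max(np.abs(sigma_new - sigma)) < tol)
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma

# Usage on a small made-up data set with scattered missing values
Y_demo = np.array([[1.0, 2.0], [2.0, np.nan], [np.nan, 1.5], [3.0, 4.0], [2.5, 3.0]])
mu_demo, sigma_demo = em_mvn(Y_demo)
```

With complete data the sketch reduces to the usual sample mean and ML covariance. With missing values, each row contributes its observed components plus the conditional expectation of its missing components, which is how all available data enter the estimates.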
