
    Block-Conditional Missing at Random Models for Missing Data

    Two major ideas in the analysis of missing data are (a) the EM algorithm [Dempster, Laird and Rubin, J. Roy. Statist. Soc. Ser. B 39 (1977) 1--38] for maximum likelihood (ML) estimation, and (b) the formulation of models for the joint distribution of the data Z and missing data indicators M, and the associated "missing at random" (MAR) condition under which a model for M is unnecessary [Rubin, Biometrika 63 (1976) 581--592]. Most previous work has treated Z and M as single blocks, yielding selection or pattern-mixture models depending on how their joint distribution is factorized. This paper explores "block-sequential" models that interleave subsets of the variables and their missing data indicators, and then make parameter restrictions based on assumptions in each block. These include models that are not MAR. We examine a subclass of block-sequential models we call block-conditional MAR (BCMAR) models, and an associated block-monotone reduced likelihood strategy that typically yields consistent estimates by selectively discarding some data. Alternatively, full ML estimation can often be achieved via the EM algorithm. We examine in some detail BCMAR models for the case of two multinomially distributed categorical variables, and a two-block structure where the first block is categorical and the second block arises from a (possibly multivariate) exponential family distribution.
    Comment: Published at http://dx.doi.org/10.1214/10-STS344 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
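    Since the abstract leans on the EM algorithm of Dempster, Laird and Rubin for ML estimation with missing data, a minimal sketch may help. It uses the classic four-cell multinomial example from that 1977 paper: cell probabilities (1/2 + t/4, (1-t)/4, (1-t)/4, t/4) with observed counts (125, 18, 20, 34), where the unobserved split of the first cell into its "1/2" and "t/4" parts plays the role of the missing data.

```python
# EM for the classic multinomial example of Dempster, Laird and Rubin (1977).
# Observed counts in the four cells:
y1, y2, y3, y4 = 125.0, 18.0, 20.0, 34.0

theta = 0.5  # initial guess
for _ in range(100):
    # E-step: expected latent count of the "t/4" portion of the first cell
    x12 = y1 * (theta / 4) / (0.5 + theta / 4)
    # M-step: complete-data MLE, since the complete-data log-likelihood is
    # (x12 + y4) * log(t) + (y2 + y3) * log(1 - t)
    theta = (x12 + y4) / (x12 + y2 + y3 + y4)

print(round(theta, 4))  # converges to about 0.6268
```

    Each iteration increases the observed-data likelihood, and the fixed point is the ML estimate; this monotonicity is the basic appeal of EM over direct maximization here.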

    Testing Measurement Invariance with Ordinal Missing Data: A Comparison of Estimators and Missing Data Techniques

    Ordinal missing data are common in measurement equivalence/invariance (ME/I) testing studies. However, there is a lack of guidance on the appropriate method for dealing with ordinal missing data in ME/I testing. Five methods may be used: the continuous full information maximum likelihood estimation method (FIML), continuous robust FIML (rFIML), FIML with probit links (pFIML), FIML with logit links (lFIML), and the mean- and variance-adjusted weighted least squares estimation method combined with pairwise deletion (WLSMV_PD). The current study evaluates the relative performance of these methods in producing valid chi-square difference tests (Δχ2) and accurate parameter estimates. The results suggest that all methods except WLSMV_PD can reasonably control the Type I error rates of the Δχ2 tests and maintain sufficient power to detect noninvariance in most conditions. Only pFIML and lFIML yield accurate factor loading estimates and standard errors across all conditions. Recommendations are provided to researchers based on these results.
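    The Δχ2 test at the center of this comparison is mechanically simple once each nested model has been fit. A minimal sketch follows; the fit statistics for the configural model (loadings free across groups) and the metric model (loadings constrained equal) are hypothetical numbers, not results from the study:

```python
import math

# Hypothetical chi-square fit statistics for two nested ME/I models
# (illustrative values only):
chisq_configural, df_configural = 152.3, 110
chisq_metric, df_metric = 171.8, 122

delta_chisq = chisq_metric - chisq_configural  # test statistic
delta_df = df_metric - df_configural           # degrees of freedom

# For EVEN degrees of freedom the chi-square survival function has the
# closed form  P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
half = delta_chisq / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k)
                                for k in range(delta_df // 2))

print(f"delta chi2 = {delta_chisq:.1f}, delta df = {delta_df}, "
      f"p = {p_value:.3f}")
```

    In practice the chi-square values come from whatever SEM software fits the models; the study's point is that the validity of this test depends on which of the five estimation methods produced those statistics.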

    Clustering of Data with Missing Entries

    The analysis of large datasets is often complicated by the presence of missing entries, mainly because most current machine learning algorithms are designed to work with full data. The main focus of this work is to introduce a clustering algorithm that will provide good clustering even in the presence of missing data. The proposed technique solves an ℓ0 fusion-penalty-based optimization problem to recover the clusters. We theoretically analyze the conditions needed for the successful recovery of the clusters. We also propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. The method is demonstrated on simulated and real datasets, and is observed to perform well in the presence of large fractions of missing entries.
    Comment: arXiv admin note: substantial text overlap with arXiv:1709.0187
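    The paper's ℓ0 fusion-penalty method itself is involved; as an illustration of the problem setting only, the sketch below runs a k-means-style baseline (not the authors' algorithm) in which distances and centroid updates use each point's observed coordinates only. All names and data here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian clusters in 5 dimensions,
# with 20% of the entries removed at random.
n, d = 100, 5
X = np.vstack([rng.normal(0.0, 1.0, (n, d)),
               rng.normal(10.0, 1.0, (n, d))])
labels_true = np.repeat([0, 1], n)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

def kmeans_missing(X, k, iters=50, seed=1):
    """k-means variant for data with missing entries: distances are averaged
    over each point's observed coordinates, and centroids are per-coordinate
    nan-means. A baseline illustration, not the fusion-penalty method."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(X)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    centers = np.where(np.isnan(centers), 0.0, centers)
    for _ in range(iters):
        # squared distance per center, averaged over observed coordinates
        diff = np.where(obs[:, None, :], X[:, None, :] - centers[None], 0.0)
        dist = (diff ** 2).sum(-1) / np.maximum(obs.sum(1, keepdims=True), 1)
        assign = dist.argmin(1)
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                new = np.nanmean(pts, axis=0)
                centers[j] = np.where(np.isnan(new), centers[j], new)
    return assign

assign = kmeans_missing(X_missing, k=2)
# Accuracy up to label permutation; near 1 for well-separated clusters
acc = max((assign == labels_true).mean(), (assign != labels_true).mean())
```

    Such observed-coordinate baselines degrade as the missing fraction grows, which is the regime the paper's penalty-based formulation and its recovery guarantees target.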