Estimate-based goodness-of-fit test for large sparse multinomial distributions

Abstract

Pearson's chi-squared statistic (X²) does not, in general, follow a chi-squared distribution when it is used to test goodness of fit for a multinomial distribution based on sparse contingency table data. We explore properties of the D² statistic of [Zelterman, D., 1987. Goodness-of-fit tests for large sparse multinomial distributions. J. Amer. Statist. Assoc. 82 (398), 624-629] and compare them with those of X². For very sparse contingency tables, we also compare the power of goodness-of-fit tests based on D², X², and the Lr statistic proposed by [Maydeu-Olivares, A., Joe, H., 2005. Limited- and full-information estimation and goodness-of-fit testing in 2^n contingency tables: A unified framework. J. Amer. Statist. Assoc. 100 (471), 1009-1020]. We show that the variance of D² is not larger than that of X² under null hypotheses in which all cell probabilities are positive, that the distribution of D² becomes more skewed as the multinomial distribution becomes more asymmetric and sparse, and that the power of the Lr-based test depends on the models selected for testing. A simulation experiment strongly supports using both D² and Lr for goodness-of-fit testing with large sparse contingency table data.
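The abstract does not write out the two statistics it compares. As a minimal sketch, assuming the simple-null case where the expected counts m_i = N p_i are known (the paper's estimate-based setting would substitute fitted expected counts), Pearson's X² is the sum of (n_i − m_i)²/m_i, and Zelterman's D², as it is usually stated, is the sum of [(n_i − m_i)² − n_i]/m_i, which equals X² minus the sum of n_i/m_i. The function names and the sparse example below (200 cells, 50 observations, probabilities proportional to 1/i) are hypothetical illustrations, not the paper's simulation design, and the Lr statistic is not covered.

```python
import random
from statistics import mean, pvariance


def pearson_x2(counts, expected):
    """Pearson's X^2 = sum_i (n_i - m_i)^2 / m_i; assumes every m_i > 0."""
    return sum((n - m) ** 2 / m for n, m in zip(counts, expected))


def zelterman_d2(counts, expected):
    """Zelterman's D^2 as usually stated (an assumption of this sketch):
    sum_i ((n_i - m_i)^2 - n_i) / m_i, i.e. X^2 - sum_i n_i / m_i."""
    return sum(((n - m) ** 2 - n) / m for n, m in zip(counts, expected))


def multinomial_sample(n, probs, rng):
    """Draw one multinomial table of n observations over len(probs) cells."""
    counts = [0] * len(probs)
    for cell in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[cell] += 1
    return counts


def simulate_null(n, probs, reps, seed=0):
    """Monte Carlo draws of (X^2, D^2) when the null cell probabilities are true
    and known, so the expected counts are m_i = n * p_i (no estimation step)."""
    rng = random.Random(seed)
    expected = [n * p for p in probs]
    x2_draws, d2_draws = [], []
    for _ in range(reps):
        counts = multinomial_sample(n, probs, rng)
        x2_draws.append(pearson_x2(counts, expected))
        d2_draws.append(zelterman_d2(counts, expected))
    return x2_draws, d2_draws


if __name__ == "__main__":
    # Hypothetical sparse, asymmetric setting: 200 cells but only 50 observations.
    k, n = 200, 50
    weights = [1.0 / (i + 1) for i in range(k)]
    total = sum(weights)
    probs = [w / total for w in weights]
    x2, d2 = simulate_null(n, probs, reps=2000)
    print(f"X^2: mean={mean(x2):.2f}  var={pvariance(x2):.2f}")
    print(f"D^2: mean={mean(d2):.2f}  var={pvariance(d2):.2f}")
```

Under this simple null with known p_i, E[X²] = k − 1 and E[D²] = −1, which gives a quick sanity check on the simulated means. The printed variances give an informal look at the variance comparison described above; the same loop with fitted rather than known expected counts would be closer to the estimate-based setting the title refers to.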

    Last time updated on 06/07/2012