Estimating the variance of the Horvitz-Thompson estimator

Abstract

Unequal probability sampling was introduced by Hansen and Hurwitz (1943) as a means of reducing the mean squared errors of survey estimators. For simplicity, they used sampling with replacement only. Horvitz and Thompson (1952) extended this methodology to sampling without replacement; however, knowledge of the joint inclusion probabilities of all pairs of sample units was required for the variance estimation process. The calculation of these probabilities is typically problematic. Sen (1953) and Yates and Grundy (1953) independently suggested a variance estimator for use when the sample size is fixed, but this estimator again involved the calculation of the joint inclusion probabilities. This requirement has proved to be a substantial disincentive to its use. More recently, efforts have been made to find useful approximations to this fixed-size sample variance that avoid the need to evaluate the joint inclusion probabilities. These approximate variance estimators have been shown to perform well under high entropy sampling designs; however, there is now an ongoing dispute in the literature regarding the preferred approximate estimator. This thesis examines in detail nine of these approximate estimators and their empirical performance under two high entropy sampling designs, namely Conditional Poisson Sampling and Randomised Systematic Sampling. The nine approximate estimators were separated into two families based on their variance formulae. It was hypothesised, from the derivation of these variance estimators, that one family would perform better under Randomised Systematic Sampling and the other under Conditional Poisson Sampling. The two families showed within-group similarities, and they usually performed better under their respective sampling designs. Recently, algorithms have been derived to efficiently determine the exact joint inclusion probabilities under Conditional Poisson Sampling. As a result, this study compared the Sen-Yates-Grundy variance estimator with the nine approximate estimators to determine whether knowledge of these probabilities could improve the estimation process. This estimator was found to avoid serious inaccuracies more consistently than the nine approximate estimators, but perhaps not to an extent that would justify its routine use, as it also produced estimates of variance with consistently higher mean squared errors than the approximate variance estimators. The results of the more recently published paper by Matei and Tillé (2005) have been shown to be largely misleading. This study also shows that the relationship between the variance and the entropy of the sampling design is more complex than was originally supposed by Brewer and Donadio (2003). Finally, the search for a best all-round variance estimator has been somewhat inconclusive, but it has been possible to indicate which estimators are likely to perform well in certain definable circumstances.
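
To make the role of the joint inclusion probabilities concrete, the following is a minimal sketch of the Horvitz-Thompson estimator of a population total and the Sen-Yates-Grundy variance estimator for a fixed-size sample, written from the standard textbook formulae. The function and argument names are illustrative assumptions and are not taken from the thesis; none of the nine approximate estimators is reproduced here.

```python
from itertools import combinations


def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of a population total.

    y  : observed values for the sampled units
    pi : first-order inclusion probabilities of those units
    """
    return sum(yi / p for yi, p in zip(y, pi))


def sen_yates_grundy_variance(y, pi, pi_joint):
    """Sen-Yates-Grundy variance estimate for a fixed-size design.

    pi_joint[i][j] is the joint inclusion probability of sampled
    units i and j. This is the quantity that is typically hard to
    compute and that the approximate estimators try to avoid.
    """
    total = 0.0
    for i, j in combinations(range(len(y)), 2):
        weight = (pi[i] * pi[j] - pi_joint[i][j]) / pi_joint[i][j]
        total += weight * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return total


# Illustrative check: simple random sampling without replacement,
# n = 2 units drawn from N = 4 (hypothetical data).
N, n = 4, 2
y = [1.0, 2.0]
pi = [n / N] * n                          # each unit: 0.5
p12 = n * (n - 1) / (N * (N - 1))         # each pair: 1/6
pi_joint = [[pi[0], p12], [p12, pi[1]]]

print(horvitz_thompson_total(y, pi))
print(sen_yates_grundy_variance(y, pi, pi_joint))
```

Under simple random sampling the joint inclusion probabilities have a closed form, so the example is easy to check by hand. Under unequal probability designs such as Conditional Poisson Sampling, the `pi_joint` matrix is the expensive input.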
