37 research outputs found

    Predicting Random Effects from Finite Population Clustered Samples with Response Error

    ABSTRACT In many situations there is interest in parameters (e.g. mean) associated with the response distribution of individual clusters in a finite clustered population. We develop predictors of such parameters using a two-stage sampling probability model with response error. The probability model stems directly from finite population sampling, without additional assumptions. The predictors are closely related to best linear unbiased predictors (BLUP) that arise from common mixed model methods, as well as to model-based predictors obtained via super-population approaches for survey sampling. The context assumes clusters of equal size and equal size sampling of units within clusters. Target parameters may correspond to clusters realized in the sample, as well as non-realized clusters. In either case, the predictors are linear and unbiased, and minimize the expected mean squared error. They correspond to the sum of predictors of responses for realized and non-realized units in the cluster, accounting directly for the second stage sampling fraction. In contrast, the BLUP commonly used in mixed models can be interpreted as predicting only the responses of second stage units not observed for a cluster, not the cluster mean. The development reveals that two-stage sampling does not give rise to a more general variance structure often assumed in super-population models, even when variances within clusters are heterogeneous. The proposed model is design based and requires minimal assumptions. With response error present, we predict target random variables defined as an expected (or average) response over units in a cluster.
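
The contrast drawn in this abstract between the mixed-model BLUP and a predictor of the full cluster mean can be illustrated with a minimal sketch. It assumes no response error, a balanced design (m of M units sampled per cluster), and a shrinkage constant `k` that is simply supplied by the caller; the exact constant derived in the paper is not reproduced here, and all function and parameter names are hypothetical.

```python
def mixed_model_blup(sample_mean, cluster_sample_mean, k):
    """Shrink a cluster's sample mean toward the overall sample mean.

    k is an assumed shrinkage constant in [0, 1], supplied by the caller.
    """
    return sample_mean + k * (cluster_sample_mean - sample_mean)


def cluster_mean_predictor(sample_mean, cluster_sample_mean, k, m, M):
    """Predict the mean of all M units in a realized cluster.

    The m sampled units contribute their observed mean (no response error
    is assumed in this sketch); the M - m unsampled units contribute the
    shrunken value. The two parts are weighted by the second stage
    sampling fraction f = m / M.
    """
    if not 0 < m <= M:
        raise ValueError("need 0 < m <= M")
    f = m / M
    unobserved_part = mixed_model_blup(sample_mean, cluster_sample_mean, k)
    return f * cluster_sample_mean + (1 - f) * unobserved_part


if __name__ == "__main__":
    # Overall sample mean 10, cluster sample mean 14, k = 0.5, 5 of 20 units sampled.
    print(mixed_model_blup(10.0, 14.0, 0.5))
    print(cluster_mean_predictor(10.0, 14.0, 0.5, m=5, M=20))
```

When every unit in the cluster is sampled (m = M), the sketch returns the cluster's observed mean unchanged, whereas the shrunken value alone would still pull it toward the overall mean.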

    Predicting Random Effects from Finite Population Clustered Samples with Response Error

    ABSTRACT In many situations there is interest in a parameter representing the mean of an individual cluster in a finite clustered population. We develop predictors of such parameters using a two-stage sampling probability model with response error. The probability model arises directly from finite population sampling, without additional assumptions. Target parameters may correspond to clusters realized in the sample, as well as non-realized clusters. In either case, the predictors are linear and unbiased, and minimize the expected mean squared error. The predictor is the sum of predictors for realized and non-realized units in the cluster, accounting directly for the second stage sampling fraction. In contrast, the commonly used BLUP in a mixed model can be seen to predict only the responses of non-realized second stage units for a cluster, not the cluster mean. The development reveals that two-stage sampling does not give rise to a more general variance structure often assumed in super-population models, even when variances within clusters are heterogeneous. The predictors provide an interpretable alternative to an apparently artificial model-based approach. With response error present, we predict target random variables defined as an average over units in a cluster of response, or the expected value of response.

    Daily Soil Ingestion Estimates for Children at a Superfund Site

    Ingestion of contaminated soil by children may result in significant exposure to toxic substances at contaminated sites. Estimates of such exposure are based on extrapolation of short-term-exposure estimates to longer time periods. This article provides daily estimates of soil ingestion for 64 children between the ages of 1 and 4 residing at a Superfund site; these values are employed to estimate the distribution of 7-day average soil ingestion exposures (mean, 31 mg/day; median, 17 mg/day) at a contaminated site over different time periods. Best linear unbiased predictors of the 95th percentile of soil ingestion over 7 days, 30 days, 90 days, and 365 days are 133 mg/day, 112 mg/day, 108 mg/day, and 106 mg/day, respectively. Variance components estimates (excluding titanium and outliers, based on Tukey's far-out criteria) are given for soil ingestion between subjects (59 mg/day)², between days on a subject (95 mg/day)², and for uncertainty on a subject-day (132 mg/day)². These results expand knowledge of potential exposure to contaminants among young children from soil ingestion at contaminated sites. They also provide basic distributions that serve as a starting point for use in Monte Carlo risk assessments. KEY WORDS: Soil ingestion; Monte Carlo risk assessment; children; Superfund site; exposure assessment

    Predicting random effects with an expanded finite population mixed model

    Prediction of random effects is an important problem with expanding applications. In the simplest context, the problem corresponds to prediction of the latent value (the mean) of a realized cluster selected via two-stage sampling. Recently, Stanek and Singer [Predicting random effects from finite population clustered samples with response error. J. Amer. Statist. Assoc. 99, 1119-1130] developed best linear unbiased predictors (BLUP) under a finite population mixed model that outperform BLUPs from mixed models and superpopulation models. Their setup, however, does not allow for unequally sized clusters. To overcome this drawback, we consider an expanded finite population mixed model based on a larger set of random variables that span a higher dimensional space than those typically applied to such problems. We show that BLUPs for linear combinations of the realized cluster means derived under such a model have considerably smaller mean squared error (MSE) than those obtained from mixed models, superpopulation models, and finite population mixed models. We motivate our general approach by an example developed for two-stage cluster sampling and show that it faithfully captures the stochastic aspects of sampling in the problem. We also consider simulation studies to illustrate the increased accuracy of the BLUP obtained under the expanded finite population mixed model. (C) 2007 Elsevier B.V. All rights reserved.

    Design-based random permutation models with auxiliary information

    We extend the random permutation model to obtain the best linear unbiased estimator of a finite population mean accounting for auxiliary variables under simple random sampling without replacement (SRS) or stratified SRS. The proposed method provides a systematic design-based justification for well-known results involving common estimators derived under minimal assumptions that do not require specification of a functional relationship between the response and the auxiliary variables. Funding: National Institutes of Health, USA [NIH-PHS-R01-HD36848, R01-HL071828-02, 5R01HL079483]; Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq); Fundacao de Amparo a Pesquisa do Estado de Sao Paulo (FAPESP), Brazil.

    Performance of balanced two-stage empirical predictors of realized cluster latent values from finite populations: A simulation study

    Predictors of random effects are usually based on the popular mixed effects (ME) model developed under the assumption that the sample is obtained from a conceptual infinite population; such predictors are employed even when the actual population is finite. Two alternatives that incorporate the finite nature of the population are obtained from the superpopulation model proposed by Scott and Smith (1969. Estimation in multi-stage surveys. J. Amer. Statist. Assoc. 64, 830-840) or from the finite population mixed model recently proposed by Stanek and Singer (2004. Predicting random effects from finite population clustered samples with response error. J. Amer. Statist. Assoc. 99, 1119-1130). Predictors derived under the latter model with the additional assumptions that all variance components are known and that within-cluster variances are equal have smaller mean squared error (MSE) than the competitors based on either the ME or Scott and Smith's models. As population variances are rarely known, we propose method-of-moments estimators to obtain empirical predictors and conduct a simulation study to evaluate their performance. The results suggest that the finite population mixed model empirical predictor is more stable than its competitors since, in terms of MSE, it is either the best or the second best and, when second best, its performance lies within acceptable limits. When both cluster and unit intra-class correlation coefficients are very high (e.g., 0.95 or more), the performance of the empirical predictors derived under the three models is similar. (c) 2007 Elsevier B.V. All rights reserved.
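
As a rough illustration of how method-of-moments variance estimates feed an empirical predictor, the sketch below uses the textbook ANOVA estimators for a balanced one-way layout under the ordinary mixed effects model. This is an assumption made for illustration: the finite population adjustments studied in the paper are not reproduced, and the function name and return format are hypothetical.

```python
def anova_variance_components(clusters):
    """Method-of-moments (ANOVA) estimates for a balanced one-way layout.

    clusters: list of n lists, each holding m sampled responses.
    Returns (between_var, within_var, k_hat), where k_hat is the
    plug-in shrinkage constant between_var / (between_var + within_var / m).
    """
    n = len(clusters)
    m = len(clusters[0])
    if n < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need a balanced layout with n >= 2 and m >= 2")

    cluster_means = [sum(c) / m for c in clusters]
    grand_mean = sum(cluster_means) / n

    # Between-cluster and within-cluster mean squares.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (n - 1)
    msw = sum(
        (y - cm) ** 2 for c, cm in zip(clusters, cluster_means) for y in c
    ) / (n * (m - 1))

    within_var = msw
    between_var = max(0.0, (msb - msw) / m)  # truncate negative estimates at 0

    denom = between_var + within_var / m
    k_hat = between_var / denom if denom > 0 else 0.0
    return between_var, within_var, k_hat


if __name__ == "__main__":
    print(anova_variance_components([[1.0, 3.0], [5.0, 7.0]]))
```

An empirical predictor is then obtained by substituting `k_hat` for the unknown shrinkage constant; truncating a negative between-cluster estimate at zero is one common convention, not necessarily the one used in the study.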

    Simple Random Sampling with Missing Data: Estimating the Population Mean From a Simple Random Sample When Some Responses are Missing

    Abstract We develop a design-based prediction approach to estimate the finite population mean in a simple setting where some responses are missing. The approach is based on indicator sampling random variables that operate on labeled units (subjects). Missing data mechanisms are defined that may depend on a subject, or on a selection (such as when the study design assigns groups of selected subjects to different interviewers). Using an approach usually reserved for model-based inference, we develop a predictor that equals the sample total divided by the expected sample size. The methods are direct extensions of best linear unbiased prediction (BLUP) in finite population mixed models. When the probability of missing is estimated from the sample, the empirical estimator simplifies to the mean of the realized non-missing responses. The different missing data mechanisms are revealed by the notation that accounts for the labels and sample selections. The mean squared error (MSE) of the empirical estimator, counterintuitively, is smaller than the MSE if the probability of missing is known
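
The simplification described in this abstract can be checked with a small worked sketch. It assumes a single common response probability for every selected subject (so the expected sample size is n times that probability) and marks missing responses with `None`; both choices, and the function names, are illustrative assumptions rather than the paper's notation.

```python
def predictor_known_probability(responses, p_respond):
    """Sample total of observed responses divided by the expected sample size.

    responses: list of n values, with None marking a missing response.
    p_respond: assumed known probability that a selected subject responds.
    """
    if not 0 < p_respond <= 1:
        raise ValueError("p_respond must be in (0, 1]")
    observed = [y for y in responses if y is not None]
    expected_size = len(responses) * p_respond
    return sum(observed) / expected_size


def empirical_predictor(responses):
    """Same predictor with the response probability estimated from the sample."""
    observed = [y for y in responses if y is not None]
    if not observed:
        raise ValueError("no observed responses")
    p_hat = len(observed) / len(responses)
    return predictor_known_probability(responses, p_hat)


if __name__ == "__main__":
    sample = [4.0, None, 6.0, 8.0, None]
    print(predictor_known_probability(sample, 0.75))
    print(empirical_predictor(sample))
```

With the estimated probability, the expected sample size equals the number of observed responses, so the empirical version reduces to the mean of the non-missing values, as the abstract states.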