3,369 research outputs found

    Integration of survey data and big observational data for finite population inference using mass imputation

    Get PDF
    Multiple data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we consider an imputation approach to combining a probability sample with big observational data. Unlike the usual imputation for missing data analysis, we create imputed values for the whole elements in the probability sample. Such mass imputation is attractive in the context of survey data integration (Kim and Rao, 2012). We extend mass imputation as a tool for data integration of survey data and big non-survey data. The mass imputation methods and their statistical properties are presented. The matching estimator of Rivers (2007) is also covered as a special case. Variance estimation with mass-imputed data is discussed. The simulation results demonstrate the proposed estimators outperform existing competitors in terms of robustness and efficiency

    Semiparametric response model with nonignorable nonresponse

    Get PDF
    How to deal with nonignorable response is often a challenging problem encountered in statistical analysis with missing data. Parametric model assumption for the response mechanism is often made and there is no way to validate the model assumption with missing data. We consider a semiparametric response model that relaxes the parametric model assumption in the response mechanism. Two types of efficient estimators, profile maximum likelihood estimator and profile calibration estimator, are proposed and their asymptotic properties are investigated. Two extensive simulation studies are used to compare with some existing methods. We present an application of our method using Korean Labor and Income Panel Survey data

    Fractional Imputation in Survey Sampling: A Comparative Review

    Get PDF
    Fractional imputation (FI) is a relatively new method of imputation for handling item nonresponse in survey sampling. In FI, several imputed values with their fractional weights are created for each missing item. Each fractional weight represents the conditional probability of the imputed value given the observed data, and the parameters in the conditional probabilities are often computed by an iterative method such as EM algorithm. The underlying model for FI can be fully parametric, semiparametric, or nonparametric, depending on plausibility of assumptions and the data structure. In this paper, we give an overview of FI, introduce key ideas and methods to readers who are new to the FI literature, and highlight some new development. We also provide guidance on practical implementation of FI and valid inferential tools after imputation. We demonstrate the empirical performance of FI with respect to multiple imputation using a pseudo finite population generated from a sample in Monthly Retail Trade Survey in US Census Bureau.Comment: 26 pages, 2 figure

    Predictive mean matching imputation in survey sampling

    Get PDF
    Predictive mean matching imputation is popular for handling item nonresponse in survey sampling. In this article, we study the asymptotic properties of the predictive mean matching estimator of the population mean. For variance estimation, the conventional bootstrap inference for matching estimators with fixed matches has been shown to be invalid due to the nonsmoothness nature of the matching estimator. We propose asymptotically valid replication variance estimation. The key strategy is to construct replicates of the estimator directly based on linear terms, instead of individual records of variables. Extension to nearest neighbor imputation is also discussed. A simulation study confirms that the new procedure provides valid variance estimation.Comment: 20 pages, 0 figure, 1 tabl

    A note on multiple imputation for method of moments estimation

    Get PDF
    Multiple imputation is a popular imputation method for general purpose estimation. Rubin(1987) provided an easily applicable formula for the variance estimation of multiple imputation. However, the validity of the multiple imputation inference requires the congeniality condition of Meng(1994), which is not necessarily satisfied for method of moments estimation. This paper presents the asymptotic bias of Rubin's variance estimator when the method of moments estimator is used as a complete-sample estimator in the multiple imputation procedure. A new variance estimator based on over-imputation is proposed to provide asymptotically valid inference for method of moments estimation.Comment: 8 pages, 0 figur

    Bayesian Sparse Propensity Score Estimation for Unit Nonresponse

    Get PDF
    Nonresponse weighting adjustment using propensity score is a popular method for handling unit nonresponse. However, including all available auxiliary variables into the propensity model can lead to inefficient and inconsistent estimation, especially with high-dimensional covariates. In this paper, a new Bayesian method using the Spike-and-Slab prior is proposed for sparse propensity score estimation. The proposed method is not based on any model assumption on the outcome variable and is computationally efficient. Instead of doing model selection and parameter estimation separately as in many frequentist methods, the proposed method simultaneously selects the sparse response probability model and provides consistent parameter estimation. Some asymptotic properties of the proposed method are presented. The efficiency of this sparse propensity score estimator is further improved by incorporating related auxiliary variables from the full sample. The finite-sample performance of the proposed method is investigated in two limited simulation studies, including a partially simulated real data example from the Korean Labor and Income Panel Survey.Comment: 38 pages, 3 table

    Imputation estimators for unnormalized models with missing data

    Get PDF
    Several statistical models are given in the form of unnormalized densities, and calculation of the normalization constant is intractable. We propose estimation methods for such unnormalized models with missing data. The key concept is to combine imputation techniques with estimators for unnormalized models including noise contrastive estimation and score matching. In addition, we derive asymptotic distributions of the proposed estimators and construct confidence intervals. Simulation results with truncated Gaussian graphical models and the application to real data of wind direction reveal that the proposed methods effectively enable statistical inference with unnormalized models from missing data.Comment: To appear (AISTATS 2020

    Finite sample properties of multiple imputation estimators

    Full text link
    Finite sample properties of multiple imputation estimators under the linear regression model are studied. The exact bias of the multiple imputation variance estimator is presented. A method of reducing the bias is presented and simulation is used to make comparisons. We also show that the suggested method can be used for a general class of linear estimators

    Calibration estimation using exponential tilting in sample surveys

    Get PDF
    We consider the problem of parameter estimation with auxiliary information, where the auxiliary information takes the form of known moments. Calibration estimation is a typical example of using the moment conditions in sample surveys. Given the parametric form of the original distribution of the sample observations, we use the estimated importance sampling of Henmi, Yoshida and Eguchi (2007) to obtain an improved estimator. If we use the normal density to compute the importance weights, the resulting estimator takes the form of the one-step exponential tilting estimator. The proposed exponential tilting estimator is shown to be asymptotically equivalent to the regression estimator, but it avoids extreme weights and has some computational advantages over the empirical likelihood estimator. Variance estimation is also discussed and results from a limited simulation study are presented
    corecore