3,369 research outputs found
Integration of survey data and big observational data for finite population inference using mass imputation
Multiple data sources are becoming increasingly available for statistical
analyses in the era of big data. As an important example in finite-population
inference, we consider an imputation approach to combining a probability sample
with big observational data. Unlike the usual imputation for missing data
analysis, we create imputed values for all elements in the probability
sample. Such mass imputation is attractive in the context of survey data
integration (Kim and Rao, 2012). We extend mass imputation as a tool for data
integration of survey data and big non-survey data. The mass imputation methods
and their statistical properties are presented. The matching estimator of
Rivers (2007) is also covered as a special case. Variance estimation with
mass-imputed data is discussed. The simulation results demonstrate that the proposed
estimators outperform existing competitors in terms of robustness and
efficiency.
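A minimal sketch of the mass imputation idea described above, not the authors' implementation: an outcome model is fitted on the big observational data (where y is observed), and its predictions replace y for every unit of the probability sample (where only x and design weights are available). The function names, the toy data, the simple linear working model, and the normalization by the sum of the weights are all assumptions made for illustration.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one covariate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mass_imputation_mean(sample_x, sample_weights, big_x, big_y):
    """Weighted mean of imputed values over the whole probability sample."""
    # Step 1: train the working outcome model on the big data source.
    intercept, slope = fit_simple_ols(big_x, big_y)
    # Step 2: impute y for every element of the probability sample.
    imputed = [intercept + slope * xi for xi in sample_x]
    # Step 3: combine the imputed values with the design weights
    # (normalizing by the weight total is an assumption of this sketch).
    total_weight = sum(sample_weights)
    return sum(w * yh for w, yh in zip(sample_weights, imputed)) / total_weight


# Hypothetical toy data: the big data follow y = 1 + 2x exactly.
big_x = [1.0, 2.0, 3.0, 4.0]
big_y = [3.0, 5.0, 7.0, 9.0]
sample_x = [0.0, 10.0]
sample_weights = [3.0, 1.0]

estimate = mass_imputation_mean(sample_x, sample_weights, big_x, big_y)
print(estimate)
```

The design weights enter only in the last step, so the representativeness comes from the probability sample while the outcome information comes from the big data.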
Semiparametric response model with nonignorable nonresponse
How to deal with nonignorable nonresponse is often a challenging problem
encountered in statistical analysis with missing data. A parametric model
assumption for the response mechanism is often made, but there is no way to
validate the model assumption with missing data. We consider a semiparametric
response model that relaxes the parametric model assumption in the response
mechanism. Two types of efficient estimators, the profile maximum likelihood
estimator and the profile calibration estimator, are proposed and their asymptotic
properties are investigated. Two extensive simulation studies are used to
compare the proposed estimators with some existing methods. We present an
application of our method using Korean Labor and Income Panel Survey data.
Fractional Imputation in Survey Sampling: A Comparative Review
Fractional imputation (FI) is a relatively new method of imputation for
handling item nonresponse in survey sampling. In FI, several imputed values
with their fractional weights are created for each missing item. Each
fractional weight represents the conditional probability of the imputed value
given the observed data, and the parameters in the conditional probabilities
are often computed by an iterative method such as the EM algorithm. The underlying
model for FI can be fully parametric, semiparametric, or nonparametric,
depending on the plausibility of assumptions and the data structure.
In this paper, we give an overview of FI, introduce key ideas and methods to
readers who are new to the FI literature, and highlight some new developments.
We also provide guidance on practical implementation of FI and valid
inferential tools after imputation. We demonstrate the empirical performance of
FI with respect to multiple imputation using a pseudo finite population
generated from a sample from the Monthly Retail Trade Survey of the US Census Bureau.
Comment: 26 pages, 2 figures
Predictive mean matching imputation in survey sampling
Predictive mean matching imputation is popular for handling item nonresponse
in survey sampling. In this article, we study the asymptotic properties of the
predictive mean matching estimator of the population mean. For variance
estimation, the conventional bootstrap inference for matching estimators with
fixed matches has been shown to be invalid due to the nonsmooth nature of
the matching estimator. We propose asymptotically valid replication variance
estimation. The key strategy is to construct replicates of the estimator
directly based on linear terms, instead of individual records of variables.
Extension to nearest neighbor imputation is also discussed. A simulation study
confirms that the new procedure provides valid variance estimation.
Comment: 20 pages, 0 figures, 1 table
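As an illustration of the imputation step only (the replication variance estimator proposed in the abstract is not reproduced here), the following sketch performs single-donor predictive mean matching. It assumes one covariate, a linear working model fitted on respondents, and missing values coded as None; all names and data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one covariate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predictive_mean_matching(x, y):
    """Fill each missing y with the observed y of the nearest respondent,
    where distance is measured between predicted means."""
    respondents = [i for i, yi in enumerate(y) if yi is not None]
    intercept, slope = fit_simple_ols(
        [x[i] for i in respondents], [y[i] for i in respondents]
    )
    predicted = [intercept + slope * xi for xi in x]
    completed = list(y)
    for i, yi in enumerate(y):
        if yi is None:
            # The donor is the respondent whose predicted mean is closest.
            donor = min(respondents, key=lambda j: abs(predicted[j] - predicted[i]))
            completed[i] = y[donor]
    return completed


# Hypothetical toy data: the last unit is an item nonrespondent.
x = [1.0, 2.0, 3.0, 4.0, 2.2]
y = [2.1, 3.9, 6.2, 7.8, None]

completed = predictive_mean_matching(x, y)
pmm_mean = sum(completed) / len(completed)
print(completed, pmm_mean)
```

Because the imputed value is always an observed value copied from a donor, the estimator is a nonsmooth function of the data, which is the property the abstract identifies as the reason the conventional bootstrap fails.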
A note on multiple imputation for method of moments estimation
Multiple imputation is a popular imputation method for general purpose
estimation. Rubin (1987) provided an easily applicable formula for the variance
estimation of multiple imputation. However, the validity of the multiple
imputation inference requires the congeniality condition of Meng (1994), which
is not necessarily satisfied for method of moments estimation. This paper
presents the asymptotic bias of Rubin's variance estimator when the method of
moments estimator is used as a complete-sample estimator in the multiple
imputation procedure. A new variance estimator based on over-imputation is
proposed to provide asymptotically valid inference for method of moments
estimation.
Comment: 8 pages, 0 figures
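For reference, Rubin's standard combining rule, whose bias under method of moments estimation is the subject of the abstract above, can be sketched as follows. With M imputed data sets the total variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. This shows only the classical formula, not the over-imputation variance estimator the paper proposes; the function name and inputs are hypothetical.

```python
from statistics import mean, variance


def rubin_combine(point_estimates, within_variances):
    """Combine M complete-data results with Rubin's (1987) rules."""
    m = len(point_estimates)
    pooled_estimate = mean(point_estimates)
    within = mean(within_variances)       # average complete-data variance
    between = variance(point_estimates)   # sample variance, divisor M - 1
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Hypothetical results from M = 3 imputed data sets.
pooled, total_var = rubin_combine([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(pooled, total_var)
```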
Bayesian Sparse Propensity Score Estimation for Unit Nonresponse
Nonresponse weighting adjustment using the propensity score is a popular method
for handling unit nonresponse. However, including all available auxiliary
variables in the propensity model can lead to inefficient and inconsistent
estimation, especially with high-dimensional covariates. In this paper, a new
Bayesian method using the spike-and-slab prior is proposed for sparse
propensity score estimation. The proposed method is not based on any model
assumption on the outcome variable and is computationally efficient. Instead of
doing model selection and parameter estimation separately as in many
frequentist methods, the proposed method simultaneously selects the sparse
response probability model and provides consistent parameter estimation. Some
asymptotic properties of the proposed method are presented. The efficiency of
this sparse propensity score estimator is further improved by incorporating
related auxiliary variables from the full sample. The finite-sample performance
of the proposed method is investigated in two limited simulation studies,
including a partially simulated real data example from the Korean Labor and
Income Panel Survey.
Comment: 38 pages, 3 tables
Imputation estimators for unnormalized models with missing data
Several statistical models are given in the form of unnormalized densities,
and calculation of the normalization constant is intractable. We propose
estimation methods for such unnormalized models with missing data. The key
concept is to combine imputation techniques with estimators for unnormalized
models including noise contrastive estimation and score matching. In addition,
we derive asymptotic distributions of the proposed estimators and construct
confidence intervals. Simulation results with truncated Gaussian graphical
models and the application to real data of wind direction reveal that the
proposed methods effectively enable statistical inference with unnormalized
models from missing data.
Comment: To appear (AISTATS 2020)
Finite sample properties of multiple imputation estimators
Finite sample properties of multiple imputation estimators under the linear
regression model are studied. The exact bias of the multiple imputation
variance estimator is presented. A method of reducing the bias is presented and
simulation is used to make comparisons. We also show that the suggested method
can be used for a general class of linear estimators.
Calibration estimation using exponential tilting in sample surveys
We consider the problem of parameter estimation with auxiliary information, where the auxiliary information takes the form of known moments. Calibration estimation is a typical example of using the moment conditions in sample surveys. Given the parametric form of the original distribution of the sample observations, we use the estimated importance sampling of Henmi, Yoshida and Eguchi (2007) to obtain an improved estimator. If we use the normal density to compute the importance weights, the resulting estimator takes the form of the one-step exponential tilting estimator. The proposed exponential tilting estimator is shown to be asymptotically equivalent to the regression estimator, but it avoids extreme weights and has some computational advantages over the empirical likelihood estimator. Variance estimation is also discussed and results from a limited simulation study are presented.
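A minimal sketch of calibration by exponential tilting, under simplifying assumptions: a single auxiliary variable x with a known population mean, weights of the form w_i proportional to d_i exp(lambda x_i), and lambda found by Newton iteration. Note that this is the fully iterated version, not the one-step estimator the abstract describes, and the function name and toy data are hypothetical.

```python
import math


def exponential_tilting_weights(x, design_weights, target_mean,
                                tol=1e-10, max_iter=100):
    """Return normalized weights w_i proportional to d_i * exp(lam * x_i)
    whose weighted mean of x matches target_mean."""
    lam = 0.0
    for _ in range(max_iter):
        raw = [d * math.exp(lam * xi) for d, xi in zip(design_weights, x)]
        total = sum(raw)
        weights = [r / total for r in raw]
        mean_x = sum(w * xi for w, xi in zip(weights, x))
        gap = mean_x - target_mean
        if abs(gap) < tol:
            return weights
        # The derivative of the tilted mean with respect to lam is the
        # tilted variance of x, which gives the Newton step below.
        var_x = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
        lam -= gap / var_x
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical sample with equal design weights; the unweighted mean of x
# is 2.5, while the assumed known population mean is 2.8.
x = [1.0, 2.0, 3.0, 4.0]
d = [1.0, 1.0, 1.0, 1.0]
w = exponential_tilting_weights(x, d, target_mean=2.8)
calibrated_mean = sum(wi * xi for wi, xi in zip(w, x))
print(w, calibrated_mean)
```

Because each weight is a design weight multiplied by an exponential factor, the calibrated weights are always positive, which is one way to see why this form avoids the negative or extreme weights that linear (regression) calibration can produce.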