Imputation Estimators Partially Correct for Model Misspecification
Inference problems with incomplete observations often aim at estimating
population properties of unobserved quantities. One simple way to accomplish
this estimation is to impute the unobserved quantities of interest at the
individual level and then take an empirical average of the imputed values. We
show that this simple imputation estimator can provide partial protection
against model misspecification. We illustrate this robustness of imputation estimators
with three examples: mixture model-based clustering,
estimation of genotype frequencies in population genetics, and estimation of
Markovian evolutionary distances. In the final example, using a representative
model misspecification, we demonstrate that in non-degenerate cases, the
imputation estimator asymptotically dominates the plug-in estimator. We conclude
by outlining a Bayesian implementation of imputation-based estimation.
Comment: major rewrite; beta-binomial example removed; model-based clustering
added to the mixture model example; the Bayesian approach is now illustrated
with the genetics example.
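Here is a hypothetical sketch of the plug-in versus imputation distinction. It is not the paper's genetics example or code. It assumes the ABO blood-group system, a Hardy-Weinberg model for genotypes, and allele frequencies fitted by EM (gene counting). The plug-in estimator reads a genotype frequency directly off the fitted model. The imputation estimator averages each individual's model-based probability of that genotype given the observed phenotype. The imputation estimator keeps the empirical phenotype frequencies and uses the model only to split them into genotypes, so it reproduces the observed type-A share even when the model misfits. This is one way to see the partial protection described above. The function names and the counts are illustrative.

```python
def abo_em(n_a, n_b, n_ab, n_o, iters=500):
    """EM (gene counting) MLE of ABO allele frequencies under Hardy-Weinberg."""
    n = n_a + n_b + n_ab + n_o
    p_a = p_b = p_o = 1.0 / 3.0
    for _ in range(iters):
        # E-step: split phenotype counts into expected genotype counts
        n_aa = n_a * p_a**2 / (p_a**2 + 2 * p_a * p_o)
        n_ao = n_a - n_aa
        n_bb = n_b * p_b**2 / (p_b**2 + 2 * p_b * p_o)
        n_bo = n_b - n_bb
        # M-step: allele counting over the expected genotypes
        p_a = (2 * n_aa + n_ao + n_ab) / (2 * n)
        p_b = (2 * n_bb + n_bo + n_ab) / (2 * n)
        p_o = (2 * n_o + n_ao + n_bo) / (2 * n)
    return p_a, p_b, p_o


def genotype_estimates(n_a, n_b, n_ab, n_o):
    """Plug-in vs imputation estimates of the AA and AO genotype frequencies."""
    n = n_a + n_b + n_ab + n_o
    p_a, _, p_o = abo_em(n_a, n_b, n_ab, n_o)
    # Plug-in: genotype frequencies taken straight from the fitted HWE model
    plug_aa = p_a**2
    plug_ao = 2 * p_a * p_o
    # Imputation: average each individual's P(genotype | phenotype);
    # only phenotype-A individuals can carry AA or AO
    share_aa = p_a**2 / (p_a**2 + 2 * p_a * p_o)
    imp_aa = (n_a / n) * share_aa
    imp_ao = (n_a / n) * (1 - share_aa)
    return {"plug_in": (plug_aa, plug_ao), "imputed": (imp_aa, imp_ao)}


# HWE-consistent counts (p_A=0.3, p_B=0.1, p_O=0.6): the estimators agree
print(genotype_estimates(4500, 1300, 600, 3600))
# Counts with a large excess of AB (HWE violated): the estimators diverge
print(genotype_estimates(400, 100, 300, 200))
```

With the HWE-consistent counts, both estimators give 0.09 for AA. With the AB-heavy counts, the imputed AA and AO frequencies still sum to the observed type-A share of 0.4, but the plug-in pair does not.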
Weak consistency of the 1-nearest neighbor measure with applications to missing data
When data are partially missing at random, imputation and importance weighting
are often used to estimate moments of the unobserved population. In this paper,
we study 1-nearest neighbor (1NN) importance weighting, which estimates moments
by replacing each missing value with the observed value of its nearest complete
case in the space of fully observed covariates. We define an empirical measure, the 1NN
measure, and show that it is weakly consistent for the measure of the missing
data. The main idea behind this result is that, in the limit, the 1NN measure
performs inverse probability weighting. We study applications to missing
data and to mitigating the impact of covariate shift in prediction tasks.
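Here is a minimal sketch of 1NN importance weighting as the abstract describes it. It is not the authors' code. It assumes Euclidean distance in the covariate space, brute-force neighbor search, and ties broken by the lowest index. Each complete case receives a weight equal to the fraction of missing cases whose nearest complete neighbor it is. A moment E[f(Y) | missing] is then estimated as the weighted average of f over the complete cases. The function names are hypothetical, and so is the simulated MAR setup, in which the probability of Y being missing is sigmoid(2x).

```python
import numpy as np


def one_nn_weights(x_complete, x_missing):
    """Weight on each complete case: share of missing cases whose 1NN it is."""
    # Squared Euclidean distances, shape (n_missing, n_complete)
    d2 = ((x_missing[:, None, :] - x_complete[None, :, :]) ** 2).sum(axis=-1)
    nn = d2.argmin(axis=1)  # ties go to the lowest index
    counts = np.bincount(nn, minlength=x_complete.shape[0])
    return counts / x_missing.shape[0]


def one_nn_mean(fy_complete, x_complete, x_missing):
    """Estimate E[f(Y) | missing] by integrating f against the 1NN measure."""
    w = one_nn_weights(x_complete, x_missing)
    return float(w @ fy_complete)


def simulate_mar(n=2000, seed=0):
    """Toy MAR data: Y = X + noise, P(Y missing | X) = sigmoid(2X)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 1))
    y = x[:, 0] + 0.1 * rng.normal(size=n)
    missing = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-2.0 * x[:, 0]))
    return x, y, missing


x, y, missing = simulate_mar()
est = one_nn_mean(y[~missing], x[~missing], x[missing])
naive = y[~missing].mean()  # complete-case mean, biased under covariate shift
truth = y[missing].mean()   # known only because the data are simulated
print(f"1NN: {est:.3f}  complete-case: {naive:.3f}  hidden truth: {truth:.3f}")
```

Missingness here rises with x, so the complete-case mean is pulled toward low values of Y. The 1NN weights shift mass onto complete cases that sit near the missing ones, which is the inverse-probability-weighting effect the abstract mentions. The pairwise-distance matrix is quadratic in memory, so a k-d tree would be the natural swap for larger samples.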