Approximate Bayesian approaches and semiparametric methods for handling missing data
This thesis consists of four research papers on estimation and inference with missing data. In the first paper (Chapter 2), an approximate Bayesian approach is developed to handle unit nonresponse with parametric model assumptions on the response probability, but without model assumptions for the outcome variable. The proposed Bayesian method is also extended to incorporate auxiliary information from the full sample. In the second paper (Chapter 3), a new Bayesian method using the Spike-and-Slab prior is proposed for sparse propensity score estimation. The proposed method is not based on any model assumption on the outcome variable and is computationally efficient. In the third paper (Chapter 4), we develop a robust semiparametric method based on the profile likelihood obtained from a semiparametric response model. The proposed method uses the observed regression model and the semiparametric response model to achieve robustness. An efficient algorithm using fractional imputation is developed. A bootstrap testing procedure is also proposed to test the ignorability assumption. In the last paper (Chapter 5), we propose a novel semiparametric fractional imputation method using Gaussian mixture models for handling multivariate missingness. The proposed method is computationally efficient and leads to robust estimation. The proposed method is further extended to incorporate categorical auxiliary information. Asymptotic properties are developed for each proposed method. Both simulation studies and real data applications are conducted to check the performance of the proposed methods in this thesis.
Bayesian Sparse Propensity Score Estimation for Unit Nonresponse
Nonresponse weighting adjustment using the propensity score is a popular method
for handling unit nonresponse. However, including all available auxiliary
variables in the propensity model can lead to inefficient and inconsistent
estimation, especially with high-dimensional covariates. In this paper, a new
Bayesian method using the Spike-and-Slab prior is proposed for sparse
propensity score estimation. The proposed method is not based on any model
assumption on the outcome variable and is computationally efficient. Instead of
doing model selection and parameter estimation separately as in many
frequentist methods, the proposed method simultaneously selects the sparse
response probability model and provides consistent parameter estimation. Some
asymptotic properties of the proposed method are presented. The efficiency of
this sparse propensity score estimator is further improved by incorporating
related auxiliary variables from the full sample. The finite-sample performance
of the proposed method is investigated in two limited simulation studies,
including a partially simulated real data example from the Korean Labor and
Income Panel Survey. Comment: 38 pages, 3 tables
An approximate Bayesian inference on propensity score estimation under unit nonresponse
Nonresponse weighting adjustment using the response propensity score is a popular tool for handling unit nonresponse. Statistical inference after the nonresponse weighting adjustment is complicated because the effect of estimating the propensity model parameter needs to be incorporated. In this paper, we propose an approximate Bayesian approach to handle unit nonresponse with parametric model assumptions on the response probability, but without model assumptions for the outcome variable. The proposed Bayesian method is calibrated to the frequentist inference in that the credible region obtained from the posterior distribution asymptotically matches the frequentist confidence interval obtained from the Taylor linearization method. Unlike the frequentist approach, however, the proposed method does not involve Taylor linearization. The proposed method can be extended to handle over-identified cases in which there are more estimating equations than parameters. The proposed method can also be modified to handle nonignorable nonresponse. Results from two simulation studies confirm the validity of the proposed methods, which are then applied to data from a Korean longitudinal survey.
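To make the setting concrete, here is a minimal sketch of the frequentist starting point that the abstract refers to: a parametric (logistic) response model is fitted, and respondents are weighted by the inverse of their estimated response probability. It is not the approximate Bayesian procedure of the paper. The toy data, the single covariate, and the names `fit_response_model` and `psa_mean` are assumptions made for illustration.

```python
import math

def fit_response_model(x, r, iters=25):
    """Fit P(r = 1 | x) = expit(b0 + b1 * x) by Newton-Raphson (illustrative)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ri in zip(x, r):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += ri - p            # score for the intercept
            g1 += (ri - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy sample: x is observed for everyone, y only for respondents.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
r = [0, 1, 0, 1, 1, 0, 1, 1]
y = [None, 2.0, None, 3.5, 4.0, None, 6.5, 7.0]

b0, b1 = fit_response_model(x, r)
p_hat = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

# Propensity-score-adjusted (Hajek-type) estimator of the mean of y:
# each respondent is weighted by 1 / p_hat.
num = sum(yi / pi for yi, ri, pi in zip(y, r, p_hat) if ri == 1)
den = sum(1.0 / pi for ri, pi in zip(r, p_hat) if ri == 1)
psa_mean = num / den
naive_mean = sum(yi for yi in y if yi is not None) / sum(r)
```

In this toy sample the response probability increases with x, so the adjustment gives more weight to low-x respondents and pulls the estimate below the unweighted respondent mean. The inferential difficulty described in the abstract arises because `p_hat` is itself estimated.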
Optimal Stratification and Allocation for the June Agricultural Survey
A computational approach to optimal multivariate designs with respect to stratification and allocation is investigated under the assumptions of fixed total allocation, known number of strata, and the availability of administrative data correlated with the variables of interest under coefficient-of-variation constraints. This approach uses a penalized objective function that is optimized by simulated annealing through exchanging sampling units and sample allocations among strata. Computational speed is improved through the use of a computationally efficient machine learning method such as K-means to create an initial stratification close to the optimal stratification. The numeric stability of the algorithm has been investigated and parallel processing has been employed where appropriate. Results are presented for both simulated data and USDA’s June Agricultural Survey. An R package has also been made available for evaluation.
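As a rough illustration of the quantity such a search evaluates at each step, the sketch below computes the coefficient of variation (CV) of the estimated total under stratified simple random sampling for a given stratification and allocation, and adds a penalty when the CV exceeds a target. The quadratic penalty form, the `penalty` value, the toy data, and both function names are assumptions for illustration only; the abstract does not specify the objective function used, and this is not the interface of the R package it mentions.

```python
import math
from statistics import variance

def cv_of_total(strata, allocation):
    """CV of the stratified estimator of a total under simple random
    sampling without replacement within each stratum."""
    total = sum(sum(units) for units in strata.values())
    var = 0.0
    for h, units in strata.items():
        N_h, n_h = len(units), allocation[h]
        # N_h^2 * (1 - n_h / N_h) * S_h^2 / n_h
        var += N_h ** 2 * (1.0 - n_h / N_h) * variance(units) / n_h
    return math.sqrt(var) / total

def penalized_objective(cv, target_cv, penalty=1000.0):
    """Hypothetical penalized objective: the CV plus a quadratic penalty
    that is active only when the CV constraint is violated."""
    return cv + penalty * max(0.0, cv - target_cv) ** 2

# Hypothetical administrative variable for 8 units, split into two strata.
strata = {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 12.0, 14.0, 16.0]}
allocation = {"A": 2, "B": 2}          # fixed total sample size of 4

cv_strat = cv_of_total(strata, allocation)

# Same units and total sample size, but no stratification.
pooled = {"all": strata["A"] + strata["B"]}
cv_pooled = cv_of_total(pooled, {"all": 4})

obj_ok = penalized_objective(cv_strat, target_cv=0.10)        # constraint met
obj_violated = penalized_objective(cv_strat, target_cv=0.05)  # constraint violated
```

A simulated annealing search of the kind described would repeatedly propose moving a unit between strata or shifting one sample unit of allocation, re-evaluate an objective like this one, and accept or reject the move.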
Semiparametric fractional imputation using Gaussian mixture models for handling multivariate missing data
Item nonresponse is frequently encountered in practice. Ignoring missing data can reduce efficiency and lead to misleading inference. Fractional imputation is a frequentist approach to imputation for handling missing data. However, the parametric fractional imputation of Kim (2011) may be subject to bias under model misspecification. In this paper, we propose a novel semiparametric fractional imputation method using Gaussian mixture models. The proposed method is computationally efficient and leads to robust estimation. The proposed method is further extended to incorporate categorical auxiliary information. The asymptotic model consistency and √n-consistency of the semiparametric fractional imputation estimator are also established. Some simulation studies are presented to check the finite sample performance of the proposed method.
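The general idea of combining a Gaussian mixture with fractional weights can be sketched for a single missing value. Assuming a bivariate mixture for (x, y) with known parameters, a unit with x observed and y missing receives one imputed value per mixture component, weighted by the posterior probability of that component given x. This is a simplified illustration, not the estimator of the paper: the parameters are treated as known rather than estimated, conditional means stand in for imputed values, and the function names and numbers are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fractional_impute(x, components):
    """Return (fractional weights, imputed values) for a missing y given x.

    Each component is a dict with mixing proportion `pi`, means `mx`, `my`,
    variance `sxx` of x, and covariance `sxy` between x and y.
    """
    joint = [c["pi"] * normal_pdf(x, c["mx"], c["sxx"]) for c in components]
    total = sum(joint)
    weights = [j / total for j in joint]  # P(component | x)
    # Conditional mean of y given x within each Gaussian component.
    values = [c["my"] + c["sxy"] / c["sxx"] * (x - c["mx"]) for c in components]
    return weights, values

# Hypothetical two-component mixture (all parameter values are assumptions).
components = [
    {"pi": 0.5, "mx": -1.0, "my": 0.0, "sxx": 1.0, "sxy": 0.5},
    {"pi": 0.5, "mx": 1.0, "my": 4.0, "sxx": 1.0, "sxy": 0.5},
]

weights_mid, values_mid = fractional_impute(0.0, components)
imputed_mid = sum(w * v for w, v in zip(weights_mid, values_mid))

weights_far, values_far = fractional_impute(3.0, components)
```

At x = 0 the two components are equally plausible, so the imputed values share the weight equally; at x = 3 nearly all of the weight moves to the second component.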