21 research outputs found

    Variance estimation after Kernel Ridge Regression Imputation

    Imputation is a popular technique for handling missing data. Variance estimation after imputation is an important practical problem in statistics. In this paper, we consider variance estimation of the imputed mean estimator under kernel ridge regression imputation. We consider a linearization approach which employs the covariate balancing idea to estimate the inverse of the propensity scores. The statistical guarantee of our proposed variance estimator is studied when a Sobolev space is used for the imputation, in which case root-n consistency can be obtained. Synthetic data experiments are presented to confirm our theory.
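As a rough illustration of the imputed mean estimator this abstract refers to, the sketch below fits kernel ridge regression on the respondents and averages observed and imputed outcomes. It is a minimal sketch under stated assumptions, not the paper's method: the Gaussian kernel, the scalar covariate, the defaults for `lam` and `bandwidth`, and all function names are hypothetical choices, and the linearization variance estimator itself is not reproduced.

```python
import math


def gaussian_kernel(x, z, bandwidth=1.0):
    """Gaussian kernel between two scalars (an assumed kernel choice)."""
    return math.exp(-((x - z) ** 2) / (2.0 * bandwidth ** 2))


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def krr_imputed_mean(xs, ys, lam=0.01, bandwidth=1.0):
    """Mean of y with missing values replaced by KRR predictions.

    ys[i] is None when the outcome of unit i is missing.
    """
    obs = [i for i, y in enumerate(ys) if y is not None]
    n_obs = len(obs)
    # Ridge system (K + n_obs * lam * I) alpha = y, built on respondents only.
    gram = [
        [gaussian_kernel(xs[i], xs[j], bandwidth) + (n_obs * lam if i == j else 0.0)
         for j in obs]
        for i in obs
    ]
    alpha = solve_linear_system(gram, [ys[i] for i in obs])

    def predict(x):
        return sum(a * gaussian_kernel(x, xs[i], bandwidth)
                   for a, i in zip(alpha, obs))

    completed = [y if y is not None else predict(x) for x, y in zip(xs, ys)]
    return sum(completed) / len(completed)
```

For example, `krr_imputed_mean([0.0, 1.0, 2.0, 3.0], [0.1, 0.9, None, 3.2])` imputes the third outcome from the fitted curve before averaging. In practice the penalty and bandwidth would need tuning.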

    Triply robust estimation under missing at random

    Missing data is frequently encountered in many areas of statistics. Imputation and propensity score weighting are two popular methods for handling missing data. These methods employ some model assumptions, either the outcome regression or the response propensity model. However, correct specification of the statistical model can be challenging in the presence of missing data. Doubly robust estimation is attractive as the consistency of the estimator is guaranteed when either the outcome regression model or the propensity score model is correctly specified. In this paper, we first employ information projection to develop an efficient and doubly robust estimator under indirect model calibration constraints. The resulting propensity score estimator can be equivalently expressed as a doubly robust regression imputation estimator by imposing the internal bias calibration condition in estimating the regression parameters. In addition, we generalize the information projection to allow for outlier-robust estimation. Thus, we achieve triply robust estimation by adding the outlier robustness condition to the double robustness condition. Some asymptotic properties are presented. The simulation study confirms that the proposed method allows robust inference against not only the violation of various model assumptions, but also outliers
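The double robustness property described in this abstract can be illustrated with the standard augmented inverse probability weighting (AIPW) form of a doubly robust mean estimator. This is a minimal sketch only: it is the textbook AIPW estimator, not the information projection estimator proposed in the paper, and the function and argument names are hypothetical.

```python
def doubly_robust_mean(ys, responded, m_hat, pi_hat):
    """Augmented IPW estimate of the mean of y.

    ys[i]        : outcome of unit i (ignored when responded[i] is False)
    responded[i] : True if y_i was observed
    m_hat[i]     : fitted outcome-regression value at x_i
    pi_hat[i]    : fitted response probability at x_i (must be > 0)
    """
    n = len(m_hat)
    total = 0.0
    for y, r, m, p in zip(ys, responded, m_hat, pi_hat):
        # Regression prediction for every unit ...
        total += m
        # ... plus an inverse-probability-weighted residual for respondents.
        if r:
            total += (y - m) / p
    return total / n
```

If the fitted regression values are correct, the weighted residuals average out to zero regardless of `pi_hat`; if the fitted response probabilities are correct, the residual term corrects the bias of a wrong `m_hat`. Either model alone is enough for consistency, which is what "doubly robust" refers to.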

    Fertile Island Effect by Three Typical Woody Plants on Wetlands of Ebinur Lake, northwestern China

    Desertification poses a permanent threat to the security of arid ecosystems. Perennial arid vegetation plays a crucial role in maintaining the structure and function of arid ecosystems and in slowing desertification by forming “fertile islands” under the tree canopy. However, the process by which these fertile islands form and develop remains uncertain. Here, we explored how three typical woody plants (i.e., Populus euphratica, Haloxylon ammodendron, and Nitraria tangutorum) in the Ebinur Lake Basin of northwestern China differed in their soil nitrogen and phosphorus. (1) Significant differences in organic carbon and in total and available nitrogen/phosphorus were observed in the soil among the three typical woody plant-dominated ecosystems. Populus euphratica showed significant differences in N and P contents between the canopy and bare soils, except for ammonium nitrogen. (2) Our redundancy analysis (RDA) revealed that the major factors influencing the soil nutrient differences among the three vegetation types were plant crown width, soil water content, salinity, and pH. (3) The organic carbon content of bare soil was significantly correlated with N and P in all three vegetation types. This study contributes to our understanding of the factors that influence the fertile island effect in arid ecosystems, which may contribute to soil conservation in arid areas.

    Prediction of spatial distribution characteristics of ecosystem functions based on a minimum data set of functional traits of desert plants

    The relationship between plant functional traits and ecosystem function is a hot topic in current ecological research, and community-level traits based on individual plant functional traits play important roles in ecosystem function. In temperate desert ecosystems, which functional trait to use to predict ecosystem function is an important scientific question. In this study, the minimum data sets of functional traits of woody (wMDS) and herbaceous (hMDS) plants were constructed and used to predict the spatial distribution of C, N, and P cycling in ecosystems. The results showed that the wMDS included plant height, specific leaf area, leaf dry weight, leaf water content, diameter at breast height (DBH), leaf width, and leaf thickness, and the hMDS included plant height, specific leaf area, leaf fresh weight, leaf length, and leaf width. The linear regression results based on the cross-validations (FTEIW - L, FTEIA - L, FTEIW - NL, and FTEIA - NL) for the MDS and TDS (total data set) showed that the R² (coefficients of determination) for wMDS were 0.29, 0.34, 0.75, and 0.57, respectively, and those for hMDS were 0.82, 0.75, 0.76, and 0.68, respectively, proving that the MDSs can replace the TDS in predicting ecosystem function. Then, the MDSs were used to predict the C, N, and P cycling in the ecosystem. The results showed that non-linear models RF and BPNN were able to predict the spatial distributions of C, N and P cycling, and the distributions showed inconsistent patterns between different life forms under moisture restrictions. The C, N, and P cycling showed strong spatial autocorrelation and were mainly influenced by structural factors. Based on the non-linear models, the MDSs can be used to accurately predict the C, N, and P cycling, and the predicted values of woody plant functional traits visualized by regression kriging were closer to the kriging results based on raw values. This study provides a new perspective for exploring the relationship between biodiversity and ecosystem function.

    Topics on nonparametric calibration, kernel ridge regression imputation and nonparametric propensity score estimation

    This dissertation focuses on statistical issues arising in survey data and item nonresponse. In particular, it covers topics on nonparametric calibration in survey data, kernel ridge regression imputation, and density ratio estimation in the propensity score approach. The first project is about nonparametric calibration in survey sampling. Estimation of a finite population mean or total is important in survey sampling. Calibration estimation is a popular method to address this issue by adjusting the sampling weights to match the unknown population totals of auxiliary variables. When the auxiliary variables are observed for all units in the finite population, one can apply model calibration using the working outcome model. The traditional parametric calibration approach might not be robust in practice. We develop a nonparametric calibration method employing an infinite-dimensional reproducing kernel Hilbert space (RKHS) that does not require an explicit outcome model. Under mild assumptions, the proposed calibration estimator attains the Godambe-Joshi lower bound asymptotically. The second project is about handling missing data using the kernel ridge regression method. Missing data is frequently encountered in practice. In some cases, missingness is planned to reduce the cost or the response burden. Ignoring the cases with missing values can lead to misleading results. To avoid the potential problem with missing data, imputation is commonly used. Kernel Ridge Regression (KRR) is a modern nonparametric regression technique based on the theory of Reproducing Kernel Hilbert Space, which enjoys model robustness. We apply this method to imputation. Specifically, we establish the root-n consistency of the KRR imputation estimator and show that it is optimal in the sense that it achieves the lower bound of the semiparametric asymptotic variance. We further consider the propensity score weighting method using kernel ridge regression and discuss its asymptotic properties.
    The third project is about propensity score estimation using the density ratio function approach. The propensity score approach is also a popular tool for handling item nonresponse. The propensity score is often developed using a model for the response probability. In practice, regression models for binary response, e.g., logistic regression, can be utilized to model the response probability given the observed auxiliary information. An inverse probability weighting estimator can then be constructed to obtain an unbiased estimate of the target parameter. We consider an alternative approach of estimating the inverse of the propensity scores using the density ratio function. Density ratio estimation can be obtained by applying the maximum entropy method, which uses the Kullback-Leibler distance measure. By including only the covariates for the outcome regression models in the density ratio model, we can achieve efficient propensity score estimation. We further extend the proposed approach to handle the multivariate missing case.
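The link between the inverse propensity score and a density ratio mentioned in this abstract follows from Bayes' rule. The short derivation below is a sketch in assumed notation that does not appear in the listing: $\delta$ is the response indicator, $\pi(x) = P(\delta = 1 \mid x)$ is the propensity score, and $f(x \mid \delta)$ is the conditional density of the covariates.

```latex
\begin{align*}
\frac{1}{\pi(x)}
  &= \frac{P(\delta = 1 \mid x) + P(\delta = 0 \mid x)}{P(\delta = 1 \mid x)}
   = 1 + \frac{P(\delta = 0 \mid x)}{P(\delta = 1 \mid x)} \\
  &= 1 + \frac{P(\delta = 0)}{P(\delta = 1)} \cdot
         \frac{f(x \mid \delta = 0)}{f(x \mid \delta = 1)} .
\end{align*}
```

Under this identity, an estimate of the density ratio $f(x \mid \delta = 0) / f(x \mid \delta = 1)$, combined with the sample proportions of nonrespondents and respondents, gives the inverse propensity weights directly, without first fitting a model for $\pi(x)$.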

    Statistical inference using Regularized M-estimation in the reproducing kernel Hilbert space for handling missing data

    Imputation and propensity score weighting are two popular techniques for handling missing data. We address these problems using regularized M-estimation techniques in the reproducing kernel Hilbert space. Specifically, we first use kernel ridge regression to develop imputation for handling item nonresponse. While this nonparametric approach is potentially promising for imputation, its statistical properties have not been investigated in the literature. Under some conditions on the order of the tuning parameter, we first establish the root-n consistency of the kernel ridge regression imputation estimator and show that it achieves the lower bound of the semiparametric asymptotic variance. A nonparametric propensity score estimator using the reproducing kernel Hilbert space is also developed by a novel application of the maximum entropy method for density ratio function estimation. We show that the resulting propensity score estimator is asymptotically equivalent to the kernel ridge regression imputation estimator. Results from a limited simulation study are also presented to confirm our theory. The proposed method is applied to analyze air pollution data measured in Beijing, China. This is a pre-print of the article Wang, Hengfang, and Jae Kwang Kim. "Statistical inference using Regularized M-estimation in the reproducing kernel Hilbert space for handling missing data." arXiv preprint arXiv:2107.07371 (2021). DOI: 10.48550/arXiv.2107.07371. Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright 2021 The Authors. Posted with permission.
