27,047 research outputs found

    Cross-validation in model-assisted estimation

    Variance estimation for survey estimators that include modeling relies on approximations that ignore the effect of fitting the models. A cross-validation (CV) criterion provides a way to incorporate this effect. This dissertation explores four ways of doing so. Penalized spline regression, a main type of nonparametric model-assisted method, is a common technique for improving the precision of finite population estimators. In Chapter 1, we propose a CV-based criterion to select the smoothing parameter for the penalized spline regression estimator. The design-based asymptotic properties of the method are derived, and simulation studies show how well it works in practice. The regression estimator is a common technique for improving the precision of finite population estimators by using the available auxiliary information about the population. In Chapter 2, we propose a CV-based variance estimator and compare it to two other variance estimators. The design-based asymptotic properties of the estimator are derived, and simulation studies show how well it works in practice. The regression estimator works well when there is a strong linear relationship between the regressor and the regressand. In contrast, when the relationship is weak, the π estimator is a good choice. In Chapter 3, a new estimator, formed as a linear combination of these two estimators, is proposed to select between them. We introduce a CV-based variance estimator for the newly proposed estimator. The design-based asymptotic properties of the estimator are explored, and simulation studies show how well it works in practice. In linear regression estimation, choosing the set of control variables x is a difficult practical problem. In Chapter 4, a CV criterion is introduced for choosing among combinations of the x variables to include in the model. The design-based asymptotic properties of the estimator are explored, and simulation studies show how well it works in practice.

    Improving the estimation of the odds ratio in sampling surveys using auxiliary information

    The odds-ratio measure is widely used in health and social surveys where the aim is to compare the odds of a certain event between a population at risk and a population not at risk. It can be defined using logistic regression through an estimating equation that allows a generalization to a continuous risk variable. Survey data need to be analyzed properly by taking the survey weights into account. Because the odds ratio is a complex parameter, the analyst has to circumvent some difficulties when estimating confidence intervals. The present paper suggests a nonparametric approach that can take advantage of auxiliary information to improve the precision of the odds-ratio estimator. The approach consists of B-spline modelling, which can handle the nonlinear structure of the parameter in a flexible way and is easy to implement. The variance estimation issue is solved through a linearization approach, and confidence intervals are derived. Two small applications are discussed.
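    As a point of reference for the abstract above, the following is a minimal sketch of a design-weighted odds ratio in the simplest case of a binary risk indicator, where the weighted logistic-regression definition reduces to a cross-product ratio of weighted cell totals. It is not the paper's B-spline method. The function name `weighted_odds_ratio` and the toy data are assumptions made for illustration.

```python
def weighted_odds_ratio(event, at_risk, weights):
    """Design-weighted odds ratio for a binary event and a binary risk indicator.

    event, at_risk: sequences of 0/1 values, one per sampled unit.
    weights: survey weights (for example, inverse inclusion probabilities).
    """
    # cells[r][e] holds the weighted total for risk group r and event status e.
    cells = [[0.0, 0.0], [0.0, 0.0]]
    for e, r, w in zip(event, at_risk, weights):
        cells[r][e] += w
    # Odds of the event among units at risk, divided by the odds among the rest.
    return (cells[1][1] * cells[0][0]) / (cells[1][0] * cells[0][1])


# Hypothetical four-unit sample, used only to show the call.
print(weighted_odds_ratio(event=[1, 0, 1, 0],
                          at_risk=[1, 1, 0, 0],
                          weights=[3.0, 1.0, 1.0, 2.0]))
```

    The sketch assumes every cell of the 2x2 table has a positive weighted total; an empty cell would make the ratio undefined.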

    General distance formula estimation of population total for unequal probability sampling designs with auxiliary variables

    Sampling is a process or technique for obtaining statistical information about a finite population by selecting a representative sample from that population using an appropriate sampling design. In the process, the required information about the units in the sample is measured, and inference is made about unknown population parameters such as means, totals and proportions. This study focuses on estimating an unknown population total for one target variable using single or multiple auxiliary variables correlated with the target variable. It also explores two classical estimators, the ratio estimator and the linear regression estimator, which are used as alternatives to the Horvitz-Thompson estimator in the presence of a single auxiliary variable. The two estimators are compared on theoretical and empirical grounds, based on the sample size and the correlation coefficient between the target variable and the auxiliary variable. The empirical study, using a secondary data set for small and medium sample sizes, shows that the linear regression estimator is more efficient than the ratio estimator when the correlation coefficient of the two variables is positive. For large sample sizes, there are no significant differences between the two estimators. Also, the variance of both estimators decreases as the sample size increases. In contrast, if the correlation coefficient is negative, any increase in the sample size leads to a significant decrease in the variance estimate of the linear regression estimator. Meanwhile, for the ratio estimator, the variance of the estimator decreases as the sample size is considerably increased.
    The simulation study showed that when the variable of interest has a strong negative correlation with the auxiliary variable, irrespective of the sample size, the linear regression estimator provides an efficient estimate of the unknown population total relative to the ratio estimator. If the correlation coefficient between the variable of interest and the auxiliary variable is positive and within the range [0.75, 1], the two estimators give a better estimate of the population total than the conventional estimators. However, the estimate of the population total obtained by the linear regression estimator is slightly more efficient than that of the ratio estimator. The most important idea in estimation using minimum distance measures is the quantification of the degree of closeness between two objects, such as the sample data and a parametric distribution that depends on an unknown parameter. A general distance formula is suggested in this research, based on the concept of the power divergence function rather than that used by Deville and Särndal, to measure the degree of closeness between the calibrated weights (new weights) and the classical design weights in the Horvitz-Thompson estimator. Derivation of the proposed general distance formula involved adding another constraint to the calibration equation constraints, relating the sum of the classical sample design weights to the sum of the sample calibrated weights. To generate a variety of distance measures, the proposed formula was used to obtain a set of new weights that could be used to construct new estimators of the unknown population total, based on the inverse functions created by the proposed formula. Finally, the problems associated with calibrated weights produced by some distance measures, such as unrealistic or extreme weights, are examined; these lead to inaccurate estimates when such weights are used in place of the design weights.
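    The estimators named in the abstract above can be sketched in a few lines. The block below is an illustration under stated assumptions, not the study's implementation: it assumes simple random sampling without replacement (so every design weight is N/n), a single auxiliary variable with known population total, and the chi-square distance for calibration rather than the general distance formula the study proposes. The function names and the toy data are hypothetical.

```python
def estimate_totals(y, x, x_total, N):
    """Horvitz-Thompson, ratio, and regression estimates of the total of y.

    y, x: sampled values of the target and auxiliary variables.
    x_total: known population total of x.  N: population size.
    Assumes simple random sampling without replacement.
    """
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    # Horvitz-Thompson: each unit has inclusion probability n/N.
    ht = N * y_bar
    # Ratio estimator: scale the known x total by the sample ratio y_bar/x_bar.
    ratio = x_total * y_bar / x_bar
    # Regression estimator: correct HT by the slope times the error in the x total.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    regression = ht + slope * (x_total - N * x_bar)
    return ht, ratio, regression


def linear_calibration_weights(d, x, x_total):
    """Calibrated weights minimizing sum((w - d)**2 / d) subject to sum(w * x) == x_total.

    d: design weights.  This chi-square distance is only one possible choice
    of distance measure; it can produce negative or extreme weights.
    """
    lam = (x_total - sum(dk * xk for dk, xk in zip(d, x))) / sum(
        dk * xk ** 2 for dk, xk in zip(d, x)
    )
    return [dk * (1 + lam * xk) for dk, xk in zip(d, x)]


# Hypothetical sample of n = 3 units from a population of N = 10.
y_sample = [3.0, 6.0, 12.0]
x_sample = [1.0, 2.0, 4.0]
print(estimate_totals(y_sample, x_sample, x_total=30.0, N=10))

design_weights = [10 / 3] * 3
w = linear_calibration_weights(design_weights, x_sample, x_total=30.0)
print(sum(wk * yk for wk, yk in zip(w, y_sample)))  # calibrated estimate of the y total
```

    With a single auxiliary variable and no intercept, the calibrated weights reproduce the known total of x exactly, which is the calibration constraint the abstract refers to.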

    On Improvement in Estimating Population Parameter(s) Using Auxiliary Information

    The purpose of this book is to suggest some improved estimators using auxiliary information in sampling schemes such as simple random sampling and systematic sampling. This volume is a collection of five papers. The following problems are discussed in the book: In chapter one, an estimator in systematic sampling using auxiliary information is studied in the presence of non-response. In the second chapter, some improved estimators are suggested using auxiliary information. In the third chapter, some improved ratio-type estimators are suggested and their properties are studied under the second order of approximation. In chapters four and five, some estimators are proposed for estimating unknown population parameter(s) and their properties are studied. This book will be helpful for researchers and students working in the field of finite population estimation. Comment: 63 pages, 8 tables. Educational Publishing & Journal of Matter Regularity (Beijing