    A nonparametric model-based estimator for the cumulative distribution function of a right censored variable in a finite population

    In survey analysis, estimating the cumulative distribution function (cdf) is of great interest: it allows, for instance, the derivation of quantile estimators and other nonlinear parameters that depend on the cdf. We consider the case where the response variable is a right-censored duration variable. In this framework, the classical estimator of the cdf is the Kaplan-Meier estimator. As an alternative, we propose a nonparametric model-based estimator of the cdf in a finite population. The new estimator uses auxiliary information from a continuous covariate and is based on nonparametric median regression adapted to the censored case. The bias and variance of the prediction error of the estimator are estimated by a bootstrap procedure adapted to censoring. The new estimator is compared, through model-based simulations, with the Kaplan-Meier estimator computed from the sampled individuals: the new method brings a significant gain in precision whatever the sample size and the censoring rate. Welfare duration data are used to illustrate the new methodology. Comment: 18 pages, 5 figures
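
    The Kaplan-Meier baseline named in this abstract can be sketched as follows. This is a minimal illustration of the classical product-limit estimator only, not the model-based estimator the authors propose; the function name `kaplan_meier_cdf` and the toy data are assumptions made for the example.

```python
from collections import Counter


def kaplan_meier_cdf(times, observed):
    """Kaplan-Meier estimate of the cdf F(t) = 1 - S(t).

    times    : observed durations (event time or censoring time)
    observed : True if the event was observed, False if right-censored
    Returns a list of (t, F(t)) pairs, one per distinct event time.
    """
    events = Counter(t for t, d in zip(times, observed) if d)  # events per time
    exits = Counter(times)                                     # events + censorings
    at_risk = len(times)
    survival = 1.0
    steps = []
    for t in sorted(exits):
        if events[t]:
            # Product-limit update: multiply by (1 - d_t / n_t).
            survival *= 1.0 - events[t] / at_risk
            steps.append((t, 1.0 - survival))
        # Everyone who exits at t (event or censored) leaves the risk set.
        at_risk -= exits[t]
    return steps


# Toy data (hypothetical): five durations, two of them right-censored.
cdf_steps = kaplan_meier_cdf([2, 3, 3, 5, 8], [True, False, True, True, False])
```

    Censored durations never add a jump to the estimate, but they do shrink the risk set, which is how the estimator accounts for partially observed units.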

    The optimization problem of quantile and poverty measures estimation based on calibration

    New calibrated estimators of quantiles and poverty measures are proposed. These estimators use calibration techniques to incorporate auxiliary information from variables related to the variable of interest, and combine this with the selection of optimal calibration points under simple random sampling without replacement. The problem of selecting calibration points that minimize the asymptotic variance of the quantile estimator is addressed. Once this problem is solved, defining the new quantile estimator requires that the optimal estimator of the distribution function on which it is based satisfies the properties of a distribution function. A theorem establishes that the optimal estimator of the distribution function is monotone nondecreasing, so the corresponding optimal quantile estimator can be defined. This optimal quantile estimator is also used to define new estimators for poverty measures. Simulation studies with real data from the Spanish living conditions survey compare the performance of the new estimators against various previously proposed methods, with resampling techniques used for variance estimation. Based on the results of the simulation study, the proposed estimators perform well and are a reasonable alternative to other estimators. Ministerio de Educacion y Cienci

    Quantile estimation using auxiliary information with applications to soil texture data

    In the Major Land Resource Area (MLRA) 107 pilot project, a multi-phase probability sampling design for updating soil surveys was implemented in western Iowa. In general, multi-phase designs are used when a variable of interest is expensive to measure, but is strongly related to another (auxiliary) variable which is inexpensive to observe. In a multi-phase design, the auxiliary variable is observed for a sample and the study variable is observed for a relatively small sub-sample. In the estimation stage, the auxiliary information is used to improve estimators of distributional quantities relating to the study variable. In particular, we consider estimation of quantiles in this context. Chambers and Dunstan (1986) (CD) presented an estimator for a finite population distribution function which incorporates auxiliary information. A linear relationship between the study variable and the auxiliary information is assumed. The residuals in the linear model are assumed to be homoskedastic. We derive a Bahadur-like representation for the quantile estimator corresponding to the CD distribution function estimator. This expression is used to derive an expression for the asymptotic variance of the quantile estimator. We consider estimation of quantiles for soil texture profiles using data from the MLRA 107 pilot project. The laboratory determination of soil texture is the variable of interest. Auxiliary information is available in the form of field determinations of soil texture. Due to the multi-phase sampling design used for data collection, field determinations are available at more sites than laboratory determinations. The CD quantile estimator is modified to incorporate sampling weights and to allow heteroskedasticity in the assumed linear model. A Bayesian approach to this estimation problem is also considered. A hierarchical model is used to describe the relationships between observed data and unknown parameters. Soil horizon profiles are modeled as realizations of Markov chains. Transformed textures are modeled with Gaussian mixtures. The posterior distribution of soil texture profiles is numerically approximated using a Gibbs sampler. The hierarchical model provides a comprehensive framework which may be useful for analyzing other variables collected in the pilot project. The two approaches are compared using simulated and real data.
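
    The basic Chambers-Dunstan idea described in this abstract (a linear model with homoskedastic residuals, used to predict the distribution function for non-sampled units) can be sketched as below. This is an unweighted illustration under an assumed intercept-plus-slope model fitted by ordinary least squares; it does not include the sampling weights or heteroskedasticity handled by the modified estimator, and the function name and toy data are hypothetical.

```python
def chambers_dunstan_cdf(t, x_s, y_s, x_r):
    """Chambers-Dunstan-type estimate of the finite population cdf at t.

    x_s, y_s : auxiliary and study values for the sampled units
    x_r      : auxiliary values for the non-sampled units
    Assumes y = a + b*x + e with homoskedastic errors (illustrative choice).
    """
    n = len(x_s)
    N = n + len(x_r)

    # Ordinary least squares fit on the sample.
    mean_x = sum(x_s) / n
    mean_y = sum(y_s) / n
    sxx = sum((x - mean_x) ** 2 for x in x_s)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_s, y_s)) / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(x_s, y_s)]

    # Sampled units contribute their observed indicators.
    observed_part = sum(y <= t for y in y_s)

    # Each non-sampled unit contributes the empirical probability that
    # (prediction + sample residual) falls at or below t.
    predicted_part = sum(
        sum(a + b * xj + e <= t for e in residuals) / n for xj in x_r
    )
    return (observed_part + predicted_part) / N


# Toy data (hypothetical): three sampled units, two non-sampled units.
F_7 = chambers_dunstan_cdf(7, [1, 2, 3], [2, 4, 6], [4, 5])
F_9 = chambers_dunstan_cdf(9, [1, 2, 3], [2, 4, 6], [4, 5])
```

    A quantile estimate then follows by inverting this function, for example by searching for the smallest t at which the estimated cdf reaches the target probability.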

    TREATMENT OF INFLUENTIAL OBSERVATIONS IN THE CURRENT EMPLOYMENT STATISTICS SURVEY

    In many establishment surveys, the sample commonly contains a fraction of observations that may seriously affect survey estimates. Influential observations may appear in the sample due to imperfections of the survey design that cannot fully account for the dynamic and heterogeneous nature of the population of businesses. An observation may become influential due to a relatively large survey weight, an extreme value, or a combination of the weight and value. We propose a Winsorized estimator with a choice of cutoff points that guarantees that the resulting mean squared error is lower than the variance of the original survey-weighted estimator. This estimator is based on very unrestrictive modeling assumptions and can be safely used when the sample is sufficiently large. We consider a different approach when the sample is small. Estimation from small samples generally relies on strict model assumptions. Robustness here is understood as insensitivity of an estimator to model misspecification or to the appearance of outliers. The proposed approach is a slight modification of the classical linear mixed model application to small area estimation. The underlying distribution of the random error term is a scale mixture of two normal distributions. This setup can describe outliers in individual observations. It is also suitable for a more general situation where units from two distinct populations are put together for estimation. The mixture group indicator is not observed. The probabilities of observations coming from a group with a smaller or larger variance are estimated from the data. These conditional probabilities can serve as the basis for a formal test for outlyingness at the area level. Simulations are carried out to compare several alternative estimators under different scenarios. Performance of the bootstrap method for prediction confidence intervals is investigated using simulations. We also compare the proposed method with alternative existing methods in a study using data from the Current Employment Statistics Survey conducted by the U.S. Bureau of Labor Statistics.

    Generalized mixture estimators for the finite population mean

    The first-order approximation of the theoretical mean square error and the assumption of bivariate normality are very often used for ratio-type estimators of the population mean and variance. We have examined the adequacy of the first-order approximation and the robustness of various ratio-type estimators. We observed that the first-order approximation for ratio-type mean estimators and ratio-type variance estimators works well if the sampling fraction is small, and that departure from the assumption of bivariate normality is not a problem for large samples. We have also proposed some generalized mixture estimators which are combinations of the commonly used estimators. We have also extended the proposed generalized mixture estimators to the case when the study variable is sensitive and a non-sensitive auxiliary variable is available. We have shown that the proposed generalized mixture estimators are more efficient than other commonly used estimators. An extensive simulation study and numerical examples are also presented.
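
    As a point of reference for the ratio-type estimators discussed in this abstract, the classical ratio estimator of a population mean and its usual first-order mean square error approximation can be sketched as follows. This shows only the textbook ratio estimator under simple random sampling without replacement, not the generalized mixture estimators proposed by the authors; the function name and toy data are assumptions.

```python
def ratio_estimator(y_s, x_s, x_pop_mean, pop_size):
    """Classical ratio estimate of the population mean of y.

    y_s, x_s   : study and auxiliary values for the sampled units
    x_pop_mean : known population mean of the auxiliary variable
    pop_size   : population size N (simple random sampling assumed)
    Returns (estimate, first-order MSE approximation).
    """
    n = len(y_s)
    y_bar = sum(y_s) / n
    x_bar = sum(x_s) / n
    r_hat = y_bar / x_bar                # estimated ratio R = Ybar / Xbar
    estimate = r_hat * x_pop_mean

    # First-order approximation: (1 - f)/n times the sample variance of
    # the ratio residuals y_i - R*x_i, where f = n/N.
    f = n / pop_size
    resid_var = sum((y - r_hat * x) ** 2 for y, x in zip(y_s, x_s)) / (n - 1)
    mse_approx = (1.0 - f) / n * resid_var
    return estimate, mse_approx


# Toy data (hypothetical): y is exactly proportional to x.
est, mse = ratio_estimator([10, 20, 30], [1, 2, 3], x_pop_mean=2.5, pop_size=100)
```

    When y is exactly proportional to x, as in the toy data, the ratio residuals vanish and the approximate mean square error is zero, which illustrates why the estimator gains most when the two variables are strongly related.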

    Small area model-based estimators using big data sources

    The timely, accurate monitoring of social indicators, such as poverty or inequality, on a fine-grained spatial and temporal scale is a crucial tool for understanding social phenomena and for policymaking, but poses a great challenge to official statistics. This article argues that an interdisciplinary approach, combining the body of statistical research in small area estimation with the body of research in social data mining based on Big Data, can provide novel means to tackle this problem successfully. Big Data derived from the digital crumbs that humans leave behind in their daily activities are in fact providing ever more accurate proxies of social life. Social data mining from these data, coupled with advanced model-based techniques for fine-grained estimates, has the potential to provide a novel microscope through which to view and understand social complexity. This article suggests three ways to use Big Data together with small area estimation techniques, and shows how Big Data has the potential to mirror aspects of well-being and other socioeconomic phenomena.