3,194 research outputs found

    Optimal dimension and optimal auxiliary vector to construct calibration estimators of the distribution function

    The calibration technique (Deville and Särndal, 1992) for estimating the finite population distribution function has been studied in several papers. Calibration seeks new weights that are close to the sampling weights according to some distance function and that, at the same time, satisfy benchmark constraints on the available auxiliary information. The non-smooth character of the finite population distribution function causes certain complexities that have been resolved by different authors in different ways. One of these is to require consistency at a number of arbitrarily chosen points. This paper deals with the problem of the optimal selection of the number of such points, and of the points themselves, when auxiliary information is used by means of calibration. Ministerio de Educación y Ciencia; Consejería de Economía, Innovación, Ciencia y Empleo.
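    As a minimal sketch of how such a calibration estimator can be computed, the following Python fragment uses a chi-square (quadratic) calibration distance, for which the calibrated weights have a closed form, and builds the auxiliary vector from indicators 1{x_i <= t_j} at a few arbitrarily chosen points. All names and the simulated data are illustrative assumptions, not the paper's optimal construction.

```python
import numpy as np

def calibrate_weights(d, X, totals):
    """Chi-square-distance calibration: find w minimizing sum((w - d)**2 / d)
    subject to X.T @ w = totals.  Closed form: w = d + diag(d) X lam with
    lam = (X.T diag(d) X)^{-1} (totals - X.T d)."""
    Xd = X * d[:, None]                                   # diag(d) @ X
    lam = np.linalg.solve(X.T @ Xd, totals - X.T @ d)
    return d + Xd @ lam

def calibrated_cdf(y, w, t, N):
    """Calibration estimator of the finite-population distribution function at t."""
    return np.sum(w * (y <= t)) / N

# --- hypothetical illustration ---
rng = np.random.default_rng(0)
N, n = 10_000, 400
x_pop = rng.gamma(2.0, 2.0, size=N)            # auxiliary variable, known for the whole population
y_pop = 1.5 * x_pop + rng.normal(0, 1, N)      # study variable, observed only in the sample
s = rng.choice(N, size=n, replace=False)       # simple random sample without replacement
d = np.full(n, N / n)                          # design (sampling) weights

t_points = np.quantile(x_pop, [0.25, 0.5, 0.75])                  # arbitrarily chosen points
X = (x_pop[s][:, None] <= t_points).astype(float)                 # auxiliary vector 1{x_i <= t_j}
totals = (x_pop[:, None] <= t_points).sum(axis=0).astype(float)   # known population counts

w = calibrate_weights(d, X, totals)
F_hat = calibrated_cdf(y_pop[s], w, t=np.median(y_pop), N=N)
```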

    Calibration estimator for Head Count Index

    This paper considers the problem of estimating a poverty measure, the Head Count Index, using available auxiliary information, which is incorporated into the estimation procedure by calibration techniques. The proposed method does not use the auxiliary variables related to the variable of interest directly in the calibration process; instead, the auxiliary information is transformed and then incorporated through calibration techniques applied to the distribution function of the study variable. Monte Carlo experiments were carried out on simulated data and on real data taken from the Spanish living conditions survey to explore the performance of the new estimation methods for the Head Count Index.
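    As a rough, hedged illustration of the target quantity (not the transformation-based calibration proposed in the paper), the Head Count Index is the weighted share of units whose income falls below a poverty line z, i.e. the estimated distribution function evaluated at z; a common convention, assumed here, takes z as 60% of the median income.

```python
import numpy as np

def weighted_quantile(y, w, q):
    """Weighted quantile read off the weighted empirical CDF."""
    order = np.argsort(y)
    y, w = y[order], w[order]
    cum = np.cumsum(w) / np.sum(w)
    return y[np.searchsorted(cum, q)]

def head_count_index(y, w, z=None):
    """HCI = weighted proportion of units with income below the poverty line z.
    If z is not given, use the conventional 60% of the (weighted) median income."""
    if z is None:
        z = 0.6 * weighted_quantile(y, w, 0.5)
    return np.sum(w * (y < z)) / np.sum(w)
```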

    Combining multiple observational data sources to estimate causal effects

    The era of big data has witnessed an increasing availability of multiple data sources for statistical analyses. We consider estimation of causal effects combining big main data with unmeasured confounders and smaller validation data with supplementary information on these confounders. Under the unconfoundedness assumption with completely observed confounders, the smaller validation data allow for constructing consistent estimators of causal effects, whereas the big main data can only give error-prone estimators in general. However, by leveraging the information in the big main data in a principled way, we can improve the estimation efficiency while preserving the consistency of the initial estimators based solely on the validation data. Our framework applies to asymptotically normal estimators, including the commonly used regression imputation, weighting, and matching estimators, and does not require a correct specification of the model relating the unmeasured confounders to the observed variables. We also propose appropriate bootstrap procedures, which make our method straightforward to implement using software routines for existing estimators.
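    A hedged sketch of the general combination idea, in the spirit of control variates: the consistent validation-data estimator is shifted by the discrepancy between the two error-prone estimators, with a coefficient chosen to minimize variance. The function below and its bootstrap-based inputs are illustrative assumptions, not the paper's exact construction or its proposed bootstrap procedures.

```python
import numpy as np

def combine_estimators(tau_val, tau_ep_val, tau_ep_main, boot_reps):
    """Combine a consistent validation-data estimator with error-prone
    estimators computed on both data sets.

    tau_val     : consistent estimate from the validation data
    tau_ep_val  : error-prone estimate computed on the validation data
    tau_ep_main : error-prone estimate computed on the big main data
    boot_reps   : array of shape (B, 2) with bootstrap replicates of
                  (tau_val, tau_ep_val - tau_ep_main)

    Returns tau_val - gamma * (tau_ep_val - tau_ep_main), where
    gamma = Cov(tau_val, diff) / Var(diff) minimizes the variance while the
    vanishing discrepancy preserves consistency.
    """
    diff_reps = boot_reps[:, 1]
    gamma = np.cov(boot_reps[:, 0], diff_reps)[0, 1] / np.var(diff_reps, ddof=1)
    return tau_val - gamma * (tau_ep_val - tau_ep_main)
```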

    Reduction of optimal calibration dimension with a new optimal auxiliary vector for calibrated estimators of the distribution function

    The calibration method has been widely used to incorporate auxiliary information in the estimation of various parameters. This method has also been adapted to estimate the distribution function; although that proposal is computationally simple, its efficiency depends on the selection of an auxiliary vector of points. This work deals with the problem of selecting the calibration auxiliary vector that minimizes the asymptotic variance of the calibration estimator of the distribution function. The optimal dimension of the optimal auxiliary vector is reduced considerably with respect to previous studies, so that the minimum of the asymptotic variance can be reached with a smaller set of points, which in turn improves the efficiency of the estimates.
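    A brute-force illustration of the point-selection problem (not the paper's reduced-dimension optimal vector): for each candidate set of points, build the indicator auxiliary vector and evaluate a plug-in variance proxy, here the design-weighted residual variance of the indicator 1{y <= t} after a linear fit on the auxiliary indicators, and keep the set with the smallest value. Function names and the proxy are assumptions made for the sketch.

```python
import numpy as np
from itertools import combinations

def residual_variance(y_ind, X, d):
    """Design-weighted residual variance of the indicator 1{y<=t} after a
    weighted linear fit on the auxiliary indicators: a plug-in proxy for the
    asymptotic variance of the calibration estimator."""
    sqrt_d = np.sqrt(d)
    beta = np.linalg.lstsq(sqrt_d[:, None] * X, sqrt_d * y_ind, rcond=None)[0]
    resid = y_ind - X @ beta
    return np.sum(d * resid**2) / np.sum(d)

def select_points(y_s, x_s, d, t, candidates, dim):
    """Search over all subsets of `dim` candidate points for the indicator
    auxiliary vector minimizing the variance proxy at threshold t."""
    y_ind = (y_s <= t).astype(float)
    best, best_pts = np.inf, None
    for pts in combinations(candidates, dim):
        X = (x_s[:, None] <= np.asarray(pts)).astype(float)
        v = residual_variance(y_ind, X, d)
        if v < best:
            best, best_pts = v, pts
    return best_pts, best
```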

    Empirical likelihood confidence intervals for complex sampling designs

    We define an empirical likelihood approach which gives consistent design-based confidence intervals that can be calculated without the need for variance estimates, design effects, resampling, joint inclusion probabilities, or linearization, even when the point estimator is not linear. It can be used to construct confidence intervals for a large class of sampling designs and estimators which are solutions of estimating equations. It can be used for means, regression coefficients, quantiles, totals, or counts, even when the population size is unknown. It can be used with large sampling fractions and naturally includes calibration constraints. It can be viewed as an extension of the empirical likelihood approach to complex survey data. This approach is computationally simpler than the pseudo-empirical likelihood and bootstrap approaches. Our simulation study shows that the proposed confidence interval may give better coverage than confidence intervals based on linearization, bootstrap, and pseudo-empirical likelihood. It also shows that, under complex sampling designs, standard confidence intervals based on normality may have poor coverage, because point estimators may not follow a normal sampling distribution and their variance estimators may be biased.
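    For orientation, a minimal sketch of the classical empirical likelihood ratio for a mean under i.i.d. sampling (Owen's formulation); the design-based version of the paper replaces the estimating function with a design-weighted one and adds calibration constraints, which this fragment does not attempt to reproduce.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_log_ratio(y, mu):
    """-2 log empirical likelihood ratio for the mean of y at value mu."""
    g = y - mu                                   # estimating function values
    if g.min() >= 0 or g.max() <= 0:
        return np.inf                            # mu outside the convex hull of the data
    def score(lam):                              # derivative of sum log(1 + lam*g) in lam
        return np.sum(g / (1.0 + lam * g))
    eps = 1e-10                                  # keep 1 + lam*g strictly positive
    lo = (-1.0 + eps) / g.max()
    hi = (-1.0 + eps) / g.min()
    lam = brentq(score, lo, hi)                  # Lagrange multiplier
    return 2.0 * np.sum(np.log1p(lam * g))

def el_confidence_interval(y, level=0.95, grid=400):
    """Profile the EL ratio over candidate means and keep the values below
    the chi-square(1) cutoff; the interval endpoints are approximate."""
    cutoff = chi2.ppf(level, df=1)
    mus = np.linspace(y.min(), y.max(), grid)[1:-1]
    inside = [mu for mu in mus if el_log_ratio(y, mu) <= cutoff]
    return min(inside), max(inside)
```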

    Second-Order Inference for the Mean of a Variable Missing at Random

    We present a second-order estimator of the mean of a variable subject to missingness, under the missing-at-random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates necessary to establish consistency, local efficiency, and asymptotic linearity. The general estimation strategy is developed under the targeted minimum loss-based estimation (TMLE) framework. We present a simulation comparing the sensitivity of the first- and second-order estimators to the convergence rate of the initial estimators of the outcome regression and missingness score. In our simulation, the second-order TMLE improved the coverage probability of a confidence interval by up to 85%. In addition, we present a first-order estimator inspired by a second-order expansion of the parameter functional. This estimator only requires one-dimensional smoothing, whereas implementation of the second-order TMLE generally requires kernel smoothing on the covariate space. The proposed first-order estimator is expected to have improved finite-sample performance compared to existing first-order estimators. In our simulations, the proposed first-order estimator improved the coverage probability by up to 90%. We provide an illustration of our methods using a publicly available dataset to determine the effect of an anticoagulant on health outcomes of patients undergoing percutaneous coronary intervention. We provide R code implementing the proposed estimator.
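    For context, a hedged sketch of the standard first-order doubly robust (AIPW) estimator of the mean under missingness at random, which the second-order TMLE refines; the outcome-regression and missingness-score fits below are placeholder parametric models, not the targeted-learning implementation or the R code referenced in the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_mean(X, y, r):
    """First-order doubly robust (AIPW) estimate of E[Y] when Y is missing
    at random given covariates X.
    X : (n, p) covariates, fully observed
    y : (n,) outcome; entries where r == 0 may be arbitrary placeholders
    r : (n,) missingness indicator (1 = observed)
    """
    # outcome regression m(X) = E[Y | X, R = 1], fitted on the observed cases
    m_hat = LinearRegression().fit(X[r == 1], y[r == 1]).predict(X)
    # missingness score pi(X) = P(R = 1 | X)
    pi_hat = LogisticRegression().fit(X, r).predict_proba(X)[:, 1]
    # augmented inverse-probability-weighted estimating function
    adjust = np.where(r == 1, (y - m_hat) / pi_hat, 0.0)
    return np.mean(m_hat + adjust)
```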

    The optimization problem of quantile and poverty measures estimation based on calibration

    New calibrated estimators of quantiles and poverty measures are proposed. These estimators combine, via calibration techniques, the incorporation of auxiliary information provided by variables related to the variable of interest with the selection of optimal calibration points under simple random sampling without replacement. The problem of selecting calibration points that minimize the asymptotic variance of the quantile estimator is addressed. Once this problem is solved, the definition of the new quantile estimator requires that the optimal estimator of the distribution function on which it is based satisfies the properties of a distribution function. Through a theorem, the non-decreasing monotonicity property of the optimal estimator of the distribution function is established, so the corresponding optimal estimator can be defined. This optimal quantile estimator is also used to define new estimators of poverty measures. Simulation studies with real data from the Spanish living conditions survey compare the performance of the new estimators against various previously proposed methods, with resampling techniques used for variance estimation. Based on the results of the simulation study, the proposed estimators show good performance and are a reasonable alternative to other estimators. Ministerio de Educación y Ciencia.
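    A minimal sketch of the quantile step described above, assuming a pointwise estimate of the distribution function is already available on a grid: the estimate is first monotonized (here with a running maximum, one simple way to enforce the distribution-function properties) and then inverted. This is an illustration, not the paper's optimal estimator.

```python
import numpy as np

def monotonize_cdf(F_vals):
    """Enforce distribution-function properties on a pointwise estimate:
    non-decreasing (running maximum) and clipped to [0, 1]."""
    return np.clip(np.maximum.accumulate(F_vals), 0.0, 1.0)

def calibrated_quantile(t_grid, F_vals, alpha):
    """Quantile estimator Q(alpha) = inf{ t : F(t) >= alpha } based on the
    monotonized estimate of the distribution function."""
    F_mono = monotonize_cdf(F_vals)
    idx = np.searchsorted(F_mono, alpha, side="left")
    return t_grid[min(idx, len(t_grid) - 1)]

# Poverty measures such as the Head Count Index then follow by plugging a
# poverty line, e.g. z = 0.6 * calibrated_quantile(t_grid, F_vals, 0.5),
# back into the monotonized distribution function estimate.
```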

    Bandwidth selection in kernel empirical risk minimization via the gradient

    In this paper, we deal with the data-driven selection of multidimensional and possibly anisotropic bandwidths in the general framework of kernel empirical risk minimization. We propose a universal selection rule, which leads to optimal adaptive results in a large variety of statistical models, such as nonparametric robust regression and statistical learning with errors in variables. These results are stated in the context of smooth loss functions, where the gradient of the risk appears as a good criterion to measure the performance of our estimators. The selection rule consists of a comparison of gradient empirical risks. It can be viewed as a nontrivial extension of the so-called Goldenshluger-Lepski method to nonlinear estimators. Furthermore, one main advantage of our selection rule is that it does not depend on the Hessian matrix of the risk, which is usually involved in standard adaptive procedures. Comment: Published at http://dx.doi.org/10.1214/15-AOS1318 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
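    Since the selection rule is described as a Goldenshluger-Lepski-type comparison, the following fragment sketches a generic Goldenshluger-Lepski bandwidth selector for a one-dimensional Gaussian kernel density estimate; it illustrates the comparison-of-estimators idea with an assumed tuning constant c, not the gradient-of-the-risk criterion or the anisotropic multidimensional rule of the paper.

```python
import numpy as np

def gl_bandwidth(data, bandwidths, x_grid, c=1.2):
    """Generic Goldenshluger-Lepski selection for a Gaussian kernel density
    estimate: choose h minimizing A(h) + V(h), where A(h) compares the
    double-smoothed estimate f_{h,h'} with f_{h'} over all h'."""
    n = len(data)
    norm_K2 = 1.0 / (2.0 * np.sqrt(np.pi))        # ||K||_2^2 for the Gaussian kernel

    def kde(h):
        # Gaussian KDE on x_grid; convolving Gaussian kernels with bandwidths
        # h and h' gives a Gaussian kernel with bandwidth sqrt(h^2 + h'^2).
        z = (x_grid[:, None] - data[None, :]) / h
        return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

    def l2(f):
        return np.sqrt(np.trapz(f**2, x_grid))

    V = {h: c * np.sqrt(norm_K2 / (n * h)) for h in bandwidths}     # penalty term
    f = {h: kde(h) for h in bandwidths}
    f2 = {(h, hp): kde(np.hypot(h, hp)) for h in bandwidths for hp in bandwidths}

    best_h, best_crit = None, np.inf
    for h in bandwidths:
        A = max(max(l2(f2[(h, hp)] - f[hp]) - V[hp], 0.0) for hp in bandwidths)
        if A + V[h] < best_crit:
            best_h, best_crit = h, A + V[h]
    return best_h
```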