Optimal dimension and optimal auxiliary vector to construct calibration estimators of the distribution function
The calibration technique (Deville and Särndal, 1992) for estimating the finite population distribution function has been studied in several papers. Calibration seeks new weights that are close to the sampling weights according to some distance function and that, at the same time, match benchmark constraints on available auxiliary information. The non-smooth character of the finite population distribution function causes certain complexities that different authors resolve in different ways. One of these is to require consistency at a number of arbitrarily chosen points. This paper deals with the problem of the optimal selection of the number of points, and of the points themselves, when auxiliary information is used by means of calibration.

Ministerio de Educación y Ciencia; Consejería de Economía, Innovación, Ciencia y Empleo
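The benchmark-matching idea described above can be sketched for the classical chi-square distance, where the calibrated weights have a closed form. This is a minimal illustration of calibration weighting in general, not the paper's distribution-function estimator; the data, weights, and totals are invented:

```python
import numpy as np

def calibrate_weights(d, X, totals):
    """Chi-square (GREG-type) calibration: adjust design weights d so the
    calibrated weights reproduce known population totals of X exactly."""
    # Solve for the Lagrange multipliers lam: (X' diag(d) X) lam = totals - X' d
    A = X.T @ (d[:, None] * X)
    lam = np.linalg.solve(A, totals - X.T @ d)
    # Calibrated weights: w_i = d_i * (1 + x_i' lam)
    return d * (1.0 + X @ lam)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50)])  # auxiliary variables
d = np.full(50, 4.0)                # design weights (n = 50 drawn from N = 200)
totals = np.array([200.0, 1000.0])  # assumed known population totals of X
w = calibrate_weights(d, X, totals)
print(np.allclose(X.T @ w, totals))  # the calibration constraints hold exactly
```

By construction, X'w = X'd + X'diag(d)X·lam = totals, so any total of a variable well explained by X is estimated more precisely with w than with d.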
Calibration estimator for Head Count Index
This paper considers the problem of estimating a poverty measure, the Head Count
Index, using the available auxiliary information, which is incorporated into the
estimation procedure by calibration techniques. The proposed method does not use
the auxiliary information provided by auxiliary variables related to the variable
of interest directly in the calibration process; instead, after a transformation,
the auxiliary information is incorporated by calibration techniques applied to the
distribution function of the study variable. Monte Carlo experiments were carried
out on simulated data and on real data taken from the Spanish living conditions
survey to explore the performance of the new estimation methods for the Head
Count Index.
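For reference, the target parameter itself is simple: the Head Count Index is the weighted share of units whose income falls below the poverty line, i.e. the estimated distribution function evaluated at that line. A minimal sketch with invented data follows; the 60%-of-median poverty line is a common convention, not necessarily the paper's choice:

```python
import numpy as np

def head_count_index(y, w, z):
    """Head Count Index: weighted proportion of the population with income
    below the poverty line z, i.e. the estimated distribution function F(z)."""
    return np.sum(w * (y < z)) / np.sum(w)

rng = np.random.default_rng(4)
y = rng.lognormal(mean=9.5, sigma=0.6, size=500)  # simulated incomes
w = rng.uniform(1, 5, size=500)                   # survey weights
z = 0.6 * np.median(y)                            # a common poverty-line choice
print(head_count_index(y, w, z))                  # a proportion in [0, 1]
```

Replacing `w` with calibrated weights, as in the paper, is what brings the auxiliary information into this estimate.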
Combining multiple observational data sources to estimate causal effects
The era of big data has witnessed an increasing availability of multiple data
sources for statistical analyses. We consider estimation of causal effects
combining big main data with unmeasured confounders and smaller validation data
with supplementary information on these confounders. Under the unconfoundedness
assumption with completely observed confounders, the smaller validation data
allow for constructing consistent estimators for causal effects, but the big
main data can only give error-prone estimators in general. However, by
leveraging the information in the big main data in a principled way, we can
improve the estimation efficiencies yet preserve the consistencies of the
initial estimators based solely on the validation data. Our framework applies
to asymptotically normal estimators, including the commonly-used regression
imputation, weighting, and matching estimators, and does not require a correct
specification of the model relating the unmeasured confounders to the observed
variables. We also propose appropriate bootstrap procedures, which make our
method straightforward to implement using software routines for existing
estimators.
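The principle above, keeping the consistency of the validation-only estimator while borrowing efficiency from the big main data, can be illustrated in its simplest form with a control-variate adjustment. This is a toy sketch of the idea for a population mean, not the paper's causal-effect framework; the data-generating model is invented:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 200_000, 1_000               # big main data, small validation data

u = rng.normal(size=N)              # confounder: observed only in validation data
x = u + rng.normal(size=N)          # error-prone proxy: observed everywhere
y = 2.0 + u + rng.normal(size=N)    # outcome; true mean is 2.0

val = rng.choice(N, n, replace=False)   # indices of the validation subsample
yv, xv = y[val], x[val]

theta_v = yv.mean()                 # consistent, validation-only estimator
# Control-variate coefficient chosen to minimize variance.
c = np.cov(yv, xv)[0, 1] / xv.var(ddof=1)
# Adjust by the (near-zero-mean) contrast between validation and main data:
# consistency is preserved because xv.mean() - x.mean() converges to zero.
theta_eff = theta_v - c * (xv.mean() - x.mean())
print(theta_v, theta_eff)           # both near the truth 2.0; theta_eff tighter
```

The adjustment term has expectation zero, so the combined estimator inherits the consistency of `theta_v` while its variance shrinks by roughly the squared correlation between `y` and the proxy.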
Reduction of optimal calibration dimension with a new optimal auxiliary vector for calibrated estimators of the distribution function
The calibration method has been widely used to incorporate auxiliary information
into the estimation of various parameters, and it has been adapted specifically
to estimate the distribution function; although that proposal is computationally
simple, its efficiency depends on the selection of an auxiliary vector of points.
This work deals with the problem of selecting the calibration auxiliary vector
that minimizes the asymptotic variance of the calibration estimator of the
distribution function. The optimal dimension of the optimal auxiliary vector is
reduced considerably with respect to previous studies, so that the minimum of
the asymptotic variance can be reached with a smaller set of points, which in
turn improves the efficiency of the estimates.
Empirical likelihood confidence intervals for complex sampling designs
We define an empirical likelihood approach which gives consistent design-based confidence intervals that can be calculated without the need for variance estimates, design effects, resampling, joint inclusion probabilities, or linearization, even when the point estimator is not linear. It can be used to construct confidence intervals for a large class of sampling designs and estimators which are solutions of estimating equations. It can be used for means, regression coefficients, quantiles, totals, or counts, even when the population size is unknown. It can be used with large sampling fractions and naturally includes calibration constraints. It can be viewed as an extension of the empirical likelihood approach to complex survey data. This approach is computationally simpler than the pseudo-empirical likelihood and bootstrap approaches. The simulation study shows that the proposed confidence interval may give better coverage than confidence intervals based on linearization, bootstrap, and pseudo-empirical likelihood. Our simulation study also shows that, under complex sampling designs, standard confidence intervals based on normality may have poor coverage, because point estimators may not follow a normal sampling distribution and their variance estimators may be biased.
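For the i.i.d. mean, the mechanics of empirical likelihood, profiling the multinomial likelihood under a moment constraint and inverting the chi-squared calibration, can be sketched as follows. This is a plain i.i.d. illustration, not the design-based version the paper develops; the damped Newton step is a standard safeguard that keeps all implied weights positive:

```python
import numpy as np

def el_log_ratio(x, mu, iters=60):
    """-2 log of the empirical likelihood ratio for the mean mu (~ chi2(1)).
    Solves sum z_i / (1 + lam z_i) = 0 for the multiplier lam, z_i = x_i - mu."""
    z = x - mu
    lam = 0.0
    for _ in range(iters):                       # safeguarded Newton iterations
        d = 1.0 + lam * z
        step = np.sum(z / d) / np.sum((z / d) ** 2)
        while np.any(1.0 + (lam + step) * z <= 1e-8):
            step *= 0.5                          # damp to keep weights positive
        lam += step
    return 2.0 * np.sum(np.log1p(lam * z))

def el_ci(x, grid):
    """95% CI: keep the mu values whose ratio is below the chi2(1) cutoff."""
    keep = [m for m in grid if el_log_ratio(x, m) <= 3.841]
    return min(keep), max(keep)

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, size=200)
lo, hi = el_ci(x, np.linspace(x.mean() - 0.3, x.mean() + 0.3, 121))
print(lo, hi)   # an interval containing the sample mean
```

The interval's shape is driven by the data, which is why no variance estimate or normality assumption is needed; the design-based version in the paper replaces the multinomial likelihood with design-weighted estimating equations.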
Second-Order Inference for the Mean of a Variable Missing at Random
We present a second-order estimator of the mean of a variable subject to
missingness, under the missing at random assumption. The estimator improves
upon existing methods by using an approximate second-order expansion of the
parameter functional, in addition to the first-order expansion employed by
standard doubly robust methods. This results in weaker assumptions about the
convergence rates necessary to establish consistency, local efficiency, and
asymptotic linearity. The general estimation strategy is developed under the
targeted minimum loss-based estimation (TMLE) framework. We present a
simulation comparing the sensitivity of the first and second order estimators
to the convergence rate of the initial estimators of the outcome regression and
missingness score. In our simulation, the second-order TMLE improved the
coverage probability of a confidence interval by up to 85%. In addition, we
present a first-order estimator inspired by a second-order expansion of the
parameter functional. This estimator only requires one-dimensional smoothing,
whereas implementation of the second-order TMLE generally requires kernel
smoothing on the covariate space. The first-order estimator proposed is
expected to have improved finite sample performance compared to existing
first-order estimators. In our simulations, the proposed first-order estimator
improved the coverage probability by up to 90%. We provide an illustration of
our methods using a publicly available dataset to determine the effect of an
anticoagulant on health outcomes of patients undergoing percutaneous coronary
intervention. We provide R code implementing the proposed estimator.
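For context, the standard first-order doubly robust (AIPW) estimator that the second-order expansion improves upon can be sketched as follows. This is a toy version for the mean of an outcome missing at random, with a linear outcome regression and a missingness score taken as known for simplicity; invented data, not the paper's TMLE:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
w = rng.normal(size=n)                     # covariate
p = 1.0 / (1.0 + np.exp(-(0.5 + w)))       # missingness score P(observed | w)
r = rng.binomial(1, p)                     # observation indicator
y = 1.0 + 2.0 * w + rng.normal(size=n)     # outcome, MAR given w; true mean 1.0
y_obs = np.where(r == 1, y, np.nan)        # what the analyst actually sees

# Working nuisance estimators (the paper's focus is what happens when
# these converge slowly; here both are well specified).
obs = r == 1
beta = np.polyfit(w[obs], y[obs], 1)       # outcome regression m(w), least squares
m = np.polyval(beta, w)
phat = p                                   # score assumed known in this sketch

# AIPW: regression prediction plus inverse-probability-weighted residual.
theta = np.mean(m + r * (np.nan_to_num(y_obs) - m) / phat)
print(theta)                               # close to the true mean 1.0
```

The estimator is consistent if either `m` or `phat` is correct (double robustness); the second-order construction in the paper weakens the rate conditions both must satisfy jointly.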
The optimization problem of quantile and poverty measures estimation based on calibration
New calibrated estimators of quantiles and poverty measures are proposed. These estimators combine the incorporation of auxiliary information, provided by auxiliary variables related to the variable of interest, through calibration techniques with the selection of optimal calibration points under simple random sampling without replacement. The problem of selecting calibration points that minimize the asymptotic variance of the quantile estimator is addressed. Once the problem is solved, the definition of the new quantile estimator requires that the optimal estimator of the distribution function on which it is based satisfy the properties of a distribution function. Through a theorem, the nondecreasing monotonicity property of the optimal estimator of the distribution function is established, and the corresponding optimal estimator can be defined. This optimal quantile estimator is also used to define new estimators for poverty measures. Simulation studies with real data from the Spanish living conditions survey compare the performance of the new estimators against various methods proposed previously, where some resampling techniques are used for variance estimation. Based on the results of the simulation study, the proposed estimators show good performance and are a reasonable alternative to other estimators.

Ministerio de Educación y Ciencia
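The monotonicity requirement discussed above matters because the quantile estimator inverts the estimated distribution function. A minimal sketch of that inversion step follows, using a running maximum to enforce nondecreasing monotonicity; this is a generic illustration on an invented grid, not the paper's optimal estimator:

```python
import numpy as np

def quantile_from_cdf(t_grid, F_hat, beta):
    """Invert an estimated distribution function at level beta.
    A running maximum enforces the nondecreasing-monotonicity property
    required before the inversion is well defined."""
    F_mono = np.maximum.accumulate(np.clip(F_hat, 0.0, 1.0))
    idx = np.searchsorted(F_mono, beta, side="left")  # first point with F >= beta
    return t_grid[min(idx, len(t_grid) - 1)]

t = np.linspace(0, 10, 101)
# A slightly non-monotone CDF estimate around the Uniform(0, 10) CDF t/10.
F = np.clip(t / 10 + 0.03 * np.sin(5 * t), 0, 1)
med = quantile_from_cdf(t, F, 0.5)
print(med)   # near the true median 5.0
```

Poverty measures such as those in the paper then follow by plugging the estimated quantile (e.g. a fraction of the median) back into the monotone distribution-function estimate.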
Bandwidth selection in kernel empirical risk minimization via the gradient
In this paper, we deal with the data-driven selection of multidimensional and
possibly anisotropic bandwidths in the general framework of kernel empirical
risk minimization. We propose a universal selection rule, which leads to
optimal adaptive results in a large variety of statistical models such as
nonparametric robust regression and statistical learning with errors in
variables. These results are stated in the context of smooth loss functions,
where the gradient of the risk appears as a good criterion to measure the
performance of our estimators. The selection rule consists of a comparison of
gradient empirical risks. It can be viewed as a nontrivial extension of the
so-called Goldenshluger-Lepski method to nonlinear estimators. Furthermore, one
main advantage of our selection rule is that it does not depend on the Hessian
matrix of the risk, usually involved in standard adaptive procedures.

Comment: Published at http://dx.doi.org/10.1214/15-AOS1318 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).