99 research outputs found
Randomized response estimation in multiple frame surveys
Large scale surveys are increasingly delving into sensitive topics such as gambling,
alcoholism, drug use, sexual behavior, and domestic violence. Sensitive, stigmatizing
or even incriminating themes are difficult to investigate by using standard data-collection techniques since respondents are generally reluctant to release information that concerns their personal sphere. Further, such topics usually pertain to elusive
populations (e.g., irregular immigrants and homeless people, alcoholics, drug users, rape and
sexual assault victims) which are difficult to sample since they are not adequately covered
in a single sampling frame. On the other hand, researchers often utilize more than
one data-collection mode (i.e., mixed-mode surveys) in order to increase response
rates and/or improve coverage of the population of interest. Surveying sensitive and
elusive populations and mixed-mode research are closely connected with multiple
frame surveys which are becoming widely used to decrease bias due to undercoverage
of the target population. In this work, we combine sensitive research and multiple
frame surveys. In particular, we consider statistical techniques for handling sensitive
data coming from multiple frame surveys using complex sampling designs. Our aim
is to estimate the mean of a sensitive variable connected to undesirable behaviors
when data are collected using randomized response techniques. Some estimators
are constructed and their properties theoretically investigated. Variance estimation
is also discussed by means of the jackknife technique. Finally, a Monte Carlo simulation study is conducted to evaluate the performance of the proposed estimators
and the accuracy of variance estimation.
Ministerio de Economía y Competitividad; FPU grant program; Consejería de Empleo, Empresa y Comercio, Junta de Andalucía
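As a rough illustration of the randomized response idea mentioned in this abstract, the sketch below implements the classical Warner model for a binary sensitive trait under simple random sampling. This is a swapped-in textbook technique, not the multiple-frame estimators proposed in the paper; the function names and the design probability `p` are assumptions made for the example.

```python
import random


def simulate_answers(true_status, p, rng):
    """Simulate Warner answers: with probability p the respondent answers the
    direct statement ("I have the trait"), otherwise its negation."""
    return [s if rng.random() < p else 1 - s for s in true_status]


def warner_estimate(answers, p):
    """Estimate the prevalence of the sensitive trait from 0/1 answers.

    The expected share of "yes" answers is p*pi + (1 - p)*(1 - pi),
    which is inverted here to recover pi. Requires p != 0.5.
    """
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p must differ from 0.5")
    n = len(answers)
    lam = sum(answers) / n  # observed share of "yes" answers
    pi_hat = (lam - (1 - p)) / (2 * p - 1)
    # Usual unbiased variance estimator under simple random sampling
    var_hat = lam * (1 - lam) / ((n - 1) * (2 * p - 1) ** 2)
    return pi_hat, var_hat


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [1] * 200 + [0] * 800  # hypothetical population with 20% prevalence
    answers = simulate_answers(truth, p=0.8, rng=rng)
    print(warner_estimate(answers, p=0.8))
```

The estimate can fall outside [0, 1] in small samples, since no truncation is applied in this sketch.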
Comments on: Deville and Särndal’s calibration: revisiting a 25 years old successful optimization problem
Ministerio de Economía y Competitividad
Propensity score adjustment using machine learning classification algorithms to control selection bias in online surveys
Modern survey methods may be subject to non-observable bias, from various sources.
Among online surveys, for example, selection bias is prevalent, due to the sampling mechanism commonly used, whereby participants self-select from a subgroup whose characteristics differ from those of the target population. Several techniques have been proposed to
tackle this issue. One such is Propensity Score Adjustment (PSA), which is widely used and
has been analysed in various studies. The usual method of estimating the propensity score
is logistic regression, which requires a reference probability sample in addition to the online
nonprobability sample. The predicted propensities can be used for reweighting using various estimators. However, in the online survey context, there are alternatives that might outperform logistic regression regarding propensity estimation. The aim of the present study is
to determine the efficiency of some of these alternatives, involving Machine Learning (ML)
classification algorithms. PSA is applied in two simulation scenarios, representing situations
commonly found in online surveys, using logistic regression and ML models for propensity
estimation. The results obtained show that ML algorithms remove selection bias more effectively than logistic regression when used for PSA, but that their efficacy depends largely on
the selection mechanism employed and the dimensionality of the data. This study was partially supported by
Ministerio de Economía y Competitividad, Spain
[grant number MTM2015-63609-R] and, in the case
of the first author, an FPU grant from the Ministerio
de Ciencia, Innovación y Universidades, Spain. The
funders had no role in study design, data collection
and analysis, decision to publish, or preparation of
the manuscript.
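The PSA workflow described in this abstract (stack the online sample with a reference probability sample, model membership of the online sample, then reweight) can be sketched as follows. This is a minimal pure-Python sketch, not the authors' implementation: the logistic fit uses plain gradient ascent, the reference sample is assumed to be equally weighted, covariates are assumed to be scaled to roughly unit range, and the inverse-odds weight (1 - p)/p is only one of several weighting formulas in use.

```python
import math


def fit_logistic(X, z, lr=0.5, n_iter=5000):
    """Fit a logistic regression by batch gradient ascent.

    Returns [intercept, coef_1, ..., coef_k]. The step size assumes
    covariates scaled to roughly unit range (an assumption of this sketch).
    """
    k = len(X[0])
    n = len(X)
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, zi in zip(X, z):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = zi - 1.0 / (1.0 + math.exp(-eta))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, xi):
    """Estimated probability of belonging to the volunteer (online) sample."""
    eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-eta))


def psa_weights(X_vol, X_ref):
    """Inverse-odds PSA weights for the volunteer sample.

    X_vol: covariates of the nonprobability sample (label 1).
    X_ref: covariates of the reference sample (label 0), assumed equally weighted.
    """
    X = X_vol + X_ref
    z = [1] * len(X_vol) + [0] * len(X_ref)
    beta = fit_logistic(X, z)
    return [(1.0 - propensity(beta, xi)) / propensity(beta, xi) for xi in X_vol]
```

With a single binary covariate, these weights pull the weighted covariate distribution of the volunteers toward that of the reference sample. A machine learning classifier could replace `fit_logistic` as long as it returns class probabilities.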
Efficiency of propensity score adjustment and calibration on the estimation from non-probabilistic online surveys
One of the main sources of inaccuracy in modern survey techniques, such as online and smartphone surveys, is the absence of an adequate sampling frame that could support probabilistic sampling. This kind of data collection leads to substantial bias in the final estimates of the survey, especially if the estimated variables (also known as target variables) have some influence on the decision of the respondent to participate in the survey. Various correction techniques, such as calibration and propensity score adjustment (PSA), can be applied to remove the bias. This study attempts to analyse the efficiency of correction techniques in multiple situations, applying a combination of propensity score adjustment and calibration on both types of variables (correlated and not correlated with the missing data mechanism) and testing the use of a reference survey to get the population totals for calibration variables. The study was performed using a simulation of a fictitious population of potential voters and a real volunteer survey aimed at a population for which a complete census was available. Results showed that PSA combined with calibration removes considerably more bias than calibration with no prior adjustment. Results also showed that using population totals estimated from a reference survey instead of the available population data makes no difference to the accuracy of the estimates, although it can slightly increase the variance of the estimator.
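To make the calibration step concrete, here is a minimal sketch of linear (chi-square distance) calibration with a single auxiliary variable. It is an illustration under simplifying assumptions, not the procedure used in the study: the function names are hypothetical, the starting weights `d` could be design weights or PSA weights, and `total_x` may be a known population total or an estimate from a reference survey.

```python
def linear_calibration(d, x, total_x):
    """Adjust starting weights d so that the weighted total of x equals total_x.

    Uses the linear calibration form w_i = d_i * (1 + lam * x_i), with lam
    chosen to satisfy the single calibration constraint exactly.
    """
    sxx = sum(di * xi * xi for di, xi in zip(d, x))
    if sxx == 0:
        raise ValueError("auxiliary variable is identically zero")
    current_total = sum(di * xi for di, xi in zip(d, x))
    lam = (total_x - current_total) / sxx
    return [di * (1 + lam * xi) for di, xi in zip(d, x)]


def calibrated_total(w, y):
    """Estimate the population total of y with calibrated weights w."""
    return sum(wi * yi for wi, yi in zip(w, y))


if __name__ == "__main__":
    d = [10.0, 10.0, 10.0, 10.0]  # starting weights (hypothetical)
    x = [1.0, 2.0, 3.0, 4.0]      # auxiliary variable observed in the sample
    w = linear_calibration(d, x, total_x=120.0)
    print(w, calibrated_total(w, x))
```

Linear calibration can produce negative weights when the sample total is far from the target, so bounded distance functions are often preferred in practice.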
Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys
The development of new survey data collection methods such as online surveys has
been particularly advantageous for social studies in terms of reduced costs, immediacy
and enhanced questionnaire possibilities. However, many such methods are strongly
affected by selection bias, leading to unreliable estimates. Calibration and Propensity
Score Adjustment (PSA) have been proposed as methods to remove selection bias in
online nonprobability surveys. Calibration requires population totals to be known for
the auxiliary variables used in the procedure, while PSA estimates the volunteering
propensity of an individual using predictive modelling. The variables included in
these models must be carefully selected in order to maximise the accuracy of the final
estimates. This study presents an application, using synthetic and real data, of variable
selection techniques developed for knowledge discovery in data to choose the best
subset of variables for propensity estimation. We also compare the performance of PSA
using different classification algorithms, after which calibration is applied. We also
present an application of this methodology in a real-world situation, using it to obtain
estimates of population parameters. The results obtained show that variable selection
using appropriate methods can provide less biased and more efficient estimates than
using all available covariates. Ministerio de Ciencia e Innovación, Spain [Grant No. PID2019-106861RBI00/AEI/10.13039/501100011033].
FPU grant from Ministerio de Ciencia, Innovación y Universidades.
Funding for open access charge: Universidad de Granada / CBUA Spain.
IMAG-Maria de Maeztu CEX2020-001105-M/AEI/10.13039/50110001103
Treating nonresponse in the estimation of the distribution function
The estimation of a finite population distribution function is considered when there are missing data. Calibration adjustment is used for dealing with nonresponse at the estimation stage. Several procedures are proposed and compared. A numerical study is carried out to evaluate the performance of the estimators. Computational problems with the implementation of the proposed calibration estimators are also considered. Ministerio de Economía y Competitividad of Spain
Reduction of optimal calibration dimension with a new optimal auxiliary vector for calibrated estimators of the distribution function
The calibration method has been widely used to incorporate auxiliary information
in the estimation of various parameters. In particular, the method has been adapted to estimate
the distribution function; although that proposal is computationally simple,
its efficiency depends on the selection of an auxiliary vector of points. This work
deals with the problem of selecting the calibration auxiliary vector that minimizes the
asymptotic variance of the calibration estimator of the distribution function. The optimal
dimension of the optimal auxiliary vector is reduced considerably with respect
to previous studies, so that the minimum of the asymptotic
variance can be reached with a smaller set of points, which in turn allows the efficiency of the
estimates to be improved.
The optimization problem of quantile and poverty measures estimation based on calibration
New calibrated estimators of quantiles and poverty measures are proposed. These estimators combine the incorporation of auxiliary information, provided by auxiliary variables related to the variable of interest, through calibration techniques with the selection of optimal calibration points under simple random sampling without replacement. The problem of selecting calibration points that minimize the asymptotic variance of the quantile estimator is addressed. Once the problem is solved, the definition of the new quantile estimator requires that the optimal estimator of the distribution function on which it is based satisfies the properties of a distribution function. Through a theorem, the monotone nondecreasing property of the optimal estimator of the distribution function is established, so that the corresponding optimal quantile estimator can be defined. This optimal quantile estimator is also used to define new estimators for poverty measures. Simulation studies with real data from the Spanish living conditions survey compare the performance of the new estimators against various previously proposed methods, with some resampling techniques used for variance estimation. Based on the results of the simulation study, the proposed estimators perform well and are a reasonable alternative to other estimators. Ministerio de Educación y Ciencia
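The link between the distribution function, quantiles and poverty measures can be sketched with a plain weighted estimator. This is a generic illustration, not the optimal calibrated estimator of the paper: the function names are hypothetical, and the poverty line of 60% of the median is an assumption chosen for the example.

```python
def weighted_cdf(y, w, t):
    """Weighted estimate of the distribution function at t:
    the share of total weight carried by units with y_i <= t."""
    total = sum(w)
    return sum(wi for yi, wi in zip(y, w) if yi <= t) / total


def weighted_quantile(y, w, alpha):
    """Smallest observed value whose weighted distribution function reaches alpha."""
    total = sum(w)
    acc = 0.0
    for yi, wi in sorted(zip(y, w)):
        acc += wi
        if acc >= alpha * total:
            return yi
    return max(y)  # guard against floating-point shortfall when alpha is 1


def headcount_ratio(y, w, fraction=0.6):
    """Share of units below a poverty line set at `fraction` of the median.
    The 0.6 default is an assumption made for this example."""
    threshold = fraction * weighted_quantile(y, w, 0.5)
    return weighted_cdf(y, w, threshold)


if __name__ == "__main__":
    income = [10.0, 20.0, 30.0, 40.0]
    weights = [1.0, 1.0, 1.0, 1.0]
    print(weighted_quantile(income, weights, 0.5))
    print(headcount_ratio(income, weights))
```

With nonnegative weights this step function is nondecreasing, so the quantile is well defined. Calibrated weights are not guaranteed to be nonnegative, which is one reason monotonicity has to be established before inverting the estimator.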
Methods to Counter Self-Selection Bias in Estimations of the Distribution Function and Quantiles
Many surveys are performed using non-probability methods such as web surveys, social
networks surveys, or opt-in panels. The estimates made from these data sources are usually biased
and must be adjusted to make them representative of the target population. Techniques to mitigate
this selection bias in non-probability samples often involve calibration, propensity score adjustment,
or statistical matching. In this article, we consider the problem of estimating the finite population
distribution function in the context of non-probability surveys and show how some methodologies
formulated for linear parameters can be adapted to this functional parameter, both theoretically and
empirically, thus enhancing the accuracy and efficiency of the estimates made. Spanish Government PID2019-106861RB-I00; IMAG-Maria de Maeztu CEX2020-001105-M/AEI/10.13039/501100011033; FEDER/Junta de Andalucia-Consejeria de Transformacion Economica, Industria, Conocimiento y Universidades FQM170-UGR2