3 research outputs found

QR Prediction for Statistical Data Integration

Author: Beaumont Jean-François
Dessertaine Alain
Goga Camelia
Medous Estelle
Puech Pauline
Ruiz-Gazen Anne
Publication venue: TSE Working Paper
Publication date: 01/06/2022

In this paper, we investigate how a big non-probability database can be used to improve estimates from a small probability sample through data integration techniques. In the situation where the study variable is observed in both data sources, Kim and Tam (2021) proposed two design-consistent estimators that can be justified through dual frame survey theory. First, we provide conditions ensuring that these estimators are more efficient than the Horvitz-Thompson estimator when the probability sample is selected using either Poisson sampling or simple random sampling without replacement. Then, we study the class of QR predictors, proposed by Särndal and Wright (1984) to handle the case where the non-probability database contains auxiliary variables but no study variable. We provide conditions ensuring that the QR predictor is asymptotically design-unbiased. Assuming the probability sampling design is not informative, the QR predictor is also model-unbiased regardless of the validity of those conditions. We compare the design properties of different predictors, in the class of QR predictors, through a simulation study. They include a model-based predictor, a model-assisted estimator and a cosmetic estimator. In our simulation setups, the cosmetic estimator performed slightly better than the model-assisted estimator. As expected, the model-based predictor did not perform well when the underlying model was misspecified.

Toulouse Capitole Publications

Many-to-One indirect sampling with application to the French postal traffic estimation

Author: Beaumont Jean-François
Dessertaine Alain
Goga Camelia
Medous Estelle
Puech Pauline
Ruiz-Gazen Anne
Publication venue: TSE Working Paper
Publication date: 01/11/2021

In social and economic surveys, it can be difficult to directly reach units of the target population, and indirect sampling is often advocated to solve this issue. In indirect sampling, the sample is drawn from a frame population that is linked to the target population, and estimation of target population parameters is typically achieved through the Generalized Weight Share Method (GWSM). This method provides a weight, for every unit of the target population, that depends on the one hand, on the sampling weights in the frame population and, on the other hand, on the link weights between the frame population and the target population. In the present study, we focus on the situation in which the units from the frame population are linked to one and only one unit from the target population (Many-to-One case). This situation is encountered at the French postal service where addresses are sampled instead of postman rounds. We aim to understand the impact of the link weights on the efficiency of the GWSM estimators. We derive variance expressions and optimality results for a large class of sampling designs. Moreover, we note that the Many-to-One case can lead to too many links to observe. We alleviate the problem by introducing an intermediate population and double indirect sampling. The question of the loss of precision in this situation is discussed in detail through theoretical results and simulations. These findings help to explain the loss of precision of double GWSM estimators observed recently at the French postal service.

Toulouse Capitole Publications

Toulouse 1 Capitole Publications

Many-to-One indirect sampling with application to the French postal traffic estimation

Author: Beaumont Jean-François
Dessertaine Alain
Goga Camelia
Medous Estelle
Puech Pauline
Ruiz-Gazen Anne
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2023

In social and economic surveys, it can be difficult to directly reach units of the target population, and indirect sampling is often advocated to solve this issue. In indirect sampling, the sample is drawn from a frame population that is linked to the target population, and estimation of target population parameters is typically achieved through the Generalized Weight Share Method (GWSM). This method provides a weight, for every unit of the target population, that depends on the one hand, on the sampling weights in the frame population and, on the other hand, on the link weights between the frame population and the target population. In the present study, we focus on the situation in which the units from the frame population are linked to one and only one unit from the target population (Many-to-One case). This situation is encountered at the French postal service where addresses are sampled instead of postman rounds. We aim to understand the impact of the link weights on the efficiency of the GWSM estimators. We derive variance expressions and optimality results for a large class of sampling designs. Moreover, we note that the Many-to-One case can lead to too many links to observe. We alleviate the problem by introducing an intermediate population and double indirect sampling. The question of the loss of precision in this situation is discussed in detail through theoretical results and simulations. These findings help to explain the loss of precision of double GWSM estimators observed recently at the French postal service.

HAL-uB

HAL - Université de Franche-Comté

Toulouse Capitole Publications