
    Estimating regional income indicators under transformations and access to limited population auxiliary information

    Spatially disaggregated income indicators are typically estimated by using model-based methods that assume access to auxiliary information from population micro-data. In many countries, such as Germany and the UK, population micro-data are not publicly available. In this work, we propose a small area methodology for the case in which only aggregate population-level auxiliary information is available. We use data-driven transformations of the response to satisfy the parametric assumptions of the models used. In the absence of population micro-data, appropriate bias corrections for small area prediction are needed. Under the proposed approach, aggregate statistics (means and covariances) and kernel density estimation are used to resolve the issue of not having access to population micro-data. We further explore the estimation of the mean squared error using a parametric bootstrap. Extensive model-based and design-based simulations are used to compare the proposed method with alternative methods. Finally, the proposed methodology is applied to the 2011 Socio-Economic Panel and aggregate census information from the same year to estimate the average income for 96 planning regions in Germany.
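A minimal sketch of why back-transformed predictions need a bias correction, assuming a log-shift transformation y* = log(y + s) with a fixed shift s and normal errors with known variance on the transformed scale. The data-driven choice of s, the aggregate-data setting and the kernel density step of the paper are not reproduced; all parameter values are hypothetical.

```python
import math
import random
import statistics

def log_shift(y, s):
    """Log-shift transformation y* = log(y + s); requires y + s > 0."""
    return math.log(y + s)

def back_transform_naive(eta, s):
    """Naive inverse exp(eta) - s; ignores the error variance on the transformed scale."""
    return math.exp(eta) - s

def back_transform_corrected(eta, sigma2, s):
    """E[exp(eta + e)] - s for e ~ N(0, sigma2), i.e. exp(eta + sigma2 / 2) - s."""
    return math.exp(eta + sigma2 / 2.0) - s

# Hypothetical incomes whose log-shift transform is N(eta, sigma^2).
random.seed(1)
s, eta, sigma = 100.0, 7.5, 0.8
incomes = [math.exp(random.gauss(eta, sigma)) - s for _ in range(200_000)]
transformed = [log_shift(y, s) for y in incomes]

true_mean = statistics.fmean(incomes)
naive = back_transform_naive(eta, s)
corrected = back_transform_corrected(eta, sigma ** 2, s)
print(f"mean on log-shift scale: {statistics.fmean(transformed):.3f}")
print(f"empirical mean {true_mean:.0f}, naive {naive:.0f}, corrected {corrected:.0f}")
```

The naive inverse exp(eta) - s is the median rather than the mean of y, which is why it understates the mean. When the linear predictor varies across units, as it does within a small area, the correction has to be averaged over the distribution of the covariates in that area, which is the step that becomes hard without population micro-data.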

    Small Area Estimation under Limited Auxiliary Population Data Dealing with Model Violations and their Economic Applications

    For evidence-based policy-making, reliable information on socio-economic indicators is essential. Sample surveys have a long tradition of providing cost-efficient information on these indicators. Often, the quantity of interest is needed not only for the total population but especially for sub-populations (geographic areas or socio-demographic groups), called areas or domains. To gain insights into these sub-populations, disaggregated direct estimators can be used, which are calculated solely from area-specific survey data. An area is regarded as 'large' if its sample size is large enough to enable reliable direct estimates. If the precision of the direct estimates is insufficient, or the sample size is even zero, the area is considered 'small'. This is particularly common at high spatial or socio-demographic resolutions. Small area estimation (SAE) offers a promising way to overcome this problem without the need for larger and thus more costly surveys. The essence of SAE techniques is that they 'borrow strength' from other areas to improve their predictions. For this purpose, a model is fitted to the survey data that links the variable of interest to auxiliary data and exploits area-specific structures. Suitable auxiliary data sources are administrative and register data, such as the census. In many countries, such data are strictly protected by confidentiality agreements, and access to population micro-data is a challenge even for gatekeeper organisations. Thus, users have an increased interest in SAE estimators that do not require population micro-data as auxiliary data. In this thesis, new methods for the absence of population micro-data are presented, and applications to highly relevant socio-economic indicators are demonstrated.
Since different SAE models impose different data requirements, Part I bundles research combining unit-level survey data with limited auxiliary data, e.g., aggregated data such as means, which is a common data situation for users. To exploit the unit-level survey information, the well-known nested error regression (NER) model is used. This model is a special case of a linear mixed model and rests on several assumptions. But how can users proceed if the model assumptions are not fulfilled? Part I of this thesis provides two new approaches to this issue. One promising approach is to transform the response. Since several socio-economically relevant variables, such as income, have a skewed distribution, the log-transformation of the response is an established way to meet the assumptions. The data-driven log-shift transformation is even more promising because it extends the log by an additional shift parameter and thus achieves more flexibility. Chapter 1 introduces both transformations in the absence of population micro-data. A particular challenge is the transformation of the small area means back to the original scale. Hence, the proposed approach uses aggregate statistics (means and covariances) and kernel density estimation to resolve the issue of lacking population micro-data. Uncertainty estimation is developed, and all methods are evaluated in design-based and model-based settings. The proposed method is applied to estimate regional income in Germany using the Socio-Economic Panel and census data. It achieves a clear improvement in reliability and thus demonstrates the importance of the method. To conveniently enable further applications, this new methodology is implemented in the R package saeTrafo. Chapter 2 describes the various functionalities of the package using publicly available income data.
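For the untransformed NER model, the empirical best linear unbiased predictor (EBLUP) of an area mean needs the population covariate means but not unit-level population data, which is what makes aggregate auxiliary information usable in the first place. The following is a minimal sketch assuming the regression coefficients and variance components have already been estimated (for example by REML) and ignoring the finite population correction; the class and function names are hypothetical and not part of saeTrafo.

```python
from dataclasses import dataclass

@dataclass
class AreaSample:
    y: list[float]            # sampled responses in the area
    x: list[list[float]]      # sampled covariate rows (first entry 1.0 = intercept)
    pop_mean_x: list[float]   # aggregate population means of the same covariates

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def ner_eblup(area, beta, sigma2_u, sigma2_e):
    """EBLUP of an area mean under the nested error regression model,
    using only the population covariate means (Battese-Harter-Fuller form).
    beta, sigma2_u and sigma2_e are treated as already estimated."""
    n = len(area.y)
    synthetic = dot(area.pop_mean_x, beta)
    if n == 0:
        return synthetic  # non-sampled area: regression-synthetic prediction only
    # Shrinkage factor: larger samples put more weight on the area's own residual.
    gamma = sigma2_u / (sigma2_u + sigma2_e / n)
    ybar = sum(area.y) / n
    xbar = [sum(col) / n for col in zip(*area.x)]
    return synthetic + gamma * (ybar - dot(xbar, beta))

# Hypothetical area with three sampled units and one covariate.
area = AreaSample(y=[11.0, 13.0, 15.0],
                  x=[[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]],
                  pop_mean_x=[1.0, 2.5])
print(ner_eblup(area, beta=[8.0, 2.0], sigma2_u=1.0, sigma2_e=3.0))
```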
To increase user-friendliness, the package also implements established unit-level models under transformations together with their uncertainty estimates and automatically selects the most suitable method. For some applications, however, it is challenging to find a suitable transformation or, more generally, to specify a model, particularly in the presence of complex interactions. In this case, machine learning methods are valuable, as they require neither a transformation nor an explicit model specification. The semi-parametric framework of mixed effects random forests (MERF) combines the advantages of random forests (robustness against outliers and implicit model selection) with the ability to model hierarchical dependencies as present in SAE approaches. Chapter 3 introduces MERFs in the absence of population micro-data. As existing random forest algorithms require unit-level auxiliary population data, an alternative strategy is introduced: it adaptively incorporates aggregated auxiliary information through calibration weights to circumvent the need for unit-level auxiliary data. Applying the proposed method to opportunity costs of care work in Germany using the Socio-Economic Panel and census data demonstrates a gain in accuracy compared with both direct estimates and the classical NER model. In contrast to methods using a unit-level sample survey, Part II focuses on the well-known class of area-level SAE models, which require direct estimates from a survey while (once again) using only aggregated population auxiliary data. This thesis presents two particularly relevant applications of this model class. Chapter 4 examines regional consumer price indices (CPIs) in the United Kingdom (UK), contributing to the strong interest in monitoring inflation at the spatial level. The SAE challenge is to construct model-based expenditure weights that generate the regional basket of goods and services for the twelve regions of the UK.
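Chapter 3 replaces unit-level population data by calibration weights. The exact weighting scheme of the thesis is not given here, so as a generic illustration the sketch below computes linear (chi-square distance) calibration weights w_i = d_i(1 + x_i'λ) that reproduce known population totals of the auxiliary variables. The design weights, covariates and totals in the example are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def linear_calibration(d, x, totals):
    """Linear (chi-square distance) calibration: w_i = d_i * (1 + x_i' lam),
    with lam chosen so that sum_i w_i x_i equals the known population totals."""
    p = len(totals)
    # Horvitz-Thompson-type totals implied by the design weights.
    t_hat = [sum(di * xi[k] for di, xi in zip(d, x)) for k in range(p)]
    # Weighted cross-product matrix T = sum_i d_i x_i x_i'.
    t_mat = [[sum(di * xi[j] * xi[k] for di, xi in zip(d, x)) for k in range(p)]
             for j in range(p)]
    lam = solve(t_mat, [t - th for t, th in zip(totals, t_hat)])
    return [di * (1.0 + sum(l * xk for l, xk in zip(lam, xi))) for di, xi in zip(d, x)]

# Hypothetical sample: intercept and age; known totals: 50 persons, 2000 summed years.
d = [10.0, 10.0, 10.0, 10.0]
x = [[1.0, 20.0], [1.0, 35.0], [1.0, 50.0], [1.0, 65.0]]
w = linear_calibration(d, x, totals=[50.0, 2000.0])
print([round(wi, 3) for wi in w])
```

Linear calibration has a closed form but can produce negative weights; bounded or raking-type distance functions avoid this at the cost of an iterative solution.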
The expenditure weights are estimated from the Living Costs and Food Survey. Furthermore, available price data are linked to the SAE-estimated baskets to produce regional CPIs. The resulting CPI series are examined closely, and smoothing techniques are applied. As a result, reliability improves, but the CPI series are still too volatile for policy use. However, our research serves as a valuable framework for the creation of a regional CPI in the future. The second application also explores the reliability of the disaggregated estimation of a politically and economically highly relevant indicator, in this case the unemployment rate. The regional target level consists of the functional urban areas in the German federal state of North Rhine-Westphalia. In Chapter 5, two types of unemployment rates, the traditional one and an alternative definition that takes commuting into account, are estimated and compared. Direct estimates from the Labour Force Survey are linked via SAE methods to passively collected mobile network data. This alternative data source is available in real time, offers spatially flexible resolutions, and is dynamic. In compliance with data protection rules, we obtain aggregated auxiliary mobile network information from the data provider. The SAE methods improve reliability, and the resulting predictions show that alternative unemployment rates in German city cores are lower than traditionally estimated official unemployment rates indicate.
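Once regional expenditure weights are available, a regional index can be formed as a fixed-basket (Laspeyres-type) average of price relatives. The sketch below is a simplified illustration only, assuming one price per item and a single base period; the item names, prices and expenditure shares are hypothetical, and official CPI compilation (elementary aggregates, chain-linking) is considerably more involved.

```python
def weighted_price_index(weights, base_prices, current_prices):
    """Fixed-basket (Laspeyres-type) index: expenditure-share-weighted mean of
    price relatives, scaled to 100 in the base period."""
    total = sum(weights.values())  # normalise so the shares sum to one
    return 100.0 * sum(
        (w / total) * current_prices[item] / base_prices[item]
        for item, w in weights.items()
    )

base = {"food": 2.00, "housing": 500.0, "transport": 1.50}
now = {"food": 2.20, "housing": 520.0, "transport": 1.80}
# Hypothetical expenditure shares for two regions, e.g. from SAE-estimated baskets.
region_a = {"food": 0.20, "housing": 0.50, "transport": 0.30}
region_b = {"food": 0.30, "housing": 0.30, "transport": 0.40}

for name, shares in [("A", region_a), ("B", region_b)]:
    print(name, round(weighted_price_index(shares, base, now), 2))
```

With identical price movements, the two regions still obtain different index values purely because their baskets differ, which is why reliable regional expenditure weights matter.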

    Small area estimation for spatially correlated populations - a comparison of direct and indirect model-based methods

    Linear mixed models underpin many small area estimation (SAE) methods. In this paper, we investigate SAE based on linear models with spatially correlated small area effects, where the neighbourhood structure is described by a contiguity matrix. Such models allow efficient use of spatial auxiliary information in SAE. In particular, we use simulation studies to compare the performance of model-based direct estimation (MBDE) and empirical best linear unbiased prediction (EBLUP) under such models. These simulations are based on theoretically generated populations as well as on data obtained from two real populations (the ISTAT farm structure survey in Tuscany and the US Environmental Monitoring and Assessment Program survey). Our empirical results show only marginal gains when spatial dependence between areas is incorporated into the SAE model.
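The abstract does not state how the contiguity matrix enters the model. A common choice in spatial SAE is a simultaneous autoregressive (SAR) process for the area effects, u = rho W u + v, with a row-standardised contiguity matrix W; the sketch below assumes that specification and computes the implied covariance sigma^2 (I - rho W)^{-1} (I - rho W)^{-T}. The four-area adjacency structure is hypothetical.

```python
def row_standardise(adjacency):
    """Turn a 0/1 contiguity matrix into a row-standardised weight matrix W."""
    w = []
    for row in adjacency:
        total = sum(row)
        w.append([a / total if total else 0.0 for a in row])
    return w

def invert(m):
    """Gauss-Jordan inverse with partial pivoting (adequate for a few areas)."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def sar_covariance(adjacency, rho, sigma2):
    """Covariance of SAR area effects u = rho*W*u + v with v ~ N(0, sigma2*I):
    Var(u) = sigma2 * A^{-1} (A^{-1})^T, where A = I - rho*W."""
    w = row_standardise(adjacency)
    n = len(w)
    a = [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)] for i in range(n)]
    a_inv = invert(a)
    return [[sigma2 * sum(a_inv[i][k] * a_inv[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical example: four areas in a row, each bordering its direct neighbours.
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
cov = sar_covariance(adjacency, rho=0.5, sigma2=2.0)
for row in cov:
    print(" ".join(f"{v:6.3f}" for v in row))
```

With rho = 0 the covariance reduces to sigma^2 I, i.e. independent area effects as in the standard models; |rho| < 1 with a row-standardised W keeps I - rho W invertible.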

    Small area estimation of general parameters with application to poverty indicators: A hierarchical Bayes approach

    Poverty maps are used to aid important political decisions such as the allocation of development funds by governments and international organizations. Those decisions should be based on the most accurate poverty figures. However, reliable poverty figures are often not available at fine geographical levels or for particular at-risk population subgroups due to the sample size limitations of current national surveys. These surveys cannot adequately cover all the desired areas or population subgroups and, therefore, models relating the different areas are needed to 'borrow strength' from area to area. In particular, the Spanish Survey on Income and Living Conditions (SILC) produces national poverty estimates but cannot provide poverty estimates by Spanish provinces due to the poor precision of direct estimates, which use only the province-specific data. This also raises the ethical question of whether poverty is more severe for women than for men in a given province. We develop a hierarchical Bayes (HB) approach for poverty mapping in Spanish provinces by gender that overcomes the small province sample size problem of the SILC. The proposed approach has a wide scope of application because it can be used to estimate general nonlinear parameters. We use a Bayesian version of the nested error regression model in which Markov chain Monte Carlo procedures and the convergence monitoring therein are avoided. A simulation study reveals good frequentist properties of the HB approach. The resulting poverty maps indicate that poverty, both in frequency and intensity, is localized mostly in the southern and western provinces and is more acute for women than for men in most of the provinces.
    Comment: Published at http://dx.doi.org/10.1214/13-AOAS702 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
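Poverty frequency and intensity are commonly measured with the Foster-Greer-Thorbecke (FGT) family, F_alpha = (1/N) sum_i ((z - y_i)/z)^alpha 1(y_i < z), where alpha = 0 gives the poverty incidence and alpha = 1 the poverty gap. The sketch below only evaluates these indicators on a given set of incomes; in a model-based approach such as the HB method described above, they would be computed on simulated versions of each area's population and averaged. The incomes and poverty line are hypothetical.

```python
def fgt(incomes, z, alpha):
    """Foster-Greer-Thorbecke poverty indicator for poverty line z.
    alpha = 0: poverty incidence (share of people below z);
    alpha = 1: poverty gap (mean relative shortfall, counting the non-poor as 0)."""
    n = len(incomes)
    return sum(((z - y) / z) ** alpha for y in incomes if y < z) / n

incomes = [300.0, 600.0, 900.0, 1200.0, 1500.0]  # hypothetical equivalised incomes
z = 800.0                                        # hypothetical poverty line
print("incidence:", fgt(incomes, z, 0))
print("gap:", fgt(incomes, z, 1))
```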

    Estimating regional unemployment with mobile network data for Functional Urban Areas in Germany

    The ongoing growth of cities due to better job opportunities is leading to increased labour-related commuter flows in several countries. On the one hand, an increasing number of people commute and move to the cities; on the other hand, the labour market shows higher unemployment rates in urban areas than in the surrounding areas. We investigate this phenomenon at the regional level using an alternative definition of unemployment rates that integrates commuting behaviour. We combine data from the Labour Force Survey (LFS) with dynamic mobile network data using small area models for the federal state of North Rhine-Westphalia in Germany. From a methodological perspective, we use a transformed Fay-Herriot model with bias correction for the estimation of unemployment rates and propose a parametric bootstrap for the Mean Squared Error (MSE) estimation that includes the bias correction. The performance of the proposed methodology is evaluated in a case study based on official data and in model-based simulations. The results of the application show that unemployment rates (adjusted for commuters) in German cities are lower than traditional official unemployment rates indicate.
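A minimal sketch of the Fay-Herriot EBLUP on the original scale: area-level direct estimates are shrunk towards a regression-synthetic prediction, with the shrinkage factor gamma_d = sigma_u^2 / (sigma_u^2 + psi_d) driven by the known sampling variance psi_d. It assumes a single covariate (standing in for a mobile-network indicator), a model variance that has already been estimated, and no transformation, so the bias correction and the parametric bootstrap MSE of the paper are not shown. All numbers are hypothetical.

```python
def fay_herriot_eblup(direct, psi, x, sigma2_u):
    """EBLUP under an untransformed Fay-Herriot model with intercept + one covariate.
    direct: direct estimates; psi: known sampling variances; x: area-level covariate;
    sigma2_u: model variance, assumed already estimated (e.g. by REML)."""
    w = [1.0 / (sigma2_u + p) for p in psi]
    # Weighted least squares for (b0, b1) via the 2x2 normal equations.
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_wy = sum(wi * yi for wi, yi in zip(w, direct))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, direct))
    det = s_w * s_wxx - s_wx ** 2
    b0 = (s_wxx * s_wy - s_wx * s_wxy) / det
    b1 = (s_w * s_wxy - s_wx * s_wy) / det
    eblup = []
    for yi, xi, pi in zip(direct, x, psi):
        # Areas with a large sampling variance lean more on the synthetic part.
        gamma = sigma2_u / (sigma2_u + pi)
        eblup.append(gamma * yi + (1.0 - gamma) * (b0 + b1 * xi))
    return eblup

# Hypothetical direct unemployment rates, their sampling variances and a covariate.
direct = [0.062, 0.081, 0.055, 0.094]
psi = [1e-4, 4e-4, 5e-5, 9e-4]
x = [0.30, 0.45, 0.25, 0.50]
print([round(e, 4) for e in fay_herriot_eblup(direct, psi, x, sigma2_u=2e-4)])
```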

    Benefits of past inventory data as prior information for the current inventory

    When auxiliary information in the form of airborne laser scanning (ALS) is used to assist in estimating the population parameters of interest, the benefits of prior information from previous inventories are not self-evident. In a simulation study, we compared three different approaches: 1) using only current data, 2) using non-updated old data and current data in a composite estimator and 3) using updated old data and current data with a Kalman filter. We also tested three different estimators, namely i) the Horvitz-Thompson estimator for the case of no auxiliary information, ii) model-assisted estimation and iii) model-based estimation. We compared these methods in terms of bias, precision and accuracy, as estimators utilizing prior information are not guaranteed to be unbiased.
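In the simplest scalar case, the composite estimator and the Kalman filter both weight the old and current estimates by their precisions; the difference is that the Kalman filter first projects the old estimate forward. The sketch below assumes independent, unbiased estimates and a hypothetical multiplicative growth model with extra process variance; it does not reproduce the ALS-based estimators of the study. If the old estimate is biased for the current state, as with non-updated old data, the combined estimate inherits part of that bias.

```python
def composite(est_old, var_old, est_new, var_new):
    """Inverse-variance-weighted composite of an old and a current estimate."""
    w_old = var_new / (var_old + var_new)
    est = w_old * est_old + (1.0 - w_old) * est_new
    var = var_old * var_new / (var_old + var_new)
    return est, var

def kalman_step(est_old, var_old, growth, process_var, est_new, var_new):
    """One scalar Kalman-filter step: project the old estimate forward with an
    assumed multiplicative growth model, then update it with the current estimate."""
    pred = est_old * (1.0 + growth)
    pred_var = var_old * (1.0 + growth) ** 2 + process_var
    gain = pred_var / (pred_var + var_new)
    est = pred + gain * (est_new - pred)
    return est, (1.0 - gain) * pred_var

# Hypothetical stand volumes (m3/ha): old inventory 200 (var 25), current 230 (var 100).
print(composite(200.0, 25.0, 230.0, 100.0))
print(kalman_step(200.0, 25.0, growth=0.10, process_var=16.0, est_new=230.0, var_new=100.0))
```

Setting the growth and the process variance to zero makes the Kalman step coincide with the composite estimator.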

    Methodological Issues in Spatial Microsimulation Modelling for Small Area Estimation

    In this paper, some vital methodological issues of spatial microsimulation modelling for small area estimation are addressed, with particular emphasis on reweighting techniques. Most review articles on small area estimation have highlighted methodologies based on various statistical models and theories. However, spatial microsimulation modelling is emerging as a very useful alternative means of small area estimation. Our findings demonstrate that spatial microsimulation models are robust and have advantages over other types of models used for small area estimation. The technique uses different methodologies, typically based on geographic models and various economic theories. In contrast to statistical model-based approaches, spatial microsimulation approaches can operate through reweighting techniques such as GREGWT and combinatorial optimisation. A comparison between reweighting techniques reveals that they use quite different iterative algorithms and that their properties also vary. The study also points out a new method for spatial microsimulation modelling.
    Keywords: Bayesian prediction approach; combinatorial optimisation; GREGWT; microdata; small area estimation; spatial microsimulation
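Combinatorial optimisation selects a set of households from a microdata sample so that its tabulation matches small-area constraint tables. The sketch below uses plain hill climbing with random swaps and total absolute error as the fit criterion; practical implementations often use simulated annealing or other search strategies, and GREGWT is a different, regression-based reweighting. The households, categories and targets are hypothetical.

```python
import random

def tae(selection, households, targets):
    """Total absolute error between the tabulated selection and the area targets."""
    counts = {cat: 0 for cat in targets}
    for idx in selection:
        for cat in households[idx]:
            if cat in counts:
                counts[cat] += 1
    return sum(abs(counts[c] - targets[c]) for c in targets)

def combinatorial_optimisation(households, targets, size, iters=5000, seed=0):
    """Hill-climbing variant of combinatorial optimisation: start from a random
    selection of `size` households (drawn with replacement) and keep random
    swaps that do not increase the total absolute error."""
    rng = random.Random(seed)
    sel = [rng.randrange(len(households)) for _ in range(size)]
    err = tae(sel, households, targets)
    for _ in range(iters):
        pos, cand = rng.randrange(size), rng.randrange(len(households))
        old = sel[pos]
        sel[pos] = cand
        new_err = tae(sel, households, targets)
        if new_err <= err:
            err = new_err      # accept equal-or-better swaps
        else:
            sel[pos] = old     # reject worse swaps
    return sel, err

# Hypothetical microdata (age group, tenure) and small-area constraint counts.
households = [("young", "rent"), ("young", "own"), ("old", "rent"), ("old", "own")]
targets = {"young": 6, "old": 4, "rent": 3, "own": 7}
selection, error = combinatorial_optimisation(households, targets, size=10)
print(error, [households[i] for i in selection])
```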