Estimating regional unemployment with mobile network data for Functional Urban Areas in Germany
The ongoing growth of cities due to better job opportunities is leading to increased labour-related commuter flows in several countries. On the one hand, an increasing number of people commute and move to the cities; on the other hand, the labour market indicates higher unemployment rates in urban areas than in the surrounding areas. We investigate this phenomenon at the regional level using an alternative definition of unemployment rates in which commuting behaviour is integrated. We combine data from the Labour Force Survey (LFS) with dynamic mobile network data by means of small area models for the federal state of North Rhine-Westphalia in Germany. From a methodological perspective, we use a transformed Fay-Herriot model with bias correction for the estimation of unemployment rates and propose a parametric bootstrap for Mean Squared Error (MSE) estimation that includes the bias correction. The performance of the proposed methodology is evaluated in a case study based on official data and in model-based simulations. The results in the application show that unemployment rates (adjusted by commuters) in German cities are lower than traditional official unemployment rates indicate.
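The bias correction matters because back-transforming a prediction made on a transformed scale does not, in general, give an unbiased prediction on the original scale. The sketch below illustrates one way to correct for this. It assumes an arcsine square-root transformation, a common choice for rates, although the abstract does not name the transformation. It also assumes that the transformed-scale prediction is approximately normal with known variance. For Y ~ N(mu, s2), the identity E[sin^2(Y)] = (1 - exp(-2 s2) cos(2 mu)) / 2 replaces the naive sin^2(mu). The function names and input values are hypothetical, and this is not the authors' implementation.

```python
import math


def arcsin_transform(p: float) -> float:
    """Arcsine square-root transformation of a rate p in [0, 1]."""
    return math.asin(math.sqrt(p))


def naive_back_transform(mu: float) -> float:
    """Invert the transformation directly, ignoring prediction uncertainty."""
    return math.sin(mu) ** 2


def bias_corrected_back_transform(mu: float, var: float) -> float:
    """Expected rate E[sin^2(Y)] for Y ~ N(mu, var).

    Uses sin^2(y) = (1 - cos(2y)) / 2 and E[cos(2Y)] = cos(2 mu) * exp(-2 var).
    """
    return (1.0 - math.cos(2.0 * mu) * math.exp(-2.0 * var)) / 2.0


if __name__ == "__main__":
    mu = arcsin_transform(0.08)  # hypothetical prediction on the transformed scale
    var = 0.001                  # hypothetical prediction variance on that scale
    print(round(naive_back_transform(mu), 4))
    print(round(bias_corrected_back_transform(mu, var), 4))
```

With zero variance the two back-transformations coincide. For rates below 0.5 the corrected value is slightly larger than the naive one.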
Estimation of Disaggregated Indicators with Application to the Household Finance and Consumption Survey
International institutions and national statistical institutes are increasingly expected to report disaggregated indicators, i.e., means, ratios or Gini coefficients for different regional levels, socio-demographic groups or other subpopulations. These subpopulations are called areas or domains in this thesis. The data sources that are used to estimate these disaggregated indicators are mostly national surveys which may have small sample sizes for the domains of interest. Therefore, direct estimates that are based only on the survey data might be unreliable. To overcome this problem, small area estimation (SAE) methods help to increase the precision of survey-based estimates without demanding larger and more costly surveys. In SAE, the collected survey data is combined with other data sources, e.g., administrative and register data or data that is a by-product of digital activities.
The data requirements of various SAE methods depend to a large extent on whether the indicator of interest is a linear or non-linear function of a quantitative variable. For the estimation of linear indicators, e.g., the mean, aggregated data is sufficient; that is, direct estimates and auxiliary information from other data sources only need to be available for each domain. One popular area-level approach in this context is the Fay-Herriot model, which is studied in Part 1 of this work. In Chapter 1, the Fay-Herriot model is used to estimate the regional distribution of mean household net wealth in Germany. The analysis is based on the Household Finance and Consumption Survey (HFCS), which was launched by the European Central Bank and several statistical institutes in 2010. The main challenge of applying the Fay-Herriot approach in this context is handling the issues arising from the data: a) the skewness of the wealth distribution, b) informative weights due to, among other things, unit non-response, and c) multiple imputation to deal with item non-response. For the latter, this thesis proposes a modified Fay-Herriot model that accounts for the additional uncertainty due to multiple imputation. It is combined with known solutions for the other two issues and applied to estimate mean net wealth at low regional levels.
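Point c) can be made concrete with the standard way of pooling results across M multiply imputed data sets, known as Rubin's rules. The pooled estimate is the average of the M estimates. Its variance adds the average within-imputation variance W to (1 + 1/M) times the between-imputation variance B. The sketch below applies these standard combining rules to a single domain. It does not implement the modified Fay-Herriot model proposed in the thesis. The function name and the numbers are hypothetical.

```python
from statistics import fmean, variance


def pool_imputations(estimates, variances):
    """Combine M direct estimates for one domain with Rubin's rules.

    estimates: the domain estimate computed on each imputed data set
    variances: the corresponding sampling variances
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = fmean(estimates)
    within = fmean(variances)      # W: average within-imputation variance
    between = variance(estimates)  # B: between-imputation variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical mean net wealth (thousand EUR) for one domain, five imputations.
est, var = pool_imputations([210.0, 205.0, 215.0, 208.0, 212.0],
                            [400.0, 390.0, 410.0, 405.0, 395.0])
print(est, var)
```

In a plain Fay-Herriot setup, the total variance could be supplied as the domain's sampling variance. The thesis's modified model may account for the imputation uncertainty differently.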
The Deutsche Bundesbank, which is responsible for reporting the wealth distribution in Germany, as well as many economic institutes predominantly work with the statistical software Stata. To make the Fay-Herriot model and the extensions used in Chapter 1 available to practitioners, a new Stata command called fayherriot is developed as part of this thesis. Chapter 2 describes the functionality of the command with an application to income data from the Socio-Economic Panel, one of the largest panel surveys in Germany. The example application demonstrates how the Fay-Herriot approach helps to increase the reliability of estimates of mean household income compared with direct estimates at three different regional levels.
Going beyond linear indicators, Part 2 deals with the estimation of non-linear income and wealth indicators. Since the mean is sensitive to outliers, the median and other quantiles are also of interest when estimating the income or wealth distribution. As a first step, this thesis focuses on the direct estimation of quantiles, which is less straightforward than for the mean. In Chapter 3, common quantile definitions implemented in standard statistical software are empirically evaluated with regard to their bias, based on income and wealth distributions. The analysis shows that, especially for wealth data, which is usually heavily skewed, sample sizes need to be large in order to obtain unbiased direct estimates with the common quantile definitions.
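The sketch below shows why the choice of quantile definition matters. It compares two definitions in the numbering used by R's quantile() function. Type 1 is the inverse of the empirical distribution function. Type 7 interpolates linearly between order statistics and is R's default. On a small, hypothetical right-skewed sample, the two definitions disagree sharply in the upper tail. The function names are illustrative, and the sketch is not the evaluation carried out in Chapter 3.

```python
import math
import statistics


def quantile_type1(data, p):
    """Inverse empirical CDF: smallest order statistic x with F_n(x) >= p."""
    x = sorted(data)
    n = len(x)
    # The small tolerance guards against floating-point error when n * p
    # should be an exact integer (e.g. 10 * 0.3).
    k = max(math.ceil(n * p - 1e-9), 1)  # 1-based rank of the chosen value
    return x[k - 1]


def quantile_type7(data, p):
    """Linear interpolation between order statistics at position (n - 1) * p."""
    x = sorted(data)
    h = (len(x) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(x) - 1)
    return x[lo] + (h - lo) * (x[hi] - x[lo])


# Hypothetical, strongly right-skewed sample (e.g. net wealth in thousand EUR).
sample = [1, 2, 3, 4, 100]
print(quantile_type1(sample, 0.9))
print(quantile_type7(sample, 0.9))
# Type 7 agrees with the standard library's "inclusive" method.
print(statistics.quantiles(sample, n=10, method="inclusive")[8])
```

Both definitions give the same median here. At the 90% quantile, however, type 1 returns the largest observation while type 7 interpolates to about 61.6.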
Since a design-unbiased direct estimator is one of the assumptions of the aforementioned Fay-Herriot model, further research would be necessary to use the Fay-Herriot approach for the estimation of quantiles when the underlying data is heavily skewed. More common methods for producing reliable estimates of non-linear indicators in small domains, including quantiles, poverty indicators, and inequality indicators such as the Gini coefficient, are unit-level SAE methods. However, the data requirements of these methods are more restrictive: both the survey data and the auxiliary data need to be available for each unit in each domain. Among others, the empirical best prediction (EBP), the World Bank method, and the M-Quantile approach are well-known methods for the estimation of non-linear indicators in small domains. However, these methods are either not available in statistical software or their user-friendliness is limited. Therefore, this work develops the R package emdi, which focuses on a user-friendly application of the EBP. Chapter 4 describes how the package emdi supports the user beyond estimation with tools for assessing and presenting the results.
Both area- and unit-level SAE models are based on linear mixed regression models, which rely on a set of assumptions, in particular the linearity of the model and the normality of the error terms. If these assumptions are not fulfilled, transforming the response variable is one possible solution. Therefore, Part 3 provides a guideline for the use of transformations. Chapter 5 gives an extensive overview of different transformations applicable in linear and linear mixed regression models and discusses practical challenges. Various transformations and estimation methods for transformation parameters are implemented in the R package trafo, which is described in Chapter 6.
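A widely used data-driven transformation is the Box-Cox transformation. It maps y > 0 to (y^lambda - 1) / lambda, or to log(y) when lambda = 0. The parameter lambda can be chosen by maximising the profile log-likelihood -(n/2) log(RSS(lambda)/n) + (lambda - 1) * sum(log y), where the last term is the log-Jacobian of the transformation. The sketch below fits an intercept-only model by a grid search. It is not the trafo package's code, which offers further transformations and estimation methods, and the function names and data are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def box_cox(y, lam):
    """Box-Cox transformation of strictly positive values y."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def profile_loglik(y, lam):
    """Profile log-likelihood of lam for an intercept-only normal model."""
    z = box_cox(y, lam)
    mean_z = fmean(z)
    rss = sum((v - mean_z) ** 2 for v in z)
    n = len(y)
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(rss / n) + (lam - 1.0) * sum(math.log(v) for v in y)


def estimate_lambda(y, grid=None):
    """Choose lam by maximum likelihood over a grid (default -2 to 2, step 0.05)."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive data")
    if grid is None:
        grid = [k / 20 for k in range(-40, 41)]
    return max(grid, key=lambda lam: profile_loglik(y, lam))


# Hypothetical log-normal-shaped data: exp of evenly spaced normal quantiles.
n = 200
q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
y = [math.exp(1.0 + 0.5 * v) for v in q]
print(estimate_lambda(y))  # near 0, i.e. a log transformation
```

In a linear or linear mixed model, RSS(lambda) would come from the fitted regression rather than from deviations around the mean.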
Altogether, this work contributes to the literature by
a) combining SAE and multiple imputation by proposing a modified Fay-Herriot approach,
b) showing the limitations of existing quantile definitions with regard to bias when data is skewed and the sample size is small,
c) closing the gap between academic research and practical applications by providing user-friendly software for the estimation of linear and non-linear indicators, and
d) giving a framework for the usage of transformations in linear and linear mixed regression models.
Estimating regional unemployment with mobile network data for Functional Urban Areas in Germany
The ongoing growth of cities due to better job opportunities is leading to increased labour-related commuter flows in several countries. On the one hand, an increasing number of people commute and move to the cities; on the other hand, the labour market indicates higher unemployment rates in urban areas than in the surrounding areas. We investigate this phenomenon at the regional level using an alternative definition of unemployment rates in which commuting behaviour is integrated. We combine data from the Labour Force Survey with dynamic mobile network data by means of small area models for the federal state of North Rhine-Westphalia in Germany. From a methodological perspective, we use a transformed Fay–Herriot model with bias correction for the estimation of unemployment rates and propose a parametric bootstrap for mean squared error estimation that includes the bias correction. The performance of the proposed methodology is evaluated in a case study based on official data and in model-based simulations. The results in the application show that unemployment rates (adjusted by commuters) in German cities are lower than traditional official unemployment rates indicate.
A Framework for Producing Small Area Estimates Based on Area-Level Models in R
The R package emdi facilitates the estimation of regionally disaggregated indicators using small area estimation methods and provides tools for model building, diagnostics, and the presentation and export of results. Package version 1.1.7 includes unit-level small area models that rely on access to micro data. The area-level model by Fay and Herriot (1979) and various extensions have been added to the package since the release of version 2.0.0. These extensions include (a) area-level models with back-transformations, (b) spatial and robust extensions, (c) adjusted variance estimation methods, and (d) area-level models that account for measurement errors. Corresponding mean squared error estimators are implemented for assessing the uncertainty. User-friendly tools such as stepwise variable selection, model diagnostics, benchmarking options, high-quality maps, and options for exporting results enable a complete analysis procedure. The functionality of the package is illustrated by examples based on synthetic data for Austrian districts.
Small area estimation in R with application to Mexican income data
In recent decades, policy decisions have often been based on statistical measures. The more detailed this information is, the better the basis for targeting policies and evaluating policy programs. For instance, the United Nations suggests more disaggregation of statistical indicators for monitoring its Sustainable Development Goals, and the number of National Statistical Institutes (NSIs) that recognize the need for more disaggregated statistics is also increasing. Dimensions for disaggregation can be characteristics of individuals or households, such as sex, age, ethnicity or economic activity, or spatial dimensions, such as metropolitan areas or districts. The primary data sources for the variables used to estimate statistical indicators are national household surveys. However, sample sizes are usually small or even zero at disaggregated levels. Therefore, direct estimators based only on survey data can be unreliable or unavailable for small domains. While more specific surveys are costly, model-based methodologies for dealing with small sample sizes can help to obtain reliable estimates for small domains. The so-called Small Area Estimation (SAE) methods [1,2] link survey data, which is only available for a proportion of households, with administrative or census data available for all households in the area of interest. Even though academic researchers have proposed a wide range of SAE methods, these have so far been applied by only a small number of NSIs or other practitioners such as the World Bank. This gap between theoretical possibilities and practical application can have several reasons, one of which can be the lack of suitable statistical software. The free software environment R helps to counteract this issue, since researchers can make their code publicly available via packages. Thus, new methods can reach practitioners faster than with non-free software.
The next two sections summarize which packages are already available and what could be improved in the future.
The R Package emdi for Estimating and Mapping Regionally Disaggregated Indicators
The R package emdi enables the estimation of regionally disaggregated indicators using small area estimation methods and includes tools for processing, assessing, and presenting the results. The mean of the target variable, the quantiles of its distribution, the headcount ratio, the poverty gap, the Gini coefficient, the quintile share ratio, and customized indicators are estimated using direct and model-based estimation with the empirical best predictor (Molina and Rao 2010). The user is assisted by automatic estimation of data-driven transformation parameters. A parametric bootstrap and a semi-parametric wild bootstrap for mean squared error estimation are implemented, with the latter offering protection against possible misspecification of the error distribution. Tools for (a) customized parallel computing, (b) model diagnostic analyses, (c) creating high-quality maps and (d) exporting the results to Excel and OpenDocument Spreadsheets are included. The functionality of the package is illustrated with example data sets for estimating the Gini coefficient and median income for districts in Austria.
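The empirical best predictor (Molina and Rao 2010) can be summarised as a simulation. The fitted nested error model is used to generate the non-sampled part of a domain's population, conditional on the sampled units. The indicator is evaluated on each simulated population, and the results are averaged. The sketch below does this for one domain and the headcount ratio. It assumes a single covariate and model parameters (beta, sigma2_u, sigma2_e) that have already been estimated. The function name, the n_rep and seed arguments, and the data are hypothetical. This is not emdi's implementation.

```python
import random
from statistics import fmean


def ebp_headcount(y_sample, x_sample, x_nonsample, beta, sigma2_u, sigma2_e,
                  threshold, n_rep=200, seed=1):
    """Monte Carlo EBP of the headcount ratio for a single domain.

    Assumed, already fitted nested error model:
        y = beta[0] + beta[1] * x + u + e,  u ~ N(0, sigma2_u),  e ~ N(0, sigma2_e).
    y_sample / x_sample: observed units; x_nonsample: covariates of the other units.
    """
    rng = random.Random(seed)
    n = len(y_sample)
    # Conditional distribution of the domain effect u given the sampled units.
    gamma = sigma2_u / (sigma2_u + sigma2_e / n)
    mean_resid = fmean([y - (beta[0] + beta[1] * x) for y, x in zip(y_sample, x_sample)])
    u_mean = gamma * mean_resid
    u_sd = (sigma2_u * (1.0 - gamma)) ** 0.5
    e_sd = sigma2_e ** 0.5

    domain_size = n + len(x_nonsample)
    poor_in_sample = sum(y < threshold for y in y_sample)
    replicates = []
    for _ in range(n_rep):
        u = rng.gauss(u_mean, u_sd)
        # Only non-sampled units are simulated; sampled outcomes stay as observed.
        poor_out = sum(beta[0] + beta[1] * x + u + rng.gauss(0.0, e_sd) < threshold
                       for x in x_nonsample)
        replicates.append((poor_in_sample + poor_out) / domain_size)
    # The average over replicates approximates the conditional expectation.
    return fmean(replicates)


# Hypothetical data for one domain (log income and one covariate).
y_s = [9.8, 10.4, 10.1, 9.5]
x_s = [1.0, 2.0, 1.5, 0.5]
x_r = [0.8, 1.2, 1.9, 2.4, 0.3, 1.1]
print(ebp_headcount(y_s, x_s, x_r, beta=(9.5, 0.4), sigma2_u=0.05,
                    sigma2_e=0.1, threshold=10.0))
```

The headcount ratio only counts values below the poverty line, so it can be evaluated on the log scale with a log-transformed line. Indicators such as the mean or the Gini coefficient would instead require back-transforming the simulated values first.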
The R Package emdi for Estimating and Mapping Regionally Disaggregated Indicators
The R package emdi offers a methodological and computational framework for the
estimation of regionally disaggregated indicators using small area estimation
methods and provides tools for assessing, processing and presenting the
results. A range of indicators that includes the mean of the target variable,
the quantiles of its distribution and complex, non-linear indicators or
customized indicators can be estimated simultaneously using direct estimation
and the empirical best predictor (EBP) approach (Molina and Rao 2010). In the
application presented in this paper, the package emdi is used for estimating
inequality indicators and the median of the income distributions for small
areas in Austria. Because the EBP approach relies on the normality of the
mixed model error terms, the user is further assisted by an automatic
selection of data-driven transformation parameters. Estimating the uncertainty
of small area estimates, measured by the mean squared error (MSE), is achieved
by using both parametric bootstrap and semi-parametric wild bootstrap. The
additional uncertainty due to the estimation of the transformation parameter
is also captured in MSE estimation. The semi-parametric wild bootstrap further
protects the user against departures from the assumptions of the mixed model,
in particular those of the unit-level error term. The bootstrap schemes are
facilitated by computationally efficient code that uses parallel computing. The
package supports the users beyond the production of small area estimates.
Firstly, tools are provided for exploring the structure of the data and for
diagnostic analysis of the model assumptions. Secondly, tools that allow the
spatial mapping of the estimates enable the user to create high-quality
visualizations. Thirdly, results and model summaries can be exported to Excel™
spreadsheets for further reporting purposes.
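A wild bootstrap multiplies each residual by an independent random weight with mean 0 and variance 1. The bootstrap samples therefore keep the empirical, possibly skewed or heteroscedastic, shape of the residuals rather than an assumed normal distribution. The sketch below shows a single bootstrap draw. It assumes Mammen's two-point weights, which the abstract does not specify. It also uses a plain vector of fitted values and residuals instead of the full mixed model with its random effects. The constant and function names, as well as the data, are hypothetical.

```python
import math
import random

SQRT5 = math.sqrt(5.0)
# Mammen's two-point distribution: mean 0, variance 1, third moment 1.
W_LOW = -(SQRT5 - 1.0) / 2.0
W_HIGH = (SQRT5 + 1.0) / 2.0
P_LOW = (SQRT5 + 1.0) / (2.0 * SQRT5)


def mammen_weight(rng):
    """Draw one Mammen weight."""
    return W_LOW if rng.random() < P_LOW else W_HIGH


def wild_bootstrap_sample(fitted, residuals, rng):
    """One wild-bootstrap response vector: y* = fitted + w * residual."""
    return [f + mammen_weight(rng) * r for f, r in zip(fitted, residuals)]


rng = random.Random(42)
fitted = [10.0, 11.5, 9.8, 12.1]   # hypothetical model predictions
residuals = [0.4, -1.2, 0.1, 2.3]  # hypothetical residuals
print(wild_bootstrap_sample(fitted, residuals, rng))
```

The model would be refitted to each bootstrap response vector, and the spread of the resulting estimates would feed into the MSE estimate.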
Semaglutide and cardiovascular outcomes in patients with obesity and prevalent heart failure: a prespecified analysis of the SELECT trial
Background: Semaglutide, a GLP-1 receptor agonist, reduces the risk of major adverse cardiovascular events (MACE) in people with overweight or obesity, but the effects of this drug on outcomes in patients with atherosclerotic cardiovascular disease and heart failure are unknown. We report a prespecified analysis of the effect of once-weekly subcutaneous semaglutide 2·4 mg on ischaemic and heart failure cardiovascular outcomes. We aimed to investigate if semaglutide was beneficial in patients with atherosclerotic cardiovascular disease with a history of heart failure compared with placebo; if there was a difference in outcome in patients designated as having heart failure with preserved ejection fraction compared with heart failure with reduced ejection fraction; and if the efficacy and safety of semaglutide in patients with heart failure was related to baseline characteristics or subtype of heart failure. Methods: The SELECT trial was a randomised, double-blind, multicentre, placebo-controlled, event-driven phase 3 trial in 41 countries. Adults aged 45 years and older, with a BMI of 27 kg/m2 or greater and established cardiovascular disease were eligible for the study. Patients were randomly assigned (1:1) with a block size of four using an interactive web response system in a double-blind manner to escalating doses of once-weekly subcutaneous semaglutide over 16 weeks to a target dose of 2·4 mg, or placebo. In a prespecified analysis, we examined the effect of semaglutide compared with placebo in patients with and without a history of heart failure at enrolment, subclassified as heart failure with preserved ejection fraction, heart failure with reduced ejection fraction, or unclassified heart failure. 
Endpoints comprised MACE (a composite of non-fatal myocardial infarction, non-fatal stroke, and cardiovascular death); a composite heart failure outcome (cardiovascular death or hospitalisation or urgent hospital visit for heart failure); cardiovascular death; and all-cause death. The study is registered with ClinicalTrials.gov, NCT03574597. Findings: Between Oct 31, 2018, and March 31, 2021, 17 604 patients with a mean age of 61·6 years (SD 8·9) and a mean BMI of 33·4 kg/m2 (5·0) were randomly assigned to receive semaglutide (8803 [50·0%] patients) or placebo (8801 [50·0%] patients). 4286 (24·3%) of 17 604 patients had a history of investigator-defined heart failure at enrolment: 2273 (53·0%) of 4286 patients had heart failure with preserved ejection fraction, 1347 (31·4%) had heart failure with reduced ejection fraction, and 666 (15·5%) had unclassified heart failure. Baseline characteristics were similar between patients with and without heart failure. Patients with heart failure had a higher incidence of clinical events. Semaglutide improved all outcome measures in patients with heart failure at random assignment compared with those without heart failure (hazard ratio [HR] 0·72, 95% CI 0·60-0·87 for MACE; 0·79, 0·64-0·98 for the heart failure composite endpoint; 0·76, 0·59-0·97 for cardiovascular death; and 0·81, 0·66-1·00 for all-cause death; all p for interaction >0·19). Treatment with semaglutide resulted in improved outcomes in both the heart failure with reduced ejection fraction (HR 0·65, 95% CI 0·49-0·87 for MACE; 0·79, 0·58-1·08 for the composite heart failure endpoint) and heart failure with preserved ejection fraction groups (0·69, 0·51-0·91 for MACE; 0·75, 0·52-1·07 for the composite heart failure endpoint), although patients with heart failure with reduced ejection fraction had higher absolute event rates than those with heart failure with preserved ejection fraction.
For MACE and the heart failure composite, there were no significant differences in benefits across baseline age, sex, BMI, New York Heart Association status, and diuretic use. Serious adverse events were less frequent with semaglutide versus placebo, regardless of heart failure subtype. Interpretation: In patients with atherosclerotic cardiovascular disease and overweight or obesity, treatment with semaglutide 2·4 mg reduced MACE and composite heart failure endpoints compared with placebo in those with and without clinical heart failure, regardless of heart failure subtype. Our findings could facilitate prescribing and result in improved clinical outcomes for this patient group. Funding: Novo Nordisk.
The fayherriot command for estimating small-area indicators
We introduce a Stata command, fayherriot, that implements the Fay–Herriot model (Fay and Herriot, 1979, Journal of the American Statistical Association 74: 269–277), a small-area estimation technique (Rao and Molina, 2015, Small Area Estimation). The Fay–Herriot model improves the precision of area-level direct estimates using area-level covariates. It belongs to the class of linear mixed models with normally distributed error terms. The fayherriot command encompasses options to a) produce out-of-sample predictions, b) adjust nonpositive random-effects variance estimates, and c) deal with the violation of model assumptions.
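The model can be stated concretely. Suppose there are m areas with direct estimates y_i, known sampling variances psi_i and covariates x_i. The Fay–Herriot model is y_i = x_i'beta + u_i + e_i, with u_i ~ N(0, sigma2_u) and e_i ~ N(0, psi_i). The EBLUP shrinks each direct estimate towards the regression (synthetic) estimate with weight gamma_i = sigma2_u / (sigma2_u + psi_i). The sketch below uses one covariate and the Prasad-Rao moment estimator of sigma2_u. A nonpositive variance estimate is simply truncated at zero, which is the basic, unadjusted treatment. The fayherriot command provides its own estimation methods and adjustments. The function names and data are hypothetical, and this is not the command's code.

```python
def wls_line(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b1 = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    b0 = (sy - b1 * sx) / sw
    return b0, b1


def fay_herriot_eblup(y, x, psi):
    """EBLUP under the Fay-Herriot model with an intercept and one covariate.

    y: direct estimates, x: area-level covariate, psi: known sampling variances.
    Returns (eblups, sigma2_u).
    """
    m, p = len(y), 2
    if m <= p:
        raise ValueError("need more areas than regression coefficients")
    # Step 1: OLS fit and leverages h_i = x_i' (X'X)^{-1} x_i for X = [1, x].
    b0, b1 = wls_line(x, y, [1.0] * m)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    det = m * sxx - sx * sx
    lev = [(sxx - 2.0 * sx * xi + m * xi * xi) / det for xi in x]
    rss = sum((yi - b0 - b1 * xi) ** 2 for yi, xi in zip(y, x))
    # Step 2: Prasad-Rao moment estimator of sigma2_u, truncated at zero.
    excess = rss - sum(v * (1.0 - h) for v, h in zip(psi, lev))
    sigma2_u = max(0.0, excess / (m - p))
    # Step 3: re-estimate beta by WLS with weights 1 / (sigma2_u + psi_i).
    b0, b1 = wls_line(x, y, [1.0 / (sigma2_u + v) for v in psi])
    # Step 4: shrink each direct estimate towards its synthetic estimate.
    eblups = []
    for yi, xi, v in zip(y, x, psi):
        gamma = sigma2_u / (sigma2_u + v)
        eblups.append(gamma * yi + (1.0 - gamma) * (b0 + b1 * xi))
    return eblups, sigma2_u


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.5, 4.2, 7.9, 8.1, 12.4, 12.0]  # hypothetical direct estimates
psi = [0.5, 0.5, 1.0, 1.0, 2.0, 2.0]  # hypothetical sampling variances
est, s2u = fay_herriot_eblup(y, x, psi)
print(round(s2u, 3), [round(v, 2) for v in est])
```

Areas with large sampling variances receive small weights gamma_i, so their estimates lean more heavily on the covariate. If sigma2_u is truncated to zero, every area receives the purely synthetic estimate. This is the situation that option b) of the command is designed to address.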