    Stratification Trees for Adaptive Randomization in Randomized Controlled Trials

    This paper proposes an adaptive randomization procedure for two-stage randomized controlled trials. The method uses data from a first-wave experiment to determine how to stratify in a second wave, where the objective is to minimize the variance of an estimator for the average treatment effect (ATE). We consider selection from a class of stratified randomization procedures which we call stratification trees: these are procedures whose strata can be represented as decision trees, with differing treatment assignment probabilities across strata. By using the first wave to estimate a stratification tree, we simultaneously select which covariates to use for stratification, how to stratify over these covariates, and the assignment probabilities within these strata. Our main result shows that using this randomization procedure with an appropriate estimator results in an asymptotic variance which is minimal in the class of stratification trees. Moreover, the results we present accommodate a large class of assignment mechanisms within strata, including stratified block randomization. In a simulation study, we find that our method, paired with an appropriate cross-validation procedure, can improve on ad hoc choices of stratification. We conclude by applying our method to the study in Karlan and Wood (2017), where we estimate stratification trees using the first wave of their experiment.
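
    As a concrete instance of the assignment mechanisms within strata, the following sketch implements stratified block randomization with differing treatment assignment probabilities across strata. The strata labels, sizes, and probabilities are illustrative, not taken from the paper.

```python
import numpy as np

def stratified_block_assign(strata, probs, rng):
    """Assign treatment within each stratum by block randomization:
    in a stratum with n units and assignment probability p, exactly
    round(n * p) units are treated, chosen uniformly at random."""
    treat = np.zeros(len(strata), dtype=int)
    for s, p in probs.items():
        idx = np.flatnonzero(strata == s)
        n_treat = int(round(len(idx) * p))
        chosen = rng.choice(idx, size=n_treat, replace=False)
        treat[chosen] = 1
    return treat

# Toy example: two leaves of a hypothetical stratification tree,
# each with its own assignment probability
rng = np.random.default_rng(0)
strata = np.array([0] * 100 + [1] * 100)
probs = {0: 0.5, 1: 0.3}
treat = stratified_block_assign(strata, probs, rng)
```

    Unlike simple Bernoulli assignment, block randomization fixes the realized number of treated units per stratum, which is what removes the extra assignment variability within strata.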

    Transient seasonal and chronic poverty of peasants: Evidence from Rwanda

    Using panel data from Rwanda, we estimate seasonal transient and chronic poverty indices for different poverty lines, poverty indicators, and equivalence scales, with and without corrections for price variability and for the sampling scheme. We also estimate sampling standard errors for the poverty indices. The worst poverty crises occur after the dry season at the end of the year. Most of the severity of poverty comes from the seasonal transient component of annual poverty, while the seasonal component of the incidence of poverty is much smaller. Thus the actual differences in the severity of poverty, either between developing and industrial countries or between rural and urban areas in LDCs, may be much worse than is shown by the usual chronic annual poverty measures or by measures of the seasonal incidence of poverty. The importance of the transient component suggests a need for an income stabilisation policy. However, the contribution of global transient seasonal poverty is important for households clustered around the poverty line, but low for the poorest part of the chronically poor. Thus, policies fighting seasonal transient poverty are likely to concern the moderately poor rather than the very poor, in contrast with policies against chronic poverty, which affect the very poor. The transition probability analysis across seasonal living standard distributions shows that mobility across quintiles is always very strong. The poverty crisis in the last season is more the result of many peasants falling into poverty than of a decrease in the flow out of poverty. A ‘safety net’ policy aimed at the poor and the non-poor in this period would then be appropriate. We estimate quantile regression equations for household chronic and transient seasonal poverty. The agricultural choices of peasants are found to affect the two components of annual poverty differently; the components could therefore be addressed by a combination of policies specific to each.
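
    The split between chronic and transient seasonal poverty described above can be sketched with FGT-type indices, in the spirit of the usual decomposition (total poverty averaged over seasons, minus poverty of each household's intertemporal mean). The paper's exact indices, poverty lines, and data differ; the panel below is a toy example.

```python
import numpy as np

def fgt(y, z, alpha=2):
    """Foster-Greer-Thorbecke poverty index: alpha=0 gives the
    incidence (headcount), alpha=2 the severity of poverty."""
    gap = np.clip((z - y) / z, 0, None)
    return np.mean(gap ** alpha)

def decompose(panel, z, alpha=2):
    """panel: (households x seasons) array of living standards.
    Total = mean seasonal FGT index; chronic = FGT index of each
    household's mean over seasons; transient = total - chronic."""
    total = np.mean([fgt(panel[:, t], z, alpha) for t in range(panel.shape[1])])
    chronic = fgt(panel.mean(axis=1), z, alpha)
    return total, chronic, total - chronic

# Toy panel: one household hovering around the line, one chronically poor
panel = np.array([[12.0, 8.0, 10.0],
                  [4.0, 5.0, 6.0]])
total, chronic, transient = decompose(panel, z=10.0)
```

    With alpha=2 the transient share is driven by households oscillating around the poverty line, matching the abstract's point that anti-transient policies mainly reach the moderately poor.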

    Why We Should Put Some Weight on Weights

    Weighting is one of the major components of survey sampling. For a given sample survey, a weight is attached to each unit of the selected sample and used to obtain estimates of population parameters of interest (e.g., means or totals). The weighting process usually involves three steps: (i) obtain the design weights, which account for sample selection; (ii) adjust these weights to compensate for nonresponse; (iii) adjust the weights so that the estimates coincide with known population totals, a step called calibration. Unfortunately, weighting is often considered a process restricted to survey sampling and to the production of statistics on finite populations. This should not be the case: statistical analyses, modeling, and index estimation based on survey data should use the weights in their calculations. This paper describes why weights are useful when dealing with survey data. First, some context is given about weighting in sample surveys. Second, we present the use of weights in statistical analysis and illustrate the impact of not using them through an example. Third, the three weighting steps above are formally described.
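
    The three weighting steps can be sketched with a toy example. All inclusion probabilities, response indicators, and the population total below are illustrative, and the calibration shown is the simplest single-constraint rescaling (real calibration typically matches several totals at once).

```python
import numpy as np

# Step (i): design weights are inverses of the inclusion probabilities
pi = np.array([0.1, 0.1, 0.2, 0.2, 0.2])   # inclusion probabilities
d = 1.0 / pi                                # design weights

# Step (ii): nonresponse adjustment within a single response class:
# divide by the weighted response rate so respondents also represent
# the nonrespondents
responded = np.array([True, True, True, False, True])
resp_rate = d[responded].sum() / d.sum()
w_nr = d[responded] / resp_rate

# Step (iii): calibration (here simple post-stratification): rescale so
# the weighted count matches a known population size N
N = 40.0
w_cal = w_nr * (N / w_nr.sum())
```

    After step (ii) the respondent weights sum to the design-weighted total (35 here); after step (iii) they sum exactly to the known population size N.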

    Attitudes towards old age and age of retirement across the world: findings from the future of retirement survey

    The 21st century has been described as the first era in human history in which the world will no longer be young, bringing drastic changes to many aspects of our lives, including socio-demographics, finances, and attitudes towards old age and retirement. This talk briefly introduces the Global Ageing Survey (GLAS) 2004 and 2005, popularly known as “The Future of Retirement”. These surveys provide a unique data source, collected in 21 countries and territories, that allows researchers to better understand individual as well as societal changes as we age with regard to savings, retirement and healthcare. In 2004, approximately 10,000 people aged 18+ were surveyed in nine countries and one territory (Brazil, Canada, China, France, Hong Kong, India, Japan, Mexico, UK and USA). In 2005, the number was increased to twenty-one by adding Egypt, Germany, Indonesia, Malaysia, Poland, Russia, Saudi Arabia, Singapore, Sweden, Turkey and South Korea. Moreover, an additional 6,320 private-sector employers were surveyed in 2005, some 300 in each country, with a view to elucidating employers' attitudes to issues relating to older workers. The paper examines attitudes towards old age and retirement across the world and indicates some policy implications.

    Strategies for Multiply Imputed Survey Data and Modeling in the Context of Small Area Estimation

    To target resources and policies where they are most needed, it is essential that policy-makers are provided with reliable socio-demographic indicators on sub-groups. These sub-groups can be defined by regional divisions or by demographic characteristics and are referred to as areas or domains. Information on these domains is usually obtained through surveys, often planned at a higher level, such as the national level. As sample sizes at disaggregated levels may become small or unavailable, estimates based on survey data alone may no longer be considered reliable or may not be available. Increasing the sample size is time consuming and costly. Small area estimation (SAE) methods aim to solve this problem and achieve higher precision. SAE methods enrich information from survey data with data from additional sources and "borrow" strength from other domains. This is done by modeling and linking the survey data with administrative or register data and by using area-specific structures. Auxiliary data are traditionally population data available at the micro or aggregate level that can be used to estimate unit-level models or area-level models. Due to strict privacy regulations, it is often difficult to obtain these data at the micro level. Therefore, models based on aggregated auxiliary information, such as the Fay-Herriot model and its extensions, are of great interest for obtaining SAE estimators. Despite the problem of small sample sizes at the disaggregated level, surveys often suffer from high non-response. One possible solution to item non-response is multiple imputation (MI), which replaces missing values with multiple plausible values. The missing values and their replacement introduce additional uncertainty into the estimate. Part I focuses on the Fay-Herriot model, where the resulting estimator is a combination of a design-unbiased estimator based only on the survey data (hereafter called the direct estimator) and a synthetic regression component.
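
    The Fay-Herriot combination of a direct estimator and a synthetic regression component can be sketched as follows, taking the sampling variances as known and the between-area variance as given (in practice the latter is estimated, e.g. by REML; the toy data are illustrative).

```python
import numpy as np

def fay_herriot(direct, X, D, sigma2_u):
    """Fay-Herriot (EBLUP-form) estimator: for each area a weighted
    combination of the direct estimator and the synthetic component
    X @ beta, with shrinkage factor sigma2_u / (sigma2_u + D_i)."""
    V_inv = 1.0 / (sigma2_u + D)                     # diagonal of V^{-1}
    beta = np.linalg.solve(X.T @ (V_inv[:, None] * X),
                           X.T @ (V_inv * direct))   # GLS regression fit
    gamma = sigma2_u / (sigma2_u + D)                # shrinkage factors
    return gamma * direct + (1 - gamma) * (X @ beta)

# Toy data: 4 areas, intercept-only model, equal sampling variances
direct = np.array([10.0, 12.0, 8.0, 11.0])
D = np.array([1.0, 1.0, 1.0, 1.0])    # known sampling variances
X = np.ones((4, 1))
est = fay_herriot(direct, X, D, sigma2_u=1.0)
```

    Areas with noisier direct estimates (large D) are shrunk more strongly towards the regression-synthetic value, which is how the model "borrows strength" across domains.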
Solutions are presented to account for the uncertainty introduced by missing values in the SAE estimator using Rubin's rules. Since financial assets and wealth are sensitive topics, surveys on this type of data suffer particularly from item non-response. Chapter 1 focuses on estimating private wealth at the regionally disaggregated level in Germany. Data from the 2010 Household Finance and Consumption Survey (HFCS) are used for this application. In addition to the non-response problem, income and wealth data are often right-skewed, requiring a transformation to fully satisfy the normality assumptions of the model. Therefore, Chapter 1 presents a modified Fay-Herriot approach that incorporates the uncertainty of missing values into the log-transformed direct estimator of a mean. Chapter 2 complements Chapter 1 by presenting a framework that extends the general class of transformed Fay-Herriot models to account for the additional uncertainty due to MI by including it in the direct component and simultaneously in the regression component of the Fay-Herriot estimator. In addition, the uncertainty due to missing values is also included in the mean squared error estimator, which serves as the uncertainty measure. The estimation of a mean, the use of the log transformation for skewed data, and the arcsine transformation for proportions as target indicators are considered. The proposed framework is evaluated for the three cases in a model-based simulation study. To illustrate the methodology, 2017 data from the HFCS for European Union countries are used to estimate the average value of bonds at the national level. The approaches presented in Chapters 1 and 2 contribute to the literature by providing solutions for estimating SAE models in the presence of multiply imputed survey data. In particular, Chapter 2 presents a general approach that can be extended to other indicators. 
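
    Rubin's rules, used above to account for imputation uncertainty, pool M completed-data estimates and their variances. This is the generic form, not the chapters' specific estimators.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool M multiply-imputed estimates via Rubin's rules: the pooled
    point estimate is the mean across imputations; the total variance
    adds the within-imputation variance and the between-imputation
    variance inflated by (1 + 1/M)."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    M = len(estimates)
    q_bar = estimates.mean()              # pooled point estimate
    W = variances.mean()                  # within-imputation variance
    B = estimates.var(ddof=1)             # between-imputation variance
    T = W + (1 + 1 / M) * B               # total variance
    return q_bar, T

# Toy example: M = 3 imputations of the same parameter
q_bar, T = rubin_pool([10.0, 11.0, 12.0], [2.0, 2.0, 2.0])
```

    The between-imputation term B is exactly the extra uncertainty introduced by the missing values; ignoring it understates the variance of the pooled estimate.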
To obtain the best possible SAE estimator in terms of accuracy and precision, it is important to find the optimal model for the relationship between the target variable and the auxiliary data. The notion of "optimal" can be multifaceted. One way to look at optimality is to find the best transformation of the target variable to fully satisfy model assumptions or to account for nonlinearity. Another perspective is to identify the most important covariates and their relationship to each other and to the target variable. Part II of this dissertation therefore brings together research on optimal transformations and model selection in the context of SAE. Chapter 3 considers both problems simultaneously for linear mixed models (LMM) and proposes a model selection approach for LMM with data-driven transformations. In particular, the conditional Akaike information criterion is adapted by introducing the Jacobian into the criterion to allow comparison of models at different scales. The methodology is evaluated in a simulation experiment comparing different transformations with different underlying true models. Since SAE models are LMMs, this methodology is applied to the unit-level small-area method, the empirical best predictor (EBP), in an application with Mexican survey and census data (ENIGH - National Survey of Household Income and Expenditure) and shows improvements in efficiency when the optimal (linear mixed) model and the transformation parameters are found simultaneously. Chapter 3 bridges the gap between model selection and optimal transformations to satisfy normality assumptions in unit-level SAE models in particular and LMMs in general. Chapter 4 explores the problem of model selection from a different perspective and for area-level data. To model interactions between auxiliary variables and nonlinear relationships between them and the dependent variable, machine learning methods can be a versatile tool. 
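
    The Jacobian adjustment mentioned above is what makes likelihood-based criteria comparable across transformations of the response. A minimal sketch for the log transformation is shown below; the chapter's criterion is a conditional AIC for linear mixed models, which adds penalty terms not shown here.

```python
import numpy as np

def jacobian_adjusted_loglik(loglik_transformed, y, transform="log"):
    """Bring a log-likelihood computed on transformed responses back to
    the scale of the original data by adding the log-Jacobian of the
    transformation. For t(y) = log(y), dt/dy = 1/y, so the adjustment
    is -sum(log(y)); the identity transformation has adjustment 0."""
    if transform == "log":
        log_jac = -np.sum(np.log(y))
    else:
        log_jac = 0.0
    return loglik_transformed + log_jac
```

    Without this term, a model fit to log(y) and a model fit to y have likelihoods on different scales, and an information criterion would systematically favor one transformation for spurious reasons.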
For unit-level SAE models, mixed-effects random forests (MERFs) provide a flexible solution to account for interactions and nonlinear relationships, ensure robustness to outliers, and perform implicit model selection. In Chapter 4, the idea of MERFs is transferred to area-level models and the linear regression synthetic part of the Fay-Herriot model is replaced by a random forest to benefit from the above properties and to provide an alternative modeling approach. Chapter 4 therefore contributes to the literature by proposing a first way to combine area-level SAE models with random forests for mean estimation to allow for interactions, nonlinear relationships, and implicit variable selection. Another advantage of the random forest is its non-extrapolation property, i.e. the range of predictions is limited by the lowest and highest observed values. This could help to avoid transformations at the area level when estimating indicators defined on a fixed range. The standard Fay-Herriot model was originally developed to estimate a mean, and transformations are required when the indicator of interest is, for example, a share or a Gini coefficient. This usually requires the development of appropriate back-transformations and MSE estimators. Chapter 5 presents a Fay-Herriot model for estimating logit-transformed Gini coefficients with a bias-corrected back-transformation and a bootstrap MSE estimator. A model-based simulation is performed to show the validity of the methodology, and regionally disaggregated data from Germany are used to illustrate the proposed approach. Chapter 5 contributes to the existing literature by providing, from a frequentist perspective, an alternative to the Bayesian area-level model for estimating Gini coefficients using a logit transformation.
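
    The logit transformation and a back-transformation with a bias correction can be sketched as follows. The delta-method-style second-order correction shown here is only illustrative; the chapter's bias correction and bootstrap MSE estimator are more involved.

```python
import numpy as np

def logit(p):
    """Map an indicator in (0, 1), e.g. a Gini coefficient, to the real line."""
    return np.log(p / (1 - p))

def inv_logit(x):
    """Naive back-transformation (inverse logit)."""
    return 1 / (1 + np.exp(-x))

def back_transform(mu, var):
    """Second-order (delta-method) bias-adjusted back-transformation:
    E[inv_logit(X)] is approximated by inv_logit(mu) plus half the
    second derivative p*(1-p)*(1-2p) times the variance on the
    transformed scale."""
    p = inv_logit(mu)
    return p + 0.5 * var * p * (1 - p) * (1 - 2 * p)
```

    The naive inverse inv_logit(mu) is biased for the mean on the original scale whenever the transformed-scale estimate has nonzero variance, which is why area-level models on transformed indicators need an explicit bias-corrected back-transformation.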

    The Distribution of Personal Income in Post-War Italy: Source Description, Data Quality, and the Time Pattern of Income Inequality

    The paper describes the sample surveys on the personal distribution of incomes conducted in post-war Italy: the first survey, carried out by Istituto Doxa in 1947-48; the sample survey of household income and wealth conducted by the Bank of Italy since the late 1960s; and the expenditure survey and the European Community Household Panel conducted by the Italian Central Statistical Office, which have gathered income data since 1980 and 1993, respectively. The quality of the information is assessed by collecting the available evidence on differential response rates and mis-reporting, and by comparing grossed-up survey results with aggregate figures from the labour force survey and the national accounts. The evidence from income sample surveys is tentatively used to identify the main episodes in the post-war evolution of income inequality.

    Keywords: personal income distribution, household surveys