
    Stratification Trees for Adaptive Randomization in Randomized Controlled Trials

    This paper proposes an adaptive randomization procedure for two-stage randomized controlled trials. The method uses data from a first-wave experiment to determine how to stratify in a second wave of the experiment, where the objective is to minimize the variance of an estimator for the average treatment effect (ATE). We consider selection from a class of stratified randomization procedures which we call stratification trees: these are procedures whose strata can be represented as decision trees, with differing treatment assignment probabilities across strata. By using the first wave to estimate a stratification tree, we simultaneously select which covariates to use for stratification, how to stratify over these covariates, and the assignment probabilities within these strata. Our main result shows that using this randomization procedure with an appropriate estimator results in an asymptotic variance which is minimal in the class of stratification trees. Moreover, the results we present accommodate a large class of assignment mechanisms within strata, including stratified block randomization. In a simulation study, we find that our method, paired with an appropriate cross-validation procedure, can improve on ad-hoc choices of stratification. We conclude by applying our method to the study in Karlan and Wood (2017), where we estimate stratification trees using the first wave of their experiment.
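    The within-stratum assignment the abstract describes (tree-defined strata with differing treatment probabilities, filled by block randomization) can be sketched as follows. The covariate names, thresholds, and probabilities below are hypothetical illustrations, not taken from the paper:

```python
import random

def stratum(x):
    """Hypothetical two-level stratification tree: split on 'age',
    then on 'income', yielding one of three strata (0, 1, 2)."""
    if x["age"] < 40:
        return 0
    return 1 if x["income"] < 50_000 else 2

def stratified_block_assign(units, probs, seed=0):
    """Stratified block randomization: within each stratum, treat a
    fixed fraction of units equal to that stratum's target probability."""
    rng = random.Random(seed)
    by_stratum = {}
    for i, x in enumerate(units):
        by_stratum.setdefault(stratum(x), []).append(i)
    assignment = [0] * len(units)
    for s, idx in by_stratum.items():
        rng.shuffle(idx)
        n_treated = round(probs[s] * len(idx))
        for i in idx[:n_treated]:
            assignment[i] = 1
    return assignment
```

    Block randomization (rather than independent coin flips) fixes the realized treated fraction in each stratum, which is one of the within-stratum assignment mechanisms the paper's results accommodate.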

    Horvitz-Thompson estimators for functional data: asymptotic confidence bands and optimal allocation for stratified sampling

    When dealing with very large datasets of functional data, survey sampling approaches are useful for obtaining estimators of simple functional quantities without having to store all the data. We propose here a Horvitz-Thompson estimator of the mean trajectory. In the context of a superpopulation framework, we prove under mild regularity conditions that we obtain uniformly consistent estimators of the mean function and of its variance function. With additional assumptions on the sampling design we state a functional Central Limit Theorem and deduce asymptotic confidence bands. Stratified sampling is studied in detail, and we also obtain a functional version of the usual optimal allocation rule under a mean-variance criterion. These techniques are illustrated by means of a test population of N = 18,902 electricity meters for which we have individual electricity consumption measures every 30 minutes over one week. We show that, compared to simple random sampling without replacement, stratification can both substantially improve the accuracy of the estimators and reduce the width of the global confidence bands. (Accepted for publication in Biometrika.)
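    As a concrete illustration, a minimal sketch of the Horvitz-Thompson estimator of the mean trajectory and of a Neyman-type allocation rule, assuming curves stored on a common time grid; `neyman_allocation` here is the generic mean-variance rule, not the paper's exact functional version:

```python
import numpy as np

def ht_mean_trajectory(curves, pi, N):
    """Horvitz-Thompson estimator of the mean trajectory:
    mu_hat(t) = (1/N) * sum_{i in sample} Y_i(t) / pi_i,
    where pi_i is the inclusion probability of sampled unit i."""
    curves = np.asarray(curves, dtype=float)      # shape (n_sample, n_times)
    pi = np.asarray(pi, dtype=float)[:, None]
    return (curves / pi).sum(axis=0) / N

def neyman_allocation(N_h, S_h, n):
    """Optimal (Neyman-type) allocation: stratum sample sizes
    proportional to N_h * S_h, where S_h summarizes the dispersion of
    stratum h (for curves, e.g. the root of the integrated variance)."""
    w = np.asarray(N_h, dtype=float) * np.asarray(S_h, dtype=float)
    return np.rint(n * w / w.sum()).astype(int)
```

    Under simple random sampling without replacement, pi_i = n/N for all units and the estimator reduces to the sample mean rescaled by N/n.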

    Uncertainty Assessment for PA Models

    A mathematical model comprises input variables, output variables and equations relating these quantities. The input variables may vary within some ranges, reflecting either our incomplete knowledge about them (epistemic uncertainty) or their intrinsic variability (aleatory uncertainty). Moreover, numerical errors arise when the equations of the model are solved numerically. The effects of such errors and input variations have to be quantified in order to assess the model's range of validity. The goal of uncertainty analysis is to assess the effects of parameter uncertainties on the uncertainties in computed results. The purpose of this report is to give an overview of the most useful probabilistic and statistical techniques and methods for characterizing uncertainty propagation. Some examples of the application of these techniques to performance assessment (PA) for radioactive waste disposal are given. JRC.F.4 - Safety of Future Nuclear Reactors
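    A generic Monte Carlo uncertainty-propagation loop, of the kind such reports survey, can be sketched as follows; the function and parameter names are illustrative, not taken from the report:

```python
import random
import statistics

def propagate(model, samplers, n=10_000, seed=0):
    """Monte Carlo uncertainty propagation: draw each uncertain input
    from its distribution, evaluate the model on every draw, and
    summarize the resulting output distribution."""
    rng = random.Random(seed)
    outputs = [model(*(draw(rng) for draw in samplers)) for _ in range(n)]
    return statistics.mean(outputs), statistics.stdev(outputs)
```

    For example, with two normally distributed inputs, `propagate(lambda a, b: a + b, [lambda r: r.gauss(1.0, 0.1), lambda r: r.gauss(2.0, 0.1)])` returns an output mean near 3.0 and an output standard deviation near sqrt(0.1^2 + 0.1^2).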

    Income and consumption inequality in Poland, 1998–2008

    This paper estimates a variety of inequality indices to study the evolution of income and consumption inequality in Poland between 1998 and 2008. We use robust methods to adjust for the impact of extremely large observations. We also conduct statistical tests on inequality changes, using methods that account for the complexity of the household sample design. All analyses are performed for the entire population, for rural and urban subpopulations, and for the three largest cities. The main result is that during 1998–2008 there was a statistically significant rise in economic inequalities in Poland, which, depending on the inequality index, ranged from 8.7% to 19.6% for the income distribution and from 6.5% to 12.3% for the consumption distribution. Among the studied subpopulations, economic inequalities are both the highest and the fastest-growing in Warsaw, where consumption inequality as measured by the Gini index increased during the studied period by as much as almost 23%.
    Keywords: income inequality, consumption inequality, Pareto model, robust estimation, statistical inference, Poland
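    For reference, the Gini index cited above can be computed from a sample with the standard mean-absolute-difference formula; this is a plain, unweighted sketch, whereas the paper itself uses robust, survey-design-aware inference:

```python
def gini(values):
    """Gini index G = sum_{i,j} |x_i - x_j| / (2 * n^2 * mean),
    computed in O(n log n) from the sorted sample."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    # Equivalent sorted-sample form: sum_i (2i - n + 1) * x_(i) / (n * total)
    return sum((2 * i - n + 1) * xi for i, xi in enumerate(x)) / (n * total)
```

    The index is 0 for a perfectly equal sample and approaches 1 as one unit holds everything.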

    Bayesian subset simulation

    We consider the problem of estimating a probability of failure α, defined as the volume of the excursion set of a function f : X ⊆ R^d → R above a given threshold, under a given probability measure on X. In this article, we combine the popular subset simulation algorithm (Au and Beck, Probab. Eng. Mech. 2001) and our sequential Bayesian approach for the estimation of a probability of failure (Bect, Ginsbourger, Li, Picheny and Vazquez, Stat. Comput. 2012). This makes it possible to estimate α when the number of evaluations of f is very limited and α is very small. The resulting algorithm is called Bayesian subset simulation (BSS). A key idea, as in the subset simulation algorithm, is to estimate the probabilities of a sequence of excursion sets of f above intermediate thresholds, using a sequential Monte Carlo (SMC) approach. A Gaussian process prior on f is used to define the sequence of densities targeted by the SMC algorithm, and to drive the selection of evaluation points of f for estimating the intermediate probabilities. Adaptive procedures are proposed to determine the intermediate thresholds and the number of evaluations to be carried out at each stage of the algorithm. Numerical experiments illustrate that BSS achieves significant savings in the number of function evaluations with respect to other Monte Carlo approaches.
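    The underlying decomposition P(f(X) > u) = P(f > u_1) * prod_k P(f > u_k | f > u_{k-1}) can be illustrated with a sketch of plain subset simulation (Au and Beck 2001) for a scalar input; the Gaussian process surrogate and adaptive evaluation budget of BSS are omitted here, and the random-walk Metropolis move is a simple illustrative choice:

```python
import math
import random

def subset_simulation(f, log_prior, sample_prior, u, n=1000, p0=0.1, seed=0):
    """Estimate P(f(X) > u) as a product of conditional probabilities
    over adaptively chosen intermediate thresholds u_k, each set to the
    empirical (1 - p0)-quantile of the current population."""
    rng = random.Random(seed)
    xs = [sample_prior(rng) for _ in range(n)]
    prob = 1.0
    for _ in range(50):                        # safety cap on the number of levels
        vals = sorted((f(x) for x in xs), reverse=True)
        u_k = vals[int(p0 * n) - 1]            # (1 - p0)-quantile threshold
        if u_k >= u:                           # final level reached
            return prob * sum(f(x) > u for x in xs) / n
        prob *= p0
        seeds = [x for x in xs if f(x) > u_k]
        xs = []
        while len(xs) < n:                     # repopulate above u_k via
            x = rng.choice(seeds)              # random-walk Metropolis moves
            cand = x + rng.gauss(0.0, 1.0)
            accept = (f(cand) > u_k and
                      math.log(1.0 - rng.random()) < log_prior(cand) - log_prior(x))
            xs.append(cand if accept else x)
    return prob
```

    For a standard normal input and f(x) = x with u = 2, the estimate should land near the true tail probability P(X > 2) ≈ 0.023, using far fewer levels than a crude Monte Carlo estimate of a rarer event would need.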

    Imputation under informative sampling

    Imputed values in surveys are often generated under the assumption that the sampling mechanism is non-informative (or ignorable) and the study variable is missing at random (MAR). When the sampling design is informative, the assumption of MAR in the population does not necessarily imply MAR in the sample. In this case, the classical method of imputation using a model fitted to the sample data does not in general lead to unbiased estimation. To overcome this problem, we consider alternative approaches to imputation assuming MAR in the population. We compare the alternative imputation procedures through simulation and an application to the estimation of mean erosion using data from the Conservation Effects Assessment Project.
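    One simple way to respect informative sampling when imputing, in the spirit of the alternatives the abstract mentions, is to include the design weights in the imputation model. The sketch below is a generic illustration of that idea, not the paper's estimator:

```python
import numpy as np

def regression_impute(y, X, w, missing):
    """Regression imputation that acknowledges informative sampling by
    adding the design weights w as an extra regressor in the mean model
    (one simple adjustment; alternatives exist)."""
    Z = np.column_stack([np.ones(len(X)), X, w])   # intercept, covariate, weight
    obs = ~missing
    beta, *_ = np.linalg.lstsq(Z[obs], y[obs], rcond=None)
    y_imp = y.copy()
    y_imp[missing] = Z[missing] @ beta             # predicted values fill gaps
    return y_imp
```

    If the outcome truly depends on the weights (the informative case), omitting w from the model would bias the imputed values; including it recovers the population-level relationship under MAR in the population.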