Stratification Trees for Adaptive Randomization in Randomized Controlled Trials
This paper proposes an adaptive randomization procedure for two-stage
randomized controlled trials. The method uses data from a first-wave experiment
in order to determine how to stratify in a second wave of the experiment, where
the objective is to minimize the variance of an estimator for the average
treatment effect (ATE). We consider selection from a class of stratified
randomization procedures which we call stratification trees: these are
procedures whose strata can be represented as decision trees, with differing
treatment assignment probabilities across strata. By using the first wave to
estimate a stratification tree, we simultaneously select which covariates to
use for stratification, how to stratify over these covariates, as well as the
assignment probabilities within these strata. Our main result shows that using
this randomization procedure with an appropriate estimator results in an
asymptotic variance which is minimal in the class of stratification trees.
Moreover, the results we present are able to accommodate a large class of
assignment mechanisms within strata, including stratified block randomization.
In a simulation study, we find that our method, paired with an appropriate
cross-validation procedure, can improve on ad hoc choices of stratification. We
conclude by applying our method to the study in Karlan and Wood (2017), where
we estimate stratification trees using the first wave of their experiment.
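As a rough illustration of the kind of procedure this abstract describes, the sketch below implements stratified block randomization under a depth-1 "stratification tree": units are split into two strata by a covariate threshold, and each stratum receives its own treatment-assignment probability. The covariate name, threshold, and probabilities are illustrative assumptions, not values from the paper, and the real method selects the tree adaptively from first-wave data rather than fixing it by hand.

```python
import random

def stratified_block_assign(units, stratum_of, p_by_stratum, seed=0):
    """Stratified block randomization: within each stratum, treat a fixed
    share p of units (chosen by shuffling), so the realized treated
    fraction matches the stratum's assignment probability exactly."""
    rng = random.Random(seed)
    by_stratum = {}
    for i, u in enumerate(units):
        by_stratum.setdefault(stratum_of(u), []).append(i)
    assign = [0] * len(units)
    for stratum, idx in by_stratum.items():
        rng.shuffle(idx)
        n_treat = round(p_by_stratum[stratum] * len(idx))
        for i in idx[:n_treat]:
            assign[i] = 1
    return assign

# Depth-1 "stratification tree": split on age at 50 (illustrative values).
units = [{"age": a} for a in (25, 32, 47, 58, 61, 70, 44, 66)]
tree = lambda u: "young" if u["age"] <= 50 else "old"
assign = stratified_block_assign(units, tree, {"young": 0.5, "old": 0.75})
```

With four units per stratum, exactly 2 of the young stratum and 3 of the old stratum are treated, whichever way the shuffle falls.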
Horvitz-Thompson estimators for functional data: asymptotic confidence bands and optimal allocation for stratified sampling
When dealing with very large datasets of functional data, survey sampling
approaches are useful in order to obtain estimators of simple functional
quantities, without being obliged to store all the data. We propose here a
Horvitz--Thompson estimator of the mean trajectory. In the context of a
superpopulation framework, we prove under mild regularity conditions that we
obtain uniformly consistent estimators of the mean function and of its variance
function. With additional assumptions on the sampling design we state a
functional Central Limit Theorem and deduce asymptotic confidence bands.
Stratified sampling is studied in detail, and we also obtain a functional
version of the usual optimal allocation rule considering a mean variance
criterion. These techniques are illustrated by means of a test population of
N=18902 electricity meters for which we have individual electricity consumption
measures every 30 minutes over one week. We show that stratification can both
substantially improve the accuracy of the estimators and reduce the width of
the global confidence bands compared to simple random sampling without
replacement.
Comment: Accepted for publication in Biometrika.
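A minimal sketch of the pointwise estimator at the heart of this abstract: the Horvitz-Thompson estimator of the mean trajectory, mu_hat(t) = (1/N) * sum over sampled curves of Y_i(t) / pi_i, applied at each discretization instant. The toy stratified design and curve values below are invented for illustration; the function name is an assumption, not the authors' code.

```python
import numpy as np

def ht_mean_trajectory(sampled_curves, pis, N):
    """Horvitz-Thompson estimator of the mean curve:
    mu_hat(t) = (1/N) * sum_i Y_i(t) / pi_i, evaluated at every
    discretization instant t via broadcasting."""
    Y = np.asarray(sampled_curves, dtype=float)   # shape (n, T)
    w = 1.0 / np.asarray(pis, dtype=float)        # inverse inclusion probs
    return (w[:, None] * Y).sum(axis=0) / N

# Toy stratified design over N = 10 curves observed at 2 instants:
# stratum A (N_h = 4, n_h = 2) and stratum B (N_h = 6, n_h = 3),
# so every sampled curve has inclusion probability n_h / N_h = 0.5.
curves = [[1.0, 2.0], [3.0, 4.0],                 # sampled from stratum A
          [2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]     # sampled from stratum B
pis = [0.5, 0.5, 0.5, 0.5, 0.5]
mu_hat = ht_mean_trajectory(curves, pis, N=10)
```

The same code covers unequal-probability designs: only the `pis` vector changes, which is what makes the optimal-allocation comparison across strata straightforward.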
Uncertainty Assessment for PA Models
A mathematical model comprises input variables, output variables and equations relating these quantities. The input variables may vary within some ranges, reflecting either our incomplete knowledge about them (epistemic uncertainty) or their intrinsic variability (aleatory uncertainty). Moreover, when solving the equations of the model numerically, numerical errors also arise. The effects of such errors and variations of the inputs have to be quantified in order to assess the model's range of validity. The goal of uncertainty analysis is to assess the effects of parameter uncertainties on the uncertainties in computed results.
The purpose of this report is to give an overview of the most useful probabilistic and statistical techniques and methods to characterize uncertainty propagation. Some examples of the application of these techniques for PA applied to radioactive waste disposal are given. (JRC.F.4 - Safety of Future Nuclear Reactors)
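The most basic of the propagation techniques such a report surveys is plain Monte Carlo: sample each uncertain input from its assumed distribution, run the model, and summarize the spread of the outputs. The toy model and distributions below are illustrative assumptions, not anything from the report.

```python
import random
import statistics

def propagate(model, input_dists, n=10000, seed=1):
    """Monte Carlo uncertainty propagation: draw each input from its
    distribution, evaluate the model, and return the sample mean and
    standard deviation of the outputs."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n):
        x = {name: draw(rng) for name, draw in input_dists.items()}
        outputs.append(model(**x))
    return statistics.mean(outputs), statistics.stdev(outputs)

# Toy model y = k * c with one aleatory input (k, uniform) and one
# epistemic input (c, normal); values are purely illustrative.
model = lambda k, c: k * c
dists = {"k": lambda r: r.uniform(0.9, 1.1),
         "c": lambda r: r.gauss(5.0, 0.5)}
mean, sd = propagate(model, dists)
```

Numerical solution error (the third source the abstract mentions) is not captured by this sampling loop and has to be bounded separately, e.g. by grid-refinement studies.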
Income and consumption inequality in Poland, 1998–2008
This paper estimates a variety of inequality indices to study the evolution of income and consumption inequality in Poland between 1998 and 2008. We use robust methods to adjust for the impact of extremely large observations. We also conduct statistical tests on inequality changes using methods that account for the complexity of the household sample design. All analyses are performed for the entire population, for rural and urban subpopulations, and for the three largest cities. The main result is that during 1998–2008 there was a statistically significant rise in economic inequalities in Poland, which, depending on the inequality index, ranged from 8.7% to 19.6% in the case of income distribution and from 6.5% to 12.3% in the case of consumption distribution. Among the studied subpopulations, economic inequalities are both the highest and the fastest-growing in Warsaw, where consumption inequality as measured by the Gini index increased during the studied period by almost 23%.
Keywords: income inequality, consumption inequality, Pareto model, robust estimation, statistical inference, Poland
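For reference, here is the headline index from this abstract in its simplest textbook form: the Gini index computed from the mean absolute difference, G = sum_ij |x_i - x_j| / (2 n^2 mean). This is the plain estimator only; it is not the paper's robust, design-based procedure, which additionally downweights extreme observations and accounts for the survey design.

```python
def gini(values):
    """Gini index via the mean-absolute-difference formula:
    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean).
    Returns 0 for a perfectly equal distribution, approaching 1 as all
    mass concentrates on one unit."""
    x = list(values)
    n = len(x)
    mean = sum(x) / n
    mad = sum(abs(a - b) for a in x for b in x)  # O(n^2), fine for a sketch
    return mad / (2 * n * n * mean)

g_equal = gini([5.0, 5.0, 5.0])     # everyone identical
g_split = gini([0.0, 1.0])          # all income held by one of two units
```

The O(n^2) double loop is adequate for illustration; for survey-sized samples one would sort and use the equivalent O(n log n) covariance-with-rank formula.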
Bayesian subset simulation
We consider the problem of estimating a small probability of failure,
defined as the volume of the excursion set of a function above a given
threshold, under a given probability measure on the input space. In this
article, we combine the popular subset simulation algorithm (Au and Beck,
Probab. Eng. Mech. 2001) and our sequential Bayesian approach for the
estimation of a probability of failure (Bect, Ginsbourger, Li, Picheny and
Vazquez, Stat. Comput. 2012). This makes it possible to estimate the
probability of failure when the number of evaluations of the function is very
limited and the probability itself is very small. The resulting algorithm is
called Bayesian subset simulation (BSS). A key idea, as in the subset
simulation algorithm, is to estimate the probabilities of a sequence of
excursion sets of the function above intermediate thresholds, using a
sequential Monte Carlo (SMC) approach. A Gaussian process prior on the
function is used to define the sequence of densities targeted by the SMC
algorithm, and to drive the selection of evaluation points of the function to
estimate the intermediate probabilities. Adaptive procedures are proposed to
determine the intermediate thresholds and the number of evaluations to be
carried out at each stage of the algorithm. Numerical experiments illustrate
that BSS achieves significant savings in the number of function evaluations
with respect to other Monte Carlo approaches.
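To make the "sequence of intermediate thresholds" idea concrete, the sketch below implements plain subset simulation in the Au-and-Beck style for a scalar standard-normal input: the failure probability is estimated as a product of larger conditional probabilities, thresholds are set adaptively at the empirical (1-p0)-quantile of the current population, and conditional samples are generated by a simple random-walk Metropolis step. This is the classical algorithm only; the Bayesian variant (BSS) additionally puts a Gaussian process prior on the function, which is not shown here.

```python
import math
import random

def subset_simulation(g, T, n=2000, p0=0.1, seed=2):
    """Estimate P(g(X) >= T) for X ~ N(0, 1) as a product of conditional
    probabilities p0, moving through adaptively chosen intermediate
    thresholds; conditional populations are grown by Metropolis steps
    targeting N(0, 1) restricted to the current excursion set."""
    rng = random.Random(seed)
    phi = lambda x: math.exp(-0.5 * x * x)       # unnormalized N(0,1) density
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    prob = 1.0
    for _ in range(50):                          # cap on the number of levels
        ys = sorted((g(x) for x in xs), reverse=True)
        t = ys[int(p0 * n) - 1]                  # adaptive (1-p0)-quantile
        if t >= T:                               # final level reached
            return prob * sum(g(x) >= T for x in xs) / n
        prob *= p0
        seeds = [x for x in xs if g(x) >= t]
        xs = []
        while len(xs) < n:                       # grow next population by MCMC
            x = rng.choice(seeds)
            for _ in range(5):                   # a few Metropolis moves
                cand = x + rng.gauss(0.0, 0.5)
                # accept only inside the excursion set, with N(0,1) ratio
                if g(cand) >= t and rng.random() < phi(cand) / phi(x):
                    x = cand
            xs.append(x)
    return prob

# Sanity check: P(X >= 3) for standard normal is about 1.35e-3.
p_hat = subset_simulation(lambda x: x, 3.0)
```

Each level spends n evaluations of g, which is exactly the cost the Bayesian variant attacks by replacing many of those evaluations with Gaussian-process predictions.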
Imputation under informative sampling
Imputed values in surveys are often generated under the assumption that the sampling mechanism is non-informative (or ignorable) and the study variable is missing at random (MAR). When the sampling design is informative, the assumption of MAR in the population does not necessarily imply MAR in the sample. In this case, the classical method of imputation using a model fitted to the sample data does not in general lead to unbiased estimation. To overcome this problem, we consider alternative approaches to imputation assuming MAR in the population. We compare the alternative imputation procedures through simulation and an application to estimation of mean erosion using data from the Conservation Effects Assessment Project.
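One simple way to target the population rather than the informatively selected sample is to fit the imputation model with the survey design weights. The sketch below does this for the most elementary model, mean imputation: missing values are replaced by the design-weighted respondent mean instead of the unweighted sample mean. It is an illustration of the general idea, not one of the specific procedures compared in the paper.

```python
def weighted_mean_impute(ys, weights):
    """Replace missing values (None) by the design-weighted mean of the
    respondents, so the imputation model estimates the population mean
    rather than the sample mean under an informative design."""
    num = sum(w * y for y, w in zip(ys, weights) if y is not None)
    den = sum(w for y, w in zip(ys, weights) if y is not None)
    wmean = num / den
    return [wmean if y is None else y for y in ys]

# Toy example: the high-weight respondent (rarely sampled stratum) pulls
# the imputed value upward relative to the unweighted mean of 2.0.
imputed = weighted_mean_impute([1.0, 3.0, None], [1.0, 3.0, 2.0])
```

Under a non-informative design all weights are equal and this collapses back to ordinary mean imputation, which is the sense in which the correction is free when it is not needed.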