Analyzing two-stage experiments in the presence of interference
Two-stage randomization is a powerful design for estimating treatment effects
in the presence of interference; that is, when one individual's treatment
assignment affects another individual's outcomes. Our motivating example is a
two-stage randomized trial evaluating an intervention to reduce student
absenteeism in the School District of Philadelphia. In that experiment,
households with multiple students were first assigned to treatment or control;
then, in treated households, one student was randomly assigned to treatment.
Using this example, we highlight key considerations for analyzing two-stage
experiments in practice. Our first contribution is to address additional
complexities that arise when household sizes vary; in this case, researchers
must decide between assigning equal weight to households or equal weight to
individuals. We propose unbiased estimators for a broad class of individual-
and household-weighted estimands, with corresponding theoretical and estimated
variances. Our second contribution is to connect two common approaches for
analyzing two-stage designs: linear regression and randomization inference. We
show that, with suitably chosen standard errors, these two approaches yield
identical point and variance estimates, which is somewhat surprising given the
complex randomization scheme. Finally, we explore options for incorporating
covariates to improve precision. We confirm our analytic results via simulation
studies and apply these methods to the attendance study, finding substantively
meaningful spillover effects.
Comment: Accepted for publication in the Journal of the American Statistical
Association
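The two-stage design described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the status labels, and the use of complete randomization at the household stage are hypothetical and not taken from the paper. The two averaging helpers show the difference between giving equal weight to individuals and equal weight to households; they are plain averages, not the paper's estimators.

```python
import random


def two_stage_assign(households, n_treated_households, seed=0):
    """Assign treatment in two stages.

    households: dict mapping a household id to a list of student ids.
    Stage 1 (assumed complete randomization): pick n_treated_households
    households for treatment.
    Stage 2: within each treated household, pick exactly one student
    uniformly at random to receive the treatment.
    """
    rng = random.Random(seed)
    treated_households = set(rng.sample(sorted(households), n_treated_households))
    status = {}
    for hh, students in households.items():
        if hh in treated_households:
            chosen = rng.choice(students)
            for s in students:
                status[s] = "treated" if s == chosen else "untreated_in_treated_household"
        else:
            for s in students:
                status[s] = "control_household"
    return status


def individual_weighted_mean(values_by_household):
    """Average in which every individual counts equally."""
    all_values = [v for values in values_by_household.values() for v in values]
    return sum(all_values) / len(all_values)


def household_weighted_mean(values_by_household):
    """Average of household means, so every household counts equally."""
    household_means = [sum(v) / len(v) for v in values_by_household.values()]
    return sum(household_means) / len(household_means)


# Hypothetical toy data: four households of varying size.
households = {
    "h1": ["a", "b"],
    "h2": ["c", "d", "e"],
    "h3": ["f", "g"],
    "h4": ["h", "i", "j", "k"],
}
status = two_stage_assign(households, n_treated_households=2, seed=1)
```

When household sizes vary, the two averages generally differ, which is why the choice of weighting matters for the estimand.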
Regression-based Monte Carlo methods with optimal control variates
This dissertation presents regression-based Monte Carlo methods for
discretized diffusion processes. These methods involve the construction of
suitable control variates that lead to a significant reduction in variance. As
a result, the complexity of the standard Monte Carlo approach (epsilon^{-3} for
first-order schemes and epsilon^{-2.5} for second-order schemes) can be
reduced, in the best case, to order epsilon^{-2+delta} for an arbitrary delta
in [0,0.25), where epsilon denotes the target accuracy. The complexity analysis
accounts both for the errors that also arise in the standard Monte Carlo
approach (discretization and statistical errors) and for the errors resulting
from the estimation of conditional expectations by regression. In addition,
several algorithms are derived that lead to a similar theoretical complexity
but differ numerically in the stability and accuracy of the regression
estimation. The effectiveness of these algorithms is illustrated with numerical
examples and compared with other known methods. Furthermore, suitable control
variates are derived for pricing Bermudan options as well as American options
based on a dual Monte Carlo method. Here, too, a significant complexity
reduction is obtained, provided the underlying functions satisfy certain
smoothness assumptions.
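As a rough illustration of how a regression-fitted control variate reduces variance, the sketch below estimates E[exp(U)] for U uniform on (0, 1), using U itself (known mean 0.5) as the control. This is a textbook toy example, not the dissertation's construction for discretized diffusions; the function name and the choice of integrand are assumptions made for illustration.

```python
import math
import random


def control_variate_demo(n=10000, seed=0):
    """Compare plain Monte Carlo with a control-variate estimate of E[exp(U)]."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    y = [math.exp(x) for x in u]

    mean_u = sum(u) / n
    mean_y = sum(y) / n

    # Least-squares slope of y on u, i.e. the regression coefficient beta.
    cov_uy = sum((a - mean_u) * (b - mean_y) for a, b in zip(u, y)) / (n - 1)
    var_u = sum((a - mean_u) ** 2 for a in u) / (n - 1)
    beta = cov_uy / var_u

    # Subtract the control, centered at its known mean 0.5.
    adjusted = [b - beta * (a - 0.5) for a, b in zip(u, y)]
    mean_adj = sum(adjusted) / n

    var_plain = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    var_adj = sum((b - mean_adj) ** 2 for b in adjusted) / (n - 1)
    return mean_y, mean_adj, var_plain, var_adj


plain_mean, cv_mean, var_plain, var_cv = control_variate_demo()
```

Both estimates target the same value, e - 1, but the per-sample variance of the adjusted values is far smaller because exp(U) is almost linear in U on this interval.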
Hybrid PDE solver for data-driven problems and modern branching
The numerical solution of large-scale PDEs, such as those occurring in
data-driven applications, unavoidably requires powerful parallel computers and
tailored parallel algorithms to make the best possible use of them. In fact,
considerations about the parallelization and scalability of realistic problems
are often critical enough to warrant acknowledgement in the modelling phase.
The purpose of this paper is to spread awareness of the Probabilistic Domain
Decomposition (PDD) method, a fresh approach to the parallelization of PDEs
with excellent scalability properties. The idea exploits the stochastic
representation of the PDE and its approximation via Monte Carlo in combination
with deterministic high-performance PDE solvers. We describe the ingredients of
PDD and its applicability in the scope of data science. In particular, we
highlight recent advances in stochastic representations for nonlinear PDEs
using branching diffusions, which have significantly broadened the scope of
PDD.
We envision this work as a dictionary giving large-scale PDE practitioners
references on the very latest algorithms and techniques of a non-standard, yet
highly parallelizable, methodology at the interface of deterministic and
probabilistic numerical methods. We close this work with an invitation to the
fully nonlinear case and open research questions.
Comment: 23 pages, 7 figures; Final SMUR version; To appear in the European
Journal of Applied Mathematics (EJAM)
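To make the phrase "stochastic representation of the PDE and its approximation via Monte Carlo" concrete, here is a minimal sketch for the simplest possible case: the discrete Laplace equation on a one-dimensional grid with fixed boundary values. It covers only the Monte Carlo ingredient, not PDD itself and not branching diffusions; the function name and the grid setup are hypothetical.

```python
import random


def walk_estimate(k, n_grid, left_value, right_value, n_walks=20000, seed=0):
    """Estimate the solution at grid node k of the discrete Laplace equation
    on nodes 0..n_grid with boundary values left_value and right_value.

    Each symmetric random walk starts at node k and runs until it hits a
    boundary node; the estimate is the average boundary value at the exit.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        pos = k
        while 0 < pos < n_grid:
            pos += 1 if rng.random() < 0.5 else -1
        total += right_value if pos == n_grid else left_value
    return total / n_walks


# The exact solution is linear in k, so the value at node 3 of 10 with
# boundary values 0 and 1 is 0.3.
estimate = walk_estimate(k=3, n_grid=10, left_value=0.0, right_value=1.0)
```

Each walk is independent of the others, which is what makes this kind of pointwise evaluation straightforward to parallelize.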
Construction of weakly CUD sequences for MCMC sampling
In Markov chain Monte Carlo (MCMC) sampling considerable thought goes into
constructing random transitions. But those transitions are almost always driven
by a simulated IID sequence. Recently it has been shown that replacing an IID
sequence by a weakly completely uniformly distributed (WCUD) sequence leads to
consistent estimation in finite state spaces. Unfortunately, few WCUD sequences
are known. This paper gives general methods for proving that a sequence is
WCUD, shows that some specific sequences are WCUD, and shows that certain
operations on WCUD sequences yield new WCUD sequences. A numerical example on a
42-dimensional continuous Gibbs sampler found that some WCUD input sequences
produced variance reductions ranging from tens to hundreds for posterior means
of the parameters, compared to IID inputs.
Comment: Published in the Electronic Journal of Statistics
(http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/07-EJS162
Subsampling MCMC - An introduction for the survey statistician
The rapid development of computing power and efficient Markov Chain Monte
Carlo (MCMC) simulation algorithms have revolutionized Bayesian statistics,
making it a highly practical inference method in applied work. However, MCMC
algorithms tend to be computationally demanding, and are particularly slow for
large datasets. Data subsampling has recently been suggested as a way to make
MCMC methods scalable on massively large data, utilizing efficient sampling
schemes and estimators from the survey sampling literature. These developments
tend to be unfamiliar to many survey statisticians, who traditionally work with
non-Bayesian methods and rarely use MCMC. Our article explains the idea of
data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a
so-called pseudo-marginal MCMC approach to speeding up MCMC through data
subsampling. The review is written for a survey statistician without previous
knowledge of MCMC methods since our aim is to motivate survey sampling experts
to contribute to the growing Subsampling MCMC literature.
Comment: Accepted for publication in Sankhya A. Previous uploaded version
contained a bug in generating the figures and references
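The link to survey sampling can be sketched with the simplest estimator of a population total: draw a simple random sample of m observations without replacement and scale the sum of their log-likelihood terms by n/m. This is an illustrative assumption, not the estimator used in the Subsampling MCMC literature, and the Gaussian model and function names are hypothetical. The estimate is unbiased for the full-data log-likelihood; turning it into a likelihood estimate suitable for a pseudo-marginal scheme takes further steps that are not shown.

```python
import math
import random


def gaussian_loglik_terms(data, mu, sigma=1.0):
    """Per-observation log-density terms under a N(mu, sigma^2) model."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return [const - (x - mu) ** 2 / (2 * sigma ** 2) for x in data]


def subsample_loglik_estimate(data, mu, m, rng, sigma=1.0):
    """Expansion estimator: scale the subsample sum by n / m."""
    n = len(data)
    idx = rng.sample(range(n), m)  # simple random sample without replacement
    terms = gaussian_loglik_terms([data[i] for i in idx], mu, sigma)
    return n / m * sum(terms)


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(500)]  # hypothetical data set
full_loglik = sum(gaussian_loglik_terms(data, mu=0.3))

# Average many subsample estimates to see that they center on the full value.
estimates = [subsample_loglik_estimate(data, 0.3, m=50, rng=rng) for _ in range(2000)]
average_estimate = sum(estimates) / len(estimates)
```

Each estimate touches only 50 of the 500 observations, which is the source of the computational saving; the price is extra variance in every evaluation.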
Stratification Trees for Adaptive Randomization in Randomized Controlled Trials
This paper proposes an adaptive randomization procedure for two-stage
randomized controlled trials. The method uses data from a first-wave experiment
in order to determine how to stratify in a second wave of the experiment, where
the objective is to minimize the variance of an estimator for the average
treatment effect (ATE). We consider selection from a class of stratified
randomization procedures which we call stratification trees: these are
procedures whose strata can be represented as decision trees, with differing
treatment assignment probabilities across strata. By using the first wave to
estimate a stratification tree, we simultaneously select which covariates to
use for stratification, how to stratify over these covariates, as well as the
assignment probabilities within these strata. Our main result shows that using
this randomization procedure with an appropriate estimator results in an
asymptotic variance which is minimal in the class of stratification trees.
Moreover, the results we present are able to accommodate a large class of
assignment mechanisms within strata, including stratified block randomization.
In a simulation study, we find that our method, paired with an appropriate
cross-validation procedure, can improve on ad-hoc choices of stratification. We
conclude by applying our method to the study in Karlan and Wood (2017), where
we estimate stratification trees using the first wave of their experiment.