7,020 research outputs found

    Analyzing two-stage experiments in the presence of interference

    Two-stage randomization is a powerful design for estimating treatment effects in the presence of interference; that is, when one individual's treatment assignment affects another individual's outcomes. Our motivating example is a two-stage randomized trial evaluating an intervention to reduce student absenteeism in the School District of Philadelphia. In that experiment, households with multiple students were first assigned to treatment or control; then, in treated households, one student was randomly assigned to treatment. Using this example, we highlight key considerations for analyzing two-stage experiments in practice. Our first contribution is to address additional complexities that arise when household sizes vary; in this case, researchers must decide between assigning equal weight to households or equal weight to individuals. We propose unbiased estimators for a broad class of individual- and household-weighted estimands, with corresponding theoretical and estimated variances. Our second contribution is to connect two common approaches for analyzing two-stage designs: linear regression and randomization inference. We show that, with suitably chosen standard errors, these two approaches yield identical point and variance estimates, which is somewhat surprising given the complex randomization scheme. Finally, we explore options for incorporating covariates to improve precision. We confirm our analytic results via simulation studies and apply these methods to the attendance study, finding substantively meaningful spillover effects. Comment: Accepted for publication in the Journal of the American Statistical Association
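
    The two-stage assignment described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name `two_stage_assign`, the label "spillover" for untreated students in treated households, and the use of independent Bernoulli draws at the household level are all hypothetical choices made here, not details taken from the paper.

```python
import random


def two_stage_assign(households, p_treat=0.5, seed=0):
    """Sketch of a two-stage randomization.

    households: dict mapping a household id to a list of student ids.
    Returns a dict mapping each student id to 'control', 'treated',
    or 'spillover' (an untreated student in a treated household).

    Assumption: households are treated independently with probability
    p_treat; a real trial may instead fix the number of treated households.
    """
    rng = random.Random(seed)
    status = {}
    for hh_id in sorted(households):
        students = households[hh_id]
        # Stage 1: assign the whole household to treatment or control.
        if rng.random() < p_treat:
            # Stage 2: pick exactly one student in the treated household.
            chosen = rng.choice(students)
            for s in students:
                status[s] = "treated" if s == chosen else "spillover"
        else:
            for s in students:
                status[s] = "control"
    return status


if __name__ == "__main__":
    demo = {"h1": ["a", "b"], "h2": ["c", "d", "e"], "h3": ["f", "g"]}
    print(two_stage_assign(demo, seed=1))
```

    Comparing "spillover" students with "control" students is what allows a design like this to separate spillover effects from direct effects.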

    Regression-based Monte Carlo methods with optimal control variates

    This dissertation presents regression-based Monte Carlo methods for discretized diffusion processes. These methods involve constructing suitable control variates that lead to a significant reduction in variance. As a result, the complexity of the standard Monte Carlo approach (epsilon^{-3} for first-order schemes and epsilon^{-2.5} for second-order schemes) can be reduced, in the best case, to order epsilon^{-2+delta} for an arbitrary delta in [0,0.25), where epsilon denotes the accuracy to be achieved. The complexity analysis accounts for both the errors that also arise in the standard Monte Carlo approach (discretization error and statistical error) and the errors that result from estimating conditional expectations by regression. In addition, several algorithms are derived that lead to a similar theoretical complexity but differ numerically in the stability and accuracy of the regression estimation. The effectiveness of these algorithms is illustrated with numerical examples and compared with other known methods. Furthermore, suitable control variates are derived for pricing Bermudan options and American options based on a dual Monte Carlo method. Here, too, a significant reduction in complexity is obtained, provided that the underlying functions satisfy certain smoothness assumptions.
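
    To illustrate the general idea of variance reduction by a control variate on a discretized diffusion, here is a minimal textbook-style sketch. It is not the construction from the dissertation: it assumes an Euler scheme for geometric Brownian motion, a call-type payoff, and a single control (the centered terminal value) whose coefficient is fitted by one-dimensional least squares. All function names and parameter values are hypothetical.

```python
import math
import random


def euler_terminal(x0, mu, sigma, T, n_steps, rng):
    """Terminal value of an Euler scheme for dX = mu*X dt + sigma*X dW."""
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        x += mu * x * dt + sigma * x * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def mc_with_control_variate(n_paths=20000, x0=1.0, mu=0.05, sigma=0.2,
                            T=1.0, n_steps=20, strike=1.0, seed=0):
    """Return (plain mean, adjusted mean, plain variance, adjusted variance)."""
    rng = random.Random(seed)
    dt = T / n_steps
    # The Euler terminal value has the exact mean x0 * (1 + mu*dt)^n_steps,
    # so (X_T - control_mean) is a control variate with known zero mean.
    control_mean = x0 * (1.0 + mu * dt) ** n_steps
    ys, cs = [], []
    for _ in range(n_paths):
        x_T = euler_terminal(x0, mu, sigma, T, n_steps, rng)
        ys.append(max(x_T - strike, 0.0))
        cs.append(x_T - control_mean)
    n = n_paths
    y_bar = sum(ys) / n
    c_bar = sum(cs) / n
    cov = sum((y - y_bar) * (c - c_bar) for y, c in zip(ys, cs)) / (n - 1)
    var_c = sum((c - c_bar) ** 2 for c in cs) / (n - 1)
    # Least-squares coefficient; fitting it on the same sample introduces
    # a small bias that vanishes as n_paths grows.
    beta = cov / var_c
    adjusted = [y - beta * c for y, c in zip(ys, cs)]
    adj_mean = sum(adjusted) / n
    plain_var = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    adj_var = sum((a - adj_mean) ** 2 for a in adjusted) / (n - 1)
    return y_bar, adj_mean, plain_var, adj_var


if __name__ == "__main__":
    plain, adj, v_plain, v_adj = mc_with_control_variate()
    print(f"plain={plain:.4f}  adjusted={adj:.4f}  "
          f"variance ratio={v_plain / v_adj:.1f}")
```

    Because the payoff is strongly correlated with the terminal value, the adjusted estimator has a smaller sample variance than the plain one at essentially the same cost per path.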

    Hybrid PDE solver for data-driven problems and modern branching

    The numerical solution of large-scale PDEs, such as those occurring in data-driven applications, unavoidably requires powerful parallel computers and tailored parallel algorithms to make the best possible use of them. In fact, considerations about the parallelization and scalability of realistic problems are often critical enough to warrant acknowledgement in the modelling phase. The purpose of this paper is to spread awareness of the Probabilistic Domain Decomposition (PDD) method, a fresh approach to the parallelization of PDEs with excellent scalability properties. The idea exploits the stochastic representation of the PDE and its approximation via Monte Carlo in combination with deterministic high-performance PDE solvers. We describe the ingredients of PDD and its applicability in the scope of data science. In particular, we highlight recent advances in stochastic representations for nonlinear PDEs using branching diffusions, which have significantly broadened the scope of PDD. We envision this work as a dictionary giving large-scale PDE practitioners references on the very latest algorithms and techniques of a non-standard, yet highly parallelizable, methodology at the interface of deterministic and probabilistic numerical methods. We close this work with an invitation to the fully nonlinear case and open research questions. Comment: 23 pages, 7 figures; Final SMUR version; To appear in the European Journal of Applied Mathematics (EJAM)

    Construction of weakly CUD sequences for MCMC sampling

    In Markov chain Monte Carlo (MCMC) sampling, considerable thought goes into constructing random transitions. But those transitions are almost always driven by a simulated IID sequence. Recently it has been shown that replacing an IID sequence by a weakly completely uniformly distributed (WCUD) sequence leads to consistent estimation in finite state spaces. Unfortunately, few WCUD sequences are known. This paper gives general methods for proving that a sequence is WCUD, shows that some specific sequences are WCUD, and shows that certain operations on WCUD sequences yield new WCUD sequences. A numerical example on a 42-dimensional continuous Gibbs sampler found that some WCUD input sequences produced variance reductions ranging from tens to hundreds for posterior means of the parameters, compared to IID inputs. Comment: Published at http://dx.doi.org/10.1214/07-EJS162 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Subsampling MCMC - An introduction for the survey statistician

    The rapid development of computing power and efficient Markov Chain Monte Carlo (MCMC) simulation algorithms have revolutionized Bayesian statistics, making it a highly practical inference method in applied work. However, MCMC algorithms tend to be computationally demanding, and are particularly slow for large datasets. Data subsampling has recently been suggested as a way to make MCMC methods scalable on massively large data, utilizing efficient sampling schemes and estimators from the survey sampling literature. These developments tend to be unknown to many survey statisticians, who traditionally work with non-Bayesian methods and rarely use MCMC. Our article explains the idea of data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a so-called pseudo-marginal MCMC approach to speeding up MCMC through data subsampling. The review is written for a survey statistician without previous knowledge of MCMC methods, since our aim is to motivate survey sampling experts to contribute to the growing Subsampling MCMC literature. Comment: Accepted for publication in Sankhya A. Previously uploaded version contained a bug in generating the figures and references

    Stratification Trees for Adaptive Randomization in Randomized Controlled Trials

    This paper proposes an adaptive randomization procedure for two-stage randomized controlled trials. The method uses data from a first-wave experiment in order to determine how to stratify in a second wave of the experiment, where the objective is to minimize the variance of an estimator for the average treatment effect (ATE). We consider selection from a class of stratified randomization procedures which we call stratification trees: these are procedures whose strata can be represented as decision trees, with differing treatment assignment probabilities across strata. By using the first wave to estimate a stratification tree, we simultaneously select which covariates to use for stratification, how to stratify over these covariates, and the assignment probabilities within these strata. Our main result shows that using this randomization procedure with an appropriate estimator results in an asymptotic variance which is minimal in the class of stratification trees. Moreover, the results we present are able to accommodate a large class of assignment mechanisms within strata, including stratified block randomization. In a simulation study, we find that our method, paired with an appropriate cross-validation procedure, can improve on ad-hoc choices of stratification. We conclude by applying our method to the study in Karlan and Wood (2017), where we estimate stratification trees using the first wave of their experiment.