Lasso adjustments of treatment effect estimates in randomized experiments
We provide a principled way for investigators to analyze randomized
experiments when the number of covariates is large. Investigators often use
linear multivariate regression to analyze randomized experiments instead of
simply reporting the difference of means between treatment and control groups.
Their aim is to reduce the variance of the estimated treatment effect by
adjusting for covariates. If there are a large number of covariates relative to
the number of observations, regression may perform poorly because of
overfitting. In such cases, the Lasso may be helpful. We study the resulting
Lasso-based treatment effect estimator under the Neyman-Rubin model of
randomized experiments. We present theoretical conditions that guarantee that
the estimator is more efficient than the simple difference-of-means estimator,
and we provide a conservative estimator of the asymptotic variance, which can
yield tighter confidence intervals than those based on the difference-of-means estimator.
Simulation and data examples show that Lasso-based adjustment can be
advantageous even when the number of covariates is less than the number of
observations. Specifically, a variant using Lasso for selection and OLS for
estimation performs particularly well, and it chooses a smoothing parameter
based on the combined performance of Lasso and OLS.
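The "Lasso for selection, OLS for estimation" idea can be sketched numerically. The following is a minimal illustration on simulated data, not the paper's exact procedure: `LassoCV`'s cross-validated penalty stands in for the combined Lasso/OLS tuning rule, and the data-generating process is assumed for the example.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)

# Simulated randomized experiment: n units, p covariates, binary treatment T.
n, p = 200, 50
X = rng.normal(size=(n, p))
T = rng.binomial(1, 0.5, size=n)
y = 2.0 * T + X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)  # true effect = 2

# Simple difference-of-means estimator.
tau_dm = y[T == 1].mean() - y[T == 0].mean()

# Lasso for selection: keep the covariates with nonzero coefficients ...
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)

# ... then OLS for estimation: regress y on treatment plus the selected
# covariates; the coefficient on T is the adjusted effect estimate.
Z = np.column_stack([T, X[:, selected]]) if selected.size else T[:, None]
tau_adj = LinearRegression().fit(Z, y).coef_[0]
```

When the selected covariates explain outcome variation, the adjusted estimate typically has a smaller standard error than the raw difference of means.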
Seasonal changes in the concentrations of dissolved oxygen in the lakes of the “Bory Tucholskie” National Park
The article presents the results of examinations of the vertical distribution of dissolved oxygen (DO) at the deepest points of the lakes, conducted at various times in the years 2003-2005 and earlier. The authors draw particular attention to severe
oxygen deficits at the deepest points of both the deep and the shallow lakes, even though the lakes are not strongly exposed to anthropogenic pressure. They also point out the similarity of the oxygen profiles in the same lakes and seasons in consecutive years, as well as the differences between particular lakes. They also found a correlation between the mean DO concentration in
the vertical profile and the duration of the ice-cover period (R² = 0.78).
DP-TBART: A Transformer-based Autoregressive Model for Differentially Private Tabular Data Generation
The generation of synthetic tabular data that preserves differential privacy
is a problem of growing importance. While traditional marginal-based methods
have achieved impressive results, recent work has shown that deep
learning-based approaches tend to lag behind. In this work, we present
Differentially-Private TaBular AutoRegressive Transformer (DP-TBART), a
transformer-based autoregressive model that maintains differential privacy and
achieves performance competitive with marginal-based methods on a wide variety
of datasets, capable of even outperforming state-of-the-art methods in certain
settings. We also provide a theoretical framework for understanding the
limitations of marginal-based approaches and where deep learning-based
approaches stand to contribute most. These results suggest that deep
learning-based techniques should be considered as a viable alternative to
marginal-based methods in the generation of differentially private synthetic
tabular data.
Data reliability in citizen science: learning curve and the effects of training method, volunteer background and experience on identification accuracy of insects visiting ivy flowers
• Citizen science, the involvement of volunteers in the collection of scientific data, can be a useful research tool. However, data collected by volunteers are often of lower quality than those collected by professional scientists.
• We studied the accuracy with which volunteers identified insects visiting ivy (Hedera) flowers in Sussex, England. In the first experiment, we examined the effects of training method, volunteer background and prior experience. Fifty-three participants were trained for the same duration using one of three different methods (pamphlet, pamphlet + slide show, pamphlet + direct training). Almost immediately following training, we tested the ability of participants to identify live insects on ivy flowers to one of 10 taxonomic categories and recorded whether their identifications were correct or incorrect, without providing feedback.
• The results showed that the type of training method had a significant effect on identification accuracy (P = 0.008). Participants identified 79.1% of insects correctly after using a one-page colour pamphlet, 85.6% correctly after using the pamphlet and viewing a slide show, and 94.3% correctly after using the pamphlet in combination with direct training in the field.
• As direct training cannot be delivered remotely, in the following year we conducted a second experiment, in which a different sample of 26 volunteers received the pamphlet plus slide show training repeatedly three times. Moreover, in this experiment participants received c. 2 minutes of additional training material, either videos of insects or stills taken from the videos. Testing showed that identification accuracy increased from 88.6% to 91.3% to 97.5% across the three successive tests. We also found a borderline significant interaction between the type of additional material and the test number (P = 0.053), such that the video gave fewer errors than stills in the first two tests only.
• The most common errors made by volunteers were confusing honey bees and social wasps with their hover fly mimics. We also tested six experts, who achieved nearly perfect accuracy (99.8%), which shows what is possible in practice.
• Overall, our study shows that two or three sessions of remote training can be as good as one session of direct training, even for relatively challenging taxonomic discriminations that include distinguishing models and mimics.
On mitigating the analytical limitations of finely stratified experiments
Although attractive from a theoretical perspective, finely stratified experiments such as paired designs suffer from certain analytical limitations that are not present in block-randomized experiments with multiple treated and control individuals in each block. In short, when using a weighted difference in means to estimate the sample average treatment effect, the traditional variance estimator in a paired experiment is conservative unless the pairwise average treatment effects are constant across pairs; however, in more coarsely stratified experiments, the corresponding variance estimator is unbiased if treatment effects are constant within blocks, even if they vary across blocks. Using insights from classical least squares theory, we present an improved variance estimator that is appropriate in finely stratified experiments. The variance estimator remains conservative in expectation but is asymptotically no more conservative than the classical estimator and can be considerably less conservative. The magnitude of the improvement depends on the extent to which effect heterogeneity can be explained by observed covariates. Aided by this estimator, a new test for the null hypothesis of a constant treatment effect is proposed. These findings extend to some, but not all, superpopulation models, depending on whether the covariates are viewed as fixed across samples.
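The contrast between the classical pair-difference variance estimator and a covariate-adjusted one can be sketched numerically. This is an illustrative stand-in on simulated data, not the paper's exact estimator: the adjustment simply regresses the pair differences on a pair-level covariate and uses the residual variance, showing how explained effect heterogeneity shrinks the variance estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative paired experiment: K pairs, with a pair-level covariate x
# driving treatment-effect heterogeneity (assumed data-generating process).
K = 500
x = rng.normal(size=K)
tau_k = 1.0 + 0.8 * x                       # pair-level effects vary with x
D = tau_k + rng.normal(scale=0.5, size=K)   # observed within-pair differences

# Point estimate: mean of the pair differences.
tau_hat = D.mean()

# Classical variance estimator: sample variance of pair differences over K.
# Conservative unless tau_k is constant across pairs.
v_classical = D.var(ddof=1) / K

# Covariate-adjusted sketch: regress D on x and use the residual variance,
# removing the share of effect heterogeneity explained by x.
X = np.column_stack([np.ones(K), x])
beta, *_ = np.linalg.lstsq(X, D, rcond=None)
resid = D - X @ beta
v_adjusted = resid @ resid / (K - 2) / K
```

Here `v_adjusted` is markedly smaller than `v_classical` because most of the variation in the pair differences is explained by `x`; both remain conservative for the sample average treatment effect.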
Rerandomization and Regression Adjustment
Randomization is a basis for the statistical inference of treatment effects
without strong assumptions on the outcome-generating process. Appropriately
using covariates further yields more precise estimators in randomized
experiments. R. A. Fisher suggested blocking on discrete covariates in the
design stage or conducting analysis of covariance (ANCOVA) in the analysis
stage. We can embed blocking into a wider class of experimental design called
rerandomization, and extend the classical ANCOVA to more general regression
adjustment. Rerandomization trumps complete randomization in the design stage,
and regression adjustment trumps the simple difference-in-means estimator in
the analysis stage. It is then intuitive to use both rerandomization and
regression adjustment. Under the randomization-inference framework, we
establish a unified theory allowing the designer and analyzer to have access to
different sets of covariates. We find that asymptotically (a) for any given
estimator with or without regression adjustment, rerandomization never hurts
either the sampling precision or the estimated precision, and (b) for any given
design with or without rerandomization, our regression-adjusted estimator never
hurts the estimated precision. Therefore, combining rerandomization and
regression adjustment yields better coverage properties and thus improves
statistical inference. To theoretically quantify these statements, we discuss
optimal regression-adjusted estimators in terms of the sampling precision and
the estimated precision, and then measure the additional gains of the designer
and the analyzer. We finally suggest using rerandomization in the design and
regression adjustment in the analysis followed by the Huber--White robust
standard error.
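The suggested pipeline of rerandomization in the design stage followed by regression adjustment in the analysis stage can be sketched as follows. This is a minimal simulation under assumed data: the Mahalanobis acceptance threshold is arbitrary, and the analysis uses a Lin-style adjustment (treatment, centred covariates, and their interactions) as one concrete regression-adjusted estimator.

```python
import numpy as np

rng = np.random.default_rng(2)

# Covariates for n units (simulated for illustration).
n, p = 100, 3
X = rng.normal(size=(n, p))

def mahalanobis_balance(T, X):
    """Mahalanobis distance between the covariate means of the two arms."""
    diff = X[T == 1].mean(axis=0) - X[T == 0].mean(axis=0)
    cov = np.cov(X, rowvar=False) * (1 / (T == 1).sum() + 1 / (T == 0).sum())
    return diff @ np.linalg.solve(cov, diff)

# Design stage, rerandomization: redraw the balanced assignment until the
# covariate-balance criterion passes an (arbitrary) acceptance threshold.
threshold = 1.0
base = np.r_[np.ones(n // 2, int), np.zeros(n - n // 2, int)]
while True:
    T = rng.permutation(base)
    if mahalanobis_balance(T, X) <= threshold:
        break

# Outcomes with a true effect of 1.5 (assumed data-generating process).
y = 1.5 * T + X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)

# Analysis stage, regression adjustment: regress y on T, centred covariates,
# and their interactions; the coefficient on T is the adjusted estimate.
Xc = X - X.mean(axis=0)
Z = np.column_stack([np.ones(n), T, Xc, T[:, None] * Xc])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
tau_hat = beta[1]
```

In practice the standard error for `tau_hat` would then be computed with a Huber--White robust variance estimator, as the abstract suggests.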