
    A Primer on Causality in Data Science

    Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. Even studies that are seemingly non-causal, such as those with the goal of prediction or prevalence estimation, have causal elements, including differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the patterns or associations observed in those data. In this work, we review the 'Causal Roadmap' of Petersen and van der Laan (2014) to provide an introduction to some key concepts in causal inference. Similar to other causal frameworks, the steps of the Roadmap include clearly stating the scientific question, defining the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to express the causal parameter as a statistical estimand, implementing statistical estimators including parametric and semi-parametric methods, and interpreting our findings. We believe that using such a framework in Data Science will help to ensure that our statistical analyses are guided by the scientific question driving our research, while avoiding over-interpretation of our results. We focus on the effect of an exposure occurring at a single time point and highlight the use of targeted maximum likelihood estimation (TMLE) with Super Learner.
    Comment: 26 pages (with references); 4 figures.
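    To make the "translate the question into a statistical estimand, then estimate it" steps concrete, here is a minimal G-computation sketch for the average treatment effect. This is a generic illustration on simulated data with a plain logistic regression, not the paper's TMLE-with-Super-Learner analysis; all variable names and values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated observational data: W = baseline covariate, A = binary exposure,
# Y = binary outcome. Exposure depends on W, so A and Y are confounded.
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-W)))
Y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * A + W))))

# Roadmap step: estimate the outcome regression E[Y | A, W].
outcome_model = LogisticRegression().fit(np.column_stack([A, W]), Y)

# Statistical estimand via G-computation: average the predicted outcomes
# under "everyone exposed" vs "everyone unexposed".
pred1 = outcome_model.predict_proba(np.column_stack([np.ones(n), W]))[:, 1]
pred0 = outcome_model.predict_proba(np.column_stack([np.zeros(n), W]))[:, 1]
ate = float(np.mean(pred1 - pred0))
print(f"estimated average treatment effect: {ate:.3f}")
```

    The contrast of averaged predictions, rather than the raw difference in outcome means by exposure group, is what the Roadmap's identification step licenses under the stated assumptions.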

    The basic structure of neoclassical general equilibrium theory

    The macro theory of general equilibrium of a closed economy (GECE) is concisely formulated. Seen from the perspective of two or more countries, the notions used in the models are: kinds of goods and factors, outputs of goods, inputs of factors, prices of goods and factors, and the endowment, production, and utility functions. The last three notions are treated as GECE-theoretical. The central hypothesis of GECE, using all the notions and several definitions, says that in all intended systems there exist equilibrium states. Here the inadequate distinction between GECE-theoretical and non-GECE-theoretical notions is used. But the formal apparatus constructed here can also be used in the more adequate meta-theoretical approach.

    Non-equilibrium Green's function approach to inhomogeneous quantum many-body systems using the Generalized Kadanoff-Baym Ansatz

    In non-equilibrium Green's function calculations, the use of the Generalized Kadanoff-Baym Ansatz (GKBA) allows for a simple approximate reconstruction of the two-time Green's function from its time-diagonal value. This drastically reduces the computational cost of time-dependent calculations, making longer time propagation possible and more complex systems accessible. This paper gives credit to the GKBA, which was introduced 25 years ago. After a detailed derivation of the GKBA, we recall its application to homogeneous systems and show how to extend it to strongly correlated, inhomogeneous systems. As a proof of concept, we present results for a 2-electron quantum well, where the correct treatment of the correlated electron dynamics is crucial for the correct description of the equilibrium and dynamic properties.
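    For reference, in one common convention the GKBA reconstructs the off-diagonal lesser function from the retarded/advanced propagators and the single-particle density matrix (standard textbook form, not transcribed from this paper):

```latex
G^{<}(t,t') \;\approx\; -\,G^{R}(t,t')\,\rho(t') + \rho(t)\,G^{A}(t,t'),
\qquad \rho(t) = -\,\mathrm{i}\,G^{<}(t,t)
```

    Only the time-diagonal density matrix $\rho(t)$ then needs to be propagated, with $G^{R}$ and $G^{A}$ supplied by an additional approximation (e.g. Hartree-Fock propagators), which is the source of the computational savings described above.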

    A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

    We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are used to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e., contagion) and influence of one individual's covariates on another's outcome (i.e., covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.

    Estimating Effects on Rare Outcomes: Knowledge is Power

    Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect of an exposure or treatment on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional risk of the outcome, given the exposure and covariates. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides stability and power to estimate the exposure effect. In finite-sample simulations, the proposed estimator performed as well as, if not better than, alternative estimators, including the propensity score matching estimator, the inverse probability of treatment weighted (IPTW) estimator, the augmented IPTW estimator, and the standard TMLE algorithm. The new estimator remained unbiased if either the conditional mean outcome or the propensity score was consistently estimated. As a substitution estimator, TMLE guaranteed that the point estimates were within the parameter range. Our results highlight the potential for doubly robust, semiparametric efficient estimation with rare events.
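    The bounding idea can be sketched in a few lines: predicted conditional risks are mapped onto a known range [l, u] before any further targeting. The numbers and bounds below are hypothetical, and this shows only the constraint step, not the full TMLE.

```python
import numpy as np

# Hypothetical initial predictions of the conditional risk of a rare outcome;
# model extrapolation has pushed some outside the plausible range.
raw_preds = np.array([0.001, 0.004, 0.012, 0.020, -0.003])

# Assumed model knowledge: the conditional risk lies in [l, u],
# e.g. the risk cannot be negative and cannot exceed 1.5%.
l, u = 0.0, 0.015

# The rare-outcome TMLE works on the rescaled outcome Y* = (Y - l) / (u - l),
# so predictions respect the bounds by construction.
bounded = l + (u - l) * np.clip((raw_preds - l) / (u - l), 0.0, 1.0)
print(bounded)  # every prediction now lies within [0, 0.015]
```

    Fluctuating the rescaled predictions on the logit scale and then mapping back to [l, u] is what gives the estimator its stability when events are rare.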

    Adaptive Selection of the Optimal Strategy to Improve Precision and Power in Randomized Trials

    Benkeser et al. demonstrate how adjustment for baseline covariates in randomized trials can meaningfully improve precision for a variety of outcome types. Their findings build on a long history, starting in 1932 with R.A. Fisher and including more recent endorsements by the U.S. Food and Drug Administration and the European Medicines Agency. Here, we address an important practical consideration: *how* to select the adjustment approach -- which variables and in which form -- to maximize precision, while maintaining Type-I error control. Balzer et al. previously proposed *Adaptive Prespecification* within TMLE to flexibly and automatically select, from a prespecified set, the approach that maximizes empirical efficiency in small trials (N<40). To avoid overfitting with few randomized units, selection was previously limited to working generalized linear models adjusting for a single covariate. Now, we tailor Adaptive Prespecification to trials with many randomized units. Using V-fold cross-validation and the estimated squared influence curve as the loss function, we select from an expanded set of candidates, including modern machine learning methods adjusting for multiple covariates. As assessed in simulations exploring a variety of data-generating processes, our approach maintains Type-I error control (under the null) and offers substantial gains in precision -- equivalent to 20-43% reductions in sample size for the same statistical power. When applied to real data from ACTG Study 175, we also see meaningful efficiency improvements, overall and within subgroups.
    Comment: 10.5 pages of main text (including 2 tables, 2 figures) + 14.5 pages of Supporting Information.
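    The selection procedure (V-fold cross-validation with the estimated squared influence curve as the loss) can be sketched as follows. This is a simplified illustration with simulated data, a known randomization probability, and a small hypothetical candidate set, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Simulated trial: randomized exposure A, prognostic covariate W1, noise W2.
rng = np.random.default_rng(2)
n = 500
W1, W2 = rng.normal(size=n), rng.normal(size=n)
A = rng.binomial(1, 0.5, size=n)        # known randomization probability 0.5
Y = A + 2.0 * W1 + rng.normal(size=n)
Wmat = np.column_stack([W1, W2])

# Candidate adjustment approaches: none, or working linear models for E[Y|A,W].
candidates = {"unadjusted": None, "adjust W1": [0], "adjust W1+W2": [0, 1]}

def influence_curve(train, test, cols):
    """Estimated influence curve of the ATE on held-out units."""
    if cols is None:
        Q1 = np.full(len(test), Y[train][A[train] == 1].mean())
        Q0 = np.full(len(test), Y[train][A[train] == 0].mean())
    else:
        X = np.column_stack([A, Wmat[:, cols]])
        m = LinearRegression().fit(X[train], Y[train])
        Q1 = m.predict(np.column_stack([np.ones(len(test)), Wmat[test][:, cols]]))
        Q0 = m.predict(np.column_stack([np.zeros(len(test)), Wmat[test][:, cols]]))
    At, Yt = A[test], Y[test]
    QA = np.where(At == 1, Q1, Q0)
    return (At / 0.5 - (1 - At) / 0.5) * (Yt - QA) + (Q1 - Q0)

# V-fold cross-validation: pick the candidate whose cross-validated
# variance of the estimated influence curve is smallest.
losses = {}
for name, cols in candidates.items():
    folds = KFold(n_splits=5, shuffle=True, random_state=0).split(np.arange(n))
    ic = np.concatenate([influence_curve(tr, te, cols) for tr, te in folds])
    losses[name] = float(np.mean((ic - ic.mean()) ** 2))
best = min(losses, key=losses.get)
print(best, {k: round(v, 2) for k, v in losses.items()})
```

    Because the variance of the influence curve governs the width of the confidence interval, minimizing this cross-validated loss directly targets precision while the held-out evaluation guards against overfitting.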

    Cis-regulatory elements of the mitotic regulator, string/Cdc25

    Mitosis in most Drosophila cells is triggered by brief bursts of transcription of string (stg), a Cdc25-type phosphatase that activates the mitotic kinase Cdk1 (Cdc2). To understand how string transcription is regulated, we analyzed the expression of string-lacZ reporter genes covering approximately 40 kb of the string locus. We also tested protein-coding fragments of the string locus, from 6 kb to 31.6 kb, for their ability to complement loss of string function in embryos and imaginal discs. A plethora of cis-acting elements spread over >30 kb control string transcription in different cell and tissue types. Regulatory elements specific to subsets of epidermal cells, mesoderm, trachea, and nurse cells were identified, but the majority of the string locus appears to be devoted to controlling cell proliferation during neurogenesis. Consistent with this, compact promoter-proximal sequences are sufficient for string function during imaginal disc growth, but additional distal elements are required for the development of neural structures in the eye, wing, leg and notum. We suggest that, during evolution, cell-type-specific control elements were acquired by a simple growth-regulated promoter as a means of coordinating cell division with developmental processes, particularly neurogenesis.
    Dara A. Lehman; Briony Patterson; Laura A. Johnston; Tracy Balzer; Jessica S. Britton; Robert Saint; Bruce A. Edgar

    A Comprehensive Survey of Brane Tilings

    An infinite class of 4d N=1 gauge theories can be engineered on the worldvolume of D3-branes probing toric Calabi-Yau 3-folds. This kind of setup has multiple applications, ranging from the gauge/gravity correspondence to local model building in string phenomenology. Brane tilings fully encode the gauge theories on the D3-branes and have substantially simplified their connection to the probed geometries. The purpose of this paper is to push the boundaries of computation and to produce as comprehensive a database of brane tilings as possible. We develop efficient implementations of brane tiling tools particularly suited for this search. We present the first complete classification of toric Calabi-Yau 3-folds with toric diagrams up to area 8 and the corresponding brane tilings. This classification is of interest to physicists and mathematicians alike.
    Comment: 39 pages. Link to Mathematica modules provided.