Sequences of regressions and their independences
Ordered sequences of univariate or multivariate regressions provide
statistical models for analysing data from randomized, possibly sequential
interventions, from cohort or multi-wave panel studies, but also from
cross-sectional or retrospective studies. Conditional independences are
captured by what we name regression graphs, provided the generated distribution
shares some properties with a joint Gaussian distribution. Regression graphs
extend purely directed, acyclic graphs by two types of undirected graph, one
type for components of joint responses and the other for components of the
context vector variable. We review the special features and the history of
regression graphs, derive criteria to read all implied independences of a
regression graph and prove criteria for Markov equivalence, that is, to judge
whether two different graphs imply the same set of independence statements.
Knowledge of Markov equivalence provides alternative interpretations of a given
sequence of regressions, is essential for machine learning strategies and
permits the use of the simple graphical criteria of regression graphs on graphs for
which the corresponding criteria are in general more complex. Under the known
conditions that a Markov equivalent directed acyclic graph exists for any given
regression graph, we give a polynomial time algorithm to find one such graph.
Comment: 43 pages with 17 figures. The manuscript is to appear as an invited
discussion paper in the journal TES
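As a rough illustration of what Markov equivalence means in the simplest case, the sketch below checks two purely directed acyclic graphs with the classical criterion: same skeleton and same v-structures. It does not implement the paper's criteria for regression graphs. The edge-list representation and all function names are assumptions made for this example, and both graphs are assumed to have the same node set.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG given as (tail, head) pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """All colliders a -> child <- b whose endpoints a, b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for tail, head in edges:
        parents.setdefault(head, set()).add(tail)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges_1, edges_2):
    """Classical DAG criterion: equal skeletons and equal v-structures."""
    return (skeleton(edges_1) == skeleton(edges_2)
            and v_structures(edges_1) == v_structures(edges_2))


# Hypothetical three-node graphs used only for illustration.
chain = [("a", "b"), ("b", "c")]           # a -> b -> c
reversed_chain = [("c", "b"), ("b", "a")]  # a <- b <- c
collider = [("a", "b"), ("c", "b")]        # a -> b <- c

print(markov_equivalent(chain, reversed_chain))
print(markov_equivalent(chain, collider))
```

The two chains imply the same single independence (a and c given b), whereas the collider implies that a and c are marginally independent, so it falls in a different equivalence class.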
Causal Discovery with Continuous Additive Noise Models
We consider the problem of learning causal directed acyclic graphs from an
observational joint distribution. One can use these graphs to predict the
outcome of interventional experiments, from which data are often not available.
We show that if the observational distribution follows a structural equation
model with an additive noise structure, the directed acyclic graph becomes
identifiable from the distribution under mild conditions. This constitutes an
interesting alternative to traditional methods that assume faithfulness and
identify only the Markov equivalence class of the graph, thus leaving some
edges undirected. We provide practical algorithms for finitely many samples,
RESIT (Regression with Subsequent Independence Test) and two methods based on
an independence score. We prove that RESIT is correct in the population setting
and provide an empirical evaluation.
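The idea of regressing and then testing the residuals for independence can be sketched for two variables. This is a simplified stand-in, not RESIT itself: it uses linear least squares in place of nonparametric regression, and the absolute correlation between squared residuals and the squared centred regressor as a crude dependence score in place of a proper independence test. The data-generating model (uniform cause, uniform additive noise) and all names are assumptions chosen for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def ols_residuals(cause, effect):
    """Residuals of a least-squares line fitted to effect given cause."""
    mc, me = mean(cause), mean(effect)
    slope = (sum((c - mc) * (e - me) for c, e in zip(cause, effect))
             / sum((c - mc) ** 2 for c in cause))
    return [(e - me) - slope * (c - mc) for c, e in zip(cause, effect)]


def dependence_score(cause, effect):
    """Crude dependence measure between residuals and the regressor.

    Near zero when the residual spread does not vary with the regressor.
    """
    residuals = ols_residuals(cause, effect)
    mc = mean(cause)
    centred_sq = [(c - mc) ** 2 for c in cause]
    return abs(correlation([r * r for r in residuals], centred_sq))


def infer_direction(x, y):
    """Prefer the direction whose residuals look less dependent on the input."""
    if dependence_score(x, y) < dependence_score(y, x):
        return "x -> y"
    return "y -> x"


# Hypothetical additive noise model: y = x + noise, both parts uniform.
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(2000)]
y = [xi + rng.uniform(-1, 1) for xi in x]

print(infer_direction(x, y))
```

In the causal direction the residuals are essentially the noise and carry no information about the input. In the reverse direction the residual spread shrinks as the regressor moves away from its mean, and the score picks this up.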
Cluster-Robust Variance Estimation for Dyadic Data
Dyadic data are common in the social sciences, although inference for such
settings involves accounting for a complex clustering structure. Many analyses
in the social sciences fail to account for the fact that multiple dyads share a
member, and that errors are thus likely correlated across these dyads. We
propose a nonparametric sandwich-type robust variance estimator for linear
regression to account for such clustering in dyadic data. We enumerate
conditions for estimator consistency. We also extend our results to repeated
and weighted observations, including directed dyads and longitudinal data, and
provide an implementation for generalized linear models such as logistic
regression. We examine empirical performance with simulations and applications
to international relations and speed dating.
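A minimal sketch of a sandwich-type variance estimate of this kind is given below. It is deliberately simplified to one regressor, no intercept and no finite-sample correction, so it should not be read as the authors' exact estimator. The function name, the data and the dyad labels are hypothetical.

```python
def dyadic_robust_variance(x, y, dyads):
    """Slope and dyadic cluster-robust variance for y = beta * x + error.

    x, y  : one observation per dyad
    dyads : pairs of member labels, e.g. ("A", "B"), aligned with x and y
    """
    sxx = sum(v * v for v in x)                      # "bread": X'X
    beta = sum(a * b for a, b in zip(x, y)) / sxx    # least-squares slope
    scores = [a * (b - beta * a) for a, b in zip(x, y)]  # x_d * residual_d

    # "Meat": cross products of scores for every pair of dyads that
    # share at least one member (each dyad also pairs with itself).
    meat = 0.0
    for d, members_d in enumerate(dyads):
        for e, members_e in enumerate(dyads):
            if set(members_d) & set(members_e):
                meat += scores[d] * scores[e]

    return beta, meat / sxx ** 2


# Hypothetical data: five dyads formed among four units A, B, C, D.
dyads = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "D")]
x = [1.0, 2.0, 0.5, 1.5, 3.0]
y = [1.2, 1.9, 0.9, 1.1, 3.4]

beta, variance = dyadic_robust_variance(x, y, dyads)
print(beta, variance)
```

When no two dyads share a member, only the diagonal terms remain and the expression reduces to the usual heteroskedasticity-robust variance. The extra cross terms are what account for dependence between dyads with a common member.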
Marginal integration for nonparametric causal inference
We consider the problem of inferring the total causal effect of a single
variable intervention on a (response) variable of interest. We propose a
certain marginal integration regression technique for a very general class of
potentially nonlinear structural equation models (SEMs) with known structure,
or at least a known superset of adjustment variables: we call the procedure
S-mint regression. We easily derive that it achieves the same convergence rate
as nonparametric regression: for example, single variable intervention effects
can be estimated with convergence rate n^{-2/5} assuming smoothness with
twice differentiable functions. Our result can also be seen as a major
robustness property with respect to model misspecification which goes much
beyond the notion of double robustness. Furthermore, when the structure of the
SEM is not known, we can estimate (the equivalence class of) the directed
acyclic graph corresponding to the SEM, and then proceed by using S-mint based
on these estimates. We empirically compare the S-mint regression method with
more classical approaches and argue that the former is indeed more robust, more
reliable and substantially simpler.
Comment: 40 pages, 14 figures
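The marginal integration idea can be sketched as follows: fit a nonparametric regression of the response on the intervention variable and an adjustment variable, then average the fit over the observed adjustment values. The Nadaraya-Watson smoother, the fixed bandwidth, the single adjustment variable and the simulated model are all assumptions for illustration, and may differ from the estimator used in the paper.

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)


def nw_regression(x0, s0, xs, ss, ys, bandwidth):
    """Nadaraya-Watson estimate of E[Y | X = x0, S = s0] (product kernel)."""
    weights = [gaussian_kernel((x0 - x) / bandwidth)
               * gaussian_kernel((s0 - s) / bandwidth)
               for x, s in zip(xs, ss)]
    # Note: weights can underflow to zero if x0 is far outside the data range.
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def marginal_integration(x0, xs, ss, ys, bandwidth=0.4):
    """Average the fitted surface over the observed adjustment values.

    Approximates E[Y | do(X = x0)] when S is a valid adjustment set.
    """
    fitted = [nw_regression(x0, s, xs, ss, ys, bandwidth) for s in ss]
    return sum(fitted) / len(fitted)


# Hypothetical model: S confounds X and Y, and the interventional mean
# under do(X = x) is 2 * x because S has mean zero.
rng = random.Random(1)
n = 300
ss = [rng.uniform(-1, 1) for _ in range(n)]
xs = [s + rng.uniform(-2, 2) for s in ss]
ys = [2 * x + 3 * s + rng.gauss(0, 0.5) for x, s in zip(xs, ss)]

for x0 in (-0.5, 0.5):
    print(x0, marginal_integration(x0, xs, ss, ys))
```

A plain regression of the response on the intervention variable alone would absorb part of the confounder's effect into the slope. Averaging over the adjustment variable is what removes that component.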