Causal discovery, the inference of causal relations from data, is a task of
fundamental importance across scientific domains, and several new machine
learning methods for addressing the causal discovery problem have been proposed
recently. However, existing machine learning methods for causal discovery
typically require that the data used for inference is pooled and available in a
centralized location. In many domains of high practical importance, such as
healthcare, data is available only at local data-generating entities (e.g.,
hospitals) and cannot be shared across entities for privacy, regulatory, and
other reasons. In this work, we address the
problem of inferring causal structure, in the form of a directed acyclic graph
(DAG), from a distributed data set that contains both observational and
interventional data. We do so in a privacy-preserving manner by exchanging
updates instead of samples. To this end, we introduce a new federated framework,
FED-CD, that enables the discovery of global causal structures both when the
set of intervened covariates is the same across decentralized entities and
when the sets of intervened covariates are potentially disjoint. We perform a
comprehensive experimental evaluation on synthetic data, which shows that
FED-CD enables effective aggregation of decentralized data for causal discovery
without direct sample sharing, even when the contributing distributed data sets
cover disjoint sets of interventions. Effective methods for causal discovery in
distributed data sets could significantly advance scientific discovery and
knowledge sharing in important settings such as healthcare, in which sharing
of data across local sites is difficult or prohibited.