Learning Large-Scale Bayesian Networks with the sparsebn Package
Learning graphical models from data is an important problem with wide
applications, ranging from genomics to the social sciences. Nowadays datasets
often have upwards of thousands (sometimes tens or hundreds of thousands) of
variables and far fewer samples. To meet this challenge, we have developed a
new R package called sparsebn for learning the structure of large, sparse
graphical models with a focus on Bayesian networks. While there are many
existing software packages for this task, this package focuses on the unique
setting of learning large networks from high-dimensional data, possibly with
interventions. As such, the methods provided place a premium on scalability and
consistency in a high-dimensional setting. Furthermore, in the presence of
interventions, the methods implemented here achieve the goal of learning a
causal network from data. Additionally, the sparsebn package is fully
compatible with existing software packages for network analysis.
Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figures
Sequences of regressions and their independences
Ordered sequences of univariate or multivariate regressions provide
statistical models for analysing data from randomized, possibly sequential
interventions, from cohort or multi-wave panel studies, but also from
cross-sectional or retrospective studies. Conditional independences are
captured by what we name regression graphs, provided the generated distribution
shares some properties with a joint Gaussian distribution. Regression graphs
extend purely directed, acyclic graphs by two types of undirected graph, one
type for components of joint responses and the other for components of the
context vector variable. We review the special features and the history of
regression graphs, derive criteria to read all implied independences of a
regression graph and prove criteria for Markov equivalence, that is, for judging
whether two different graphs imply the same set of independence statements.
Knowledge of Markov equivalence provides alternative interpretations of a given
sequence of regressions, is essential for machine learning strategies and
permits the simple graphical criteria of regression graphs to be used for graphs
whose corresponding criteria are in general more complex. Under the known
conditions for a Markov equivalent directed acyclic graph to exist for a given
regression graph, we give a polynomial time algorithm to find one such graph.
Comment: 43 pages with 17 figures. The manuscript is to appear as an invited
discussion paper in the journal TEST
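For the purely directed special case, Markov equivalence has a classical characterization: two DAGs imply the same independences exactly when they share the same skeleton and the same v-structures (unshielded colliders a → c ← b with a and b non-adjacent). The sketch below checks only this DAG case, not the more general regression-graph criteria described above. Graphs are assumed to be lists of (u, v) pairs meaning u → v, and the helper names are hypothetical.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a directed edge list."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Unshielded colliders a -> c <- b, stored as ({a, b}, c)."""
    skel = skeleton(edges)
    pa = {}
    for u, v in edges:
        pa.setdefault(v, set()).add(u)
    return {
        (frozenset((a, b)), c)
        for c, ps in pa.items()
        for a, b in combinations(sorted(ps), 2)
        if frozenset((a, b)) not in skel  # a and b must be non-adjacent
    }


def markov_equivalent(g_edges, h_edges):
    """DAG-only test: same skeleton and same v-structures."""
    return (skeleton(g_edges) == skeleton(h_edges)
            and v_structures(g_edges) == v_structures(h_edges))
```

For example, the chain 0 → 1 → 2, its reversal, and the fork 0 ← 1 → 2 are all equivalent, whereas the collider 0 → 1 ← 2 is not equivalent to any of them.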
Graphical continuous Lyapunov models
The linear Lyapunov equation of a covariance matrix parametrizes the
equilibrium covariance matrix of a stochastic process. This parametrization can
be interpreted as a new graphical model class, and we show how the model class
behaves under marginalization and introduce a method for structure learning via
ℓ1-penalized loss minimization. Our proposed method is demonstrated to
outperform alternative structure learning algorithms in a simulation study, and
we illustrate its application to protein phosphorylation network
reconstruction.
Comment: 10 pages, 5 figures
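A minimal numerical sketch of this parametrization, assuming the form M Σ + Σ Mᵀ + C = 0 with a stable drift matrix M (all eigenvalues with negative real part) whose nonzero off-diagonal entries M[k, l] stand for edges l → k, and a positive definite volatility matrix C. The helper name and example matrices are hypothetical. The equation is solved by vectorization, using the column-major identity vec(A X B) = (Bᵀ ⊗ A) vec(X).

```python
import numpy as np


def lyapunov_covariance(M, C):
    """Solve M @ S + S @ M.T + C = 0 for the equilibrium covariance S.

    With column-major vec, vec(M S) = (I kron M) vec(S) and
    vec(S M^T) = (M kron I) vec(S), so the equation becomes
    (I kron M + M kron I) vec(S) = -vec(C).
    """
    p = M.shape[0]
    eye = np.eye(p)
    K = np.kron(eye, M) + np.kron(M, eye)
    vec_s = np.linalg.solve(K, -C.reshape(-1, order="F"))
    return vec_s.reshape(p, p, order="F")


# Hypothetical chain 0 -> 1 -> 2 with invented edge weights 0.5.
M = np.array([[-1.0, 0.0, 0.0],
              [0.5, -1.0, 0.0],
              [0.0, 0.5, -1.0]])
C = 2.0 * np.eye(3)
S = lyapunov_covariance(M, C)
```

The returned Σ is symmetric positive definite and satisfies the equation to numerical precision; in the 1×1 case M = -1, C = 2 it gives Σ = 1. The Kronecker system has p² unknowns, so for larger graphs a dedicated solver such as SciPy's `solve_continuous_lyapunov` is the more practical choice.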
Unifying Gaussian LWF and AMP Chain Graphs to Model Interference
An intervention may have an effect on units other than those to which it was
administered. This phenomenon is called interference and it usually goes
unmodeled. In this paper, we propose to combine Lauritzen-Wermuth-Frydenberg
and Andersson-Madigan-Perlman chain graphs to create a new class of causal
models that can represent both interference and non-interference relationships
for Gaussian distributions. Specifically, we define the new class of models,
introduce global, local and pairwise Markov properties for them, and prove
their equivalence. We also propose an algorithm for maximum likelihood
parameter estimation for the new models, and report experimental results.
Finally, we show how to compute the effects of interventions in the new models.
Comment: v2: Section 6 has been added. v3: Sections 7 and 8 have been added.
v4: Major reorganization. v5: Major reorganization. v6-v7: Minor changes. v8:
Addition of Appendix B. v9: Section 7 has been rewritten
Structural Intervention Distance (SID) for Evaluating Causal Graphs
Causal inference relies on the structure of a graph, often a directed acyclic
graph (DAG). Different graphs may result in different causal inference
statements and different intervention distributions. To quantify such
differences, we propose a (pre-)distance between DAGs, the structural
intervention distance (SID). The SID is based on a graphical criterion only and
quantifies the closeness between two DAGs in terms of their corresponding
causal inference statements. It is therefore well-suited for evaluating graphs
that are used for computing interventions. Instead of DAGs, it is also possible
to compare CPDAGs, completed partially directed acyclic graphs that represent
Markov equivalence classes. Since it differs significantly from the popular
Structural Hamming Distance (SHD), the SID constitutes a valuable additional
measure. We discuss properties of this distance and provide an efficient
implementation with software code available on the first author's homepage (an
R package is under construction).
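The idea behind the SID can be illustrated with a brute-force sketch. It assumes that a candidate DAG H estimates p(x_j | do(x_i)) by adjusting for H's parents of i (or by p(x_j) when j is itself such a parent) and counts the ordered pairs (i, j) for which this estimate is wrong under the true DAG G. Whether the parent set is a valid adjustment set is decided here with the general graphical adjustment criterion: no parent may lie on, or below, a directed path from i to j, and the parents must d-separate i and j once the first edge of every such path is removed. This is our choice for the sketch, not the authors' efficient implementation. All function names are hypothetical, and only DAGs (not CPDAGs) are handled.

```python
from itertools import combinations


def parents_of(n, edges):
    """Parent sets for nodes 0..n-1; an edge (u, v) means u -> v."""
    pa = [set() for _ in range(n)]
    for u, v in edges:
        pa[v].add(u)
    return pa


def descendants(n, edges, i):
    """Strict descendants of node i."""
    ch = [set() for _ in range(n)]
    for u, v in edges:
        ch[u].add(v)
    seen, stack = set(), [i]
    while stack:
        for w in ch[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def ancestors(n, edges, nodes):
    """Ancestral closure of `nodes` (the nodes themselves included)."""
    pa = parents_of(n, edges)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in pa[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(n, edges, x, y, z):
    """x _||_ y | z, tested by separation in the moral graph of An({x, y} | z)."""
    anc = ancestors(n, edges, {x, y} | set(z))
    pa = parents_of(n, edges)
    nbr = {v: set() for v in anc}
    for v in anc:
        for p in pa[v]:
            nbr[v].add(p)
            nbr[p].add(v)
        for a, b in combinations(pa[v], 2):  # marry co-parents
            nbr[a].add(b)
            nbr[b].add(a)
    seen, stack = {x}, [x]
    while stack:
        for w in nbr[stack.pop()]:
            if w == y:
                return False
            if w not in seen and w not in z:
                seen.add(w)
                stack.append(w)
    return True


def sid(n, g_edges, h_edges):
    """Ordered pairs (i, j) for which adjusting for H's parents of i
    does not recover p(x_j | do(x_i)) under the true DAG G."""
    pa_h = parents_of(n, h_edges)
    wrong = 0
    for i in range(n):
        de_i = descendants(n, g_edges, i)
        z = pa_h[i]
        for j in range(n):
            if j == i:
                continue
            if j in z:
                # H implies no effect of i on j; wrong if j is a descendant of i in G.
                if j in de_i:
                    wrong += 1
                continue
            # Nodes other than i lying on some directed path i -> ... -> j in G.
            on_path = {w for w in de_i if w == j or j in descendants(n, g_edges, w)}
            forbidden = {i}.union(on_path, *(descendants(n, g_edges, w) for w in on_path))
            # Proper back-door graph: drop the first edge of every causal path from i.
            pbd = [(u, v) for (u, v) in g_edges if not (u == i and v in on_path)]
            if z & forbidden or not d_separated(n, pbd, i, j, z):
                wrong += 1
    return wrong
```

For example, with G = 0 → 1 and an empty H, only the pair (1, 0) is counted: the empty graph estimates p(x_0 | do(x_1)) by p(x_0 | x_1), whereas the true interventional distribution is p(x_0). Unlike the SHD, reversing the single edge costs 2 here, because both intervention directions are then inferred wrongly.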