24,028 research outputs found
Learning Large-Scale Bayesian Networks with the sparsebn Package
Learning graphical models from data is an important problem with wide
applications, ranging from genomics to the social sciences. Nowadays datasets
often have upwards of thousands---sometimes tens or hundreds of thousands---of
variables and far fewer samples. To meet this challenge, we have developed a
new R package called sparsebn for learning the structure of large, sparse
graphical models with a focus on Bayesian networks. While there are many
existing software packages for this task, this package focuses on the unique
setting of learning large networks from high-dimensional data, possibly with
interventions. As such, the methods provided place a premium on scalability and
consistency in a high-dimensional setting. Furthermore, in the presence of
interventions, the methods implemented here achieve the goal of learning a
causal network from data. Additionally, the sparsebn package is fully
compatible with existing software packages for network analysis.Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figure
A Formal Treatment of Sequential Ignorability
Taking a rigorous formal approach, we consider sequential decision problems
involving observable variables, unobservable variables, and action variables.
We can typically assume the property of extended stability, which allows
identification (by means of G-computation) of the consequence of a specified
treatment strategy if the unobserved variables are, in fact, observed - but not
generally otherwise. However, under certain additional special conditions we
can infer simple stability (or sequential ignorability), which supports
G-computation based on the observed variables alone. One such additional
condition is sequential randomization, where the unobserved variables
essentially behave as random noise in their effects on the actions. Another is
sequential irrelevance, where the unobserved variables do not influence future
observed variables. In the latter case, to deduce sequential ignorability in
full generality requires additional positivity conditions. We show here that
these positivity conditions are not required when all variables are discrete.Comment: 25 pages, 5 figures, 1 tabl
Learning Bayesian Networks with the bnlearn R Package
bnlearn is an R package which includes several algorithms for learning the
structure of Bayesian networks with either discrete or continuous variables.
Both constraint-based and score-based algorithms are implemented, and can use
the functionality provided by the snow package to improve their performance via
parallel computing. Several network scores and conditional independence
algorithms are available for both the learning algorithms and independent use.
Advanced plotting options are provided by the Rgraphviz package.Comment: 22 pages, 4 picture
- …