A Transformational Characterization of Equivalent Bayesian Network Structures
We present a simple characterization of equivalent Bayesian network
structures based on local transformations. The significance of the
characterization is twofold. First, we are able to easily prove several new
invariant properties of theoretical interest for equivalent structures. Second,
we use the characterization to derive an efficient algorithm that identifies
all of the compelled edges in a structure. Compelled edge identification is of
particular importance for learning Bayesian network structures from data
because these edges indicate causal relationships when certain assumptions
hold.
Comment: Appears in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995).
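As a hedged illustration of the kind of local transformation involved, the
sketch below implements the covered-edge test in Python: an edge X -> Y is
covered when Pa(Y) = Pa(X) ∪ {X}, and reversing a covered edge preserves
Markov equivalence. The dict-of-parent-sets representation and function names
are illustrative assumptions, not the paper's code.

    def is_covered(parents, x, y):
        """Return True if the edge x -> y is covered in the DAG."""
        assert x in parents[y]
        return parents[y] - {x} == parents[x]

    def reverse_covered_edge(parents, x, y):
        """Reverse x -> y; this preserves Markov equivalence when covered."""
        if not is_covered(parents, x, y):
            raise ValueError("reversing a non-covered edge changes the model")
        parents[y].discard(x)
        parents[x].add(y)

    # Example: with Pa(A) = {} and Pa(B) = {A}, the edge A -> B is covered,
    # so A -> B and B -> A represent the same set of distributions.
    dag = {"A": set(), "B": {"A"}}
    reverse_covered_edge(dag, "A", "B")
    print(dag)  # {'A': {'B'}, 'B': set()}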
Constraint-based Causal Discovery from Multiple Interventions over Overlapping Variable Sets
Scientific practice typically involves repeatedly studying a system, each
time trying to unravel a different perspective. In each study, the scientist
may take measurements under different experimental conditions (interventions,
manipulations, perturbations) and measure different sets of quantities
(variables). The result is a collection of heterogeneous data sets coming from
different data distributions. In this work, we present algorithm COmbINE, which
accepts a collection of data sets over overlapping variable sets under
different experimental conditions; COmbINE then outputs a summary of all causal
models indicating the invariant and variant structural characteristics of all
models that simultaneously fit all of the input data sets. COmbINE converts
estimated dependencies and independencies in the data into path constraints on
the data-generating causal model and encodes them as a SAT instance. The
algorithm is sound and complete in the sample limit. To account for conflicting
constraints arising from statistical errors, we introduce a general method for
sorting constraints in order of confidence, computed as a function of their
corresponding p-values. In our empirical evaluation, COmbINE outperforms the
only pre-existing similar algorithm in terms of efficiency; the latter
additionally admits feedback cycles, but does not admit conflicting
constraints, which hinders its applicability to real data. As a
proof-of-concept, COmbINE is employed to co-analyze four real mass-cytometry
data sets measuring phosphorylated protein concentrations of overlapping
protein sets under three different interventions.
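To make the confidence-sorting step concrete, here is a minimal Python sketch
of ranking test-derived constraints by a p-value-based score; the Constraint
record and the particular scoring rule are assumptions for illustration, not
COmbINE's actual implementation.

    from dataclasses import dataclass
    import math

    @dataclass
    class Constraint:
        description: str   # e.g., a path constraint implied by a CI test
        independent: bool  # True if the test accepted independence
        p_value: float

    def confidence(c, alpha=0.05):
        # One plausible rule: distance of the p-value from the decision
        # threshold on a log scale (an assumption for illustration).
        return abs(math.log(max(c.p_value, 1e-300)) - math.log(alpha))

    constraints = [
        Constraint("X _||_ Y | Z", True, 0.64),
        Constraint("X not _||_ W", False, 0.001),
        Constraint("Y _||_ W | X", True, 0.06),  # borderline: low weight
    ]
    for c in sorted(constraints, key=confidence, reverse=True):
        print(f"{confidence(c):6.2f}  {c.description}")

A MAX-SAT-style solver can then prefer the high-confidence constraints
whenever the encoded constraints conflict.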
Learning Structures of Bayesian Networks for Variable Groups
Bayesian networks, and especially their structures, are powerful tools for
representing conditional independencies and dependencies between random
variables. In applications where related variables form a priori known groups,
chosen to represent different "views" of or aspects of the same entities, one
may be more interested in modeling dependencies between groups of variables
rather than between individual variables. Motivated by this, we study the prospects
of representing relationships between variable groups using Bayesian network
structures. We show that for dependency structures between groups to be
expressible exactly, the data have to satisfy the so-called groupwise
faithfulness assumption. We also show that causal relations between groups
cannot be learned from groupwise conditional independencies alone;
variable-wise relations are needed as well. Additionally, we present algorithms for
finding the groupwise dependency structures.
Comment: To appear in the International Journal of Approximate Reasoning. A preliminary version appeared in Proceedings of the Eighth International Conference on Probabilistic Graphical Models.
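The groupwise reading of conditional independence can be illustrated with a
short sketch: a CI statement between groups corresponds to d-separation of the
unioned variable sets in the underlying DAG. This uses networkx's d-separation
test (named is_d_separator in recent versions, d_separated in older ones); the
toy graph is an assumption for illustration.

    import networkx as nx

    G = nx.DiGraph([("a1", "b1"), ("a2", "b1"), ("b1", "c1"), ("b2", "c1")])
    groups = {"A": {"a1", "a2"}, "B": {"b1", "b2"}, "C": {"c1"}}

    def group_ci(G, gx, gy, gz):
        """Groupwise CI read off as d-separation of the variable unions."""
        return nx.is_d_separator(G, gx, gy, gz)

    # The A and C groups are separated given group B, but not marginally:
    print(group_ci(G, groups["A"], groups["C"], groups["B"]))  # True
    print(group_ci(G, groups["A"], groups["C"], set()))        # False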
Structure Learning in Graphical Modeling
A graphical model is a statistical model that is associated with a graph whose
nodes correspond to variables of interest. The edges of the graph reflect
allowed conditional dependencies among the variables. Graphical models admit
computationally convenient factorization properties and have long been a
valuable tool for tractable modeling of multivariate distributions. More
recently, applications such as reconstructing gene regulatory networks from
gene expression data have driven major advances in structure learning, that is,
estimating the graph underlying a model. We review some of these advances and
discuss methods such as the graphical lasso and neighborhood selection for
undirected graphical models (or Markov random fields), and the PC algorithm and
score-based search methods for directed graphical models (or Bayesian
networks). We further review extensions that account for effects of latent
variables and heterogeneous data sources.
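As a runnable illustration of one reviewed method for undirected models, the
sketch below estimates a sparse precision matrix with scikit-learn's graphical
lasso and reads edges off its nonzero pattern; the synthetic chain data and
the threshold are assumptions for illustration.

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(0)
    # Chain X0 - X1 - X2: X0 and X2 are dependent only through X1.
    x0 = rng.normal(size=500)
    x1 = x0 + rng.normal(size=500)
    x2 = x1 + rng.normal(size=500)
    X = np.column_stack([x0, x1, x2])

    P = GraphicalLasso(alpha=0.1).fit(X).precision_
    edges = [(i, j) for i in range(3) for j in range(i + 1, 3)
             if abs(P[i, j]) > 1e-3]
    print(edges)  # expected: [(0, 1), (1, 2)], with no direct 0-2 edge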
The Reduced PC-Algorithm: Improved Causal Structure Learning in Large Random Networks
We consider the task of estimating a high-dimensional directed acyclic graph,
given observations from a linear structural equation model with arbitrary noise
distribution. By exploiting properties of common random graphs, we develop a
new algorithm that requires conditioning only on small sets of variables. The
proposed algorithm, which is essentially a modified version of the
PC-Algorithm, offers significant gains in both computational complexity and
estimation accuracy. In particular, it results in more efficient and accurate
estimation in large networks containing hub nodes, which are common in
biological systems. We prove the consistency of the proposed algorithm, and
show that it also requires a less stringent faithfulness assumption than the
PC-Algorithm. Simulations in low- and high-dimensional settings are used to
illustrate these findings. An application to gene expression data suggests that
the proposed algorithm can identify a greater number of clinically relevant
genes than current methods.
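The computational gain comes from keeping conditioning sets small. The sketch
below shows a PC-style skeleton phase with the conditioning-set size capped,
which is the property such algorithms exploit; ci_test is a placeholder for
any conditional-independence test (e.g., partial correlation), and the whole
sketch is illustrative rather than the authors' algorithm.

    from itertools import combinations

    def pc_skeleton(nodes, ci_test, max_cond_size=2):
        adj = {v: set(nodes) - {v} for v in nodes}
        for k in range(max_cond_size + 1):         # grow |S| up to the cap
            for x in nodes:
                for y in list(adj[x]):
                    others = adj[x] - {y}
                    if len(others) < k:
                        continue
                    for S in combinations(others, k):
                        if ci_test(x, y, set(S)):  # X _||_ Y | S
                            adj[x].discard(y)      # drop the edge x - y
                            adj[y].discard(x)
                            break
        return adj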
Local Structure Discovery in Bayesian Networks
Learning a Bayesian network structure from data is an NP-hard problem and
thus exact algorithms are feasible only for small data sets. Therefore, network
structures for larger networks are usually learned with various heuristics.
Another approach to scaling up the structure learning is local learning. In
local learning, the modeler has one or more target variables of special
interest and wants to learn the structure near them, without concern for the
rest of the variables. In this paper, we present a
score-based local learning algorithm called SLL. We conjecture that our
algorithm is theoretically sound in the sense that it is optimal in the limit
of large sample size. Empirical results suggest that SLL is competitive when
compared to the constraint-based HITON algorithm. We also study the prospects
of constructing the network structure for the whole node set based on local
results by presenting two algorithms and comparing them to several heuristics.
Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI 2012).
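As a hedged sketch of the local-learning pattern described above (not the SLL
algorithm itself), the code below expands outward from a target variable and
keeps an edge only when both endpoints recover each other as neighbors, a
common symmetry correction; find_neighbors is a placeholder for any sound
neighbor-discovery subroutine.

    def local_structure(target, find_neighbors, radius=2):
        frontier, seen, edges = {target}, {target}, set()
        for _ in range(radius):
            nxt = set()
            for v in frontier:
                for u in find_neighbors(v):
                    # symmetry (AND) correction: keep u - v only if each
                    # node also recovers the other as a neighbor
                    if v in find_neighbors(u):
                        edges.add(frozenset((u, v)))
                        if u not in seen:
                            nxt.add(u)
                            seen.add(u)
            frontier = nxt
        return edges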
LSBN: A Large-Scale Bayesian Structure Learning Framework for Model Averaging
The motivation for this paper is to apply Bayesian structure learning using
Model Averaging in large-scale networks. Currently, the Bayesian model
averaging algorithm is applicable only to networks with tens of variables,
constrained by its super-exponential complexity. We present a novel framework,
called LSBN (Large-Scale Bayesian Network), that makes it possible to handle
networks of arbitrary size by following the principle of divide-and-conquer.
The LSBN method comprises three steps. First, LSBN partitions the variables
using a second-order partition strategy, which achieves more robust results.
Second, LSBN conducts sampling and structure learning within each overlapping
community, after the community has been isolated from the other variables by
its Markov blanket. Finally, LSBN employs an efficient algorithm to merge the
structures of the overlapping communities into a whole. In comparison with four
other state-of-the-art large-scale structure learning algorithms (ARACNE, PC,
Greedy Search, and MMHC), LSBN shows comparable results on five common
benchmark datasets, evaluated by precision, recall, and F-score. Moreover, LSBN
makes it possible to learn large-scale Bayesian network structures by model
averaging, which used to be intractable. In summary, LSBN provides a scalable
and parallel framework for the reconstruction of network structures. In
addition, the complete information about the overlapping communities is
obtained as a byproduct and could be used to mine meaningful clusters in
biological networks, such as protein-protein interaction or gene regulatory
networks, as well as in social networks.
Comment: 13 pages, 6 figures.
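The three-step pipeline can be summarized in a schematic sketch; all three
callables below are placeholders standing in for LSBN's partitioning,
per-community learning, and merging procedures, so this is an outline of the
divide-and-conquer pattern rather than the framework's code.

    def lsbn_pipeline(variables, partition, learn_local, merge):
        # Step 1: split the variables into overlapping communities.
        communities = partition(variables)
        # Step 2: sample and learn (or model-average) a structure per
        # community, each treated as isolated by its Markov blanket.
        local_structures = [learn_local(c) for c in communities]
        # Step 3: merge the per-community structures into one network.
        return merge(local_structures)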
A theoretical study of Y structures for causal discovery
There are several existing algorithms that under appropriate assumptions can
reliably identify a subset of the underlying causal relationships from
observational data. This paper introduces the first computationally feasible
score-based algorithm that can reliably identify causal relationships in the
large sample limit for discrete models, while allowing for the possibility that
there are unobserved common causes. In doing so, the algorithm never needs to
assign scores to causal structures with unobserved common causes. The
algorithm is based on the identification of so-called Y substructures within
Bayesian network structures that can be learned from observational data. An
example of a Y substructure is A -> C, B -> C, C -> D. After providing
background on causal discovery, the paper proves the conditions under which the
algorithm is reliable in the large sample limit.
Comment: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI 2006).
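The basic motif is easy to enumerate; the sketch below lists Y substructures
(A -> C, B -> C, C -> D with A and B non-adjacent, so C is an unshielded
collider) in a directed graph. The paper's full definition imposes further
conditions, so this is an illustrative pattern matcher only.

    from itertools import combinations
    import networkx as nx

    def y_structures(G):
        found = []
        for c in G.nodes:
            for a, b in combinations(G.predecessors(c), 2):
                if G.has_edge(a, b) or G.has_edge(b, a):
                    continue  # A and B must be non-adjacent
                for d in G.successors(c):
                    if d not in (a, b):
                        found.append((a, b, c, d))
        return found

    G = nx.DiGraph([("A", "C"), ("B", "C"), ("C", "D")])
    print(y_structures(G))  # [('A', 'B', 'C', 'D')]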
An Algorithm for the Construction of Bayesian Network Structures from Data
Previous algorithms for the construction of Bayesian belief network
structures from data have been either highly dependent on conditional
independence (CI) tests, or have required an ordering on the nodes to be
supplied by the user. We present an algorithm that integrates these two
approaches: CI tests are used to generate an ordering on the nodes from the
database, which is then used to recover the underlying Bayesian network
structure with a non-CI-based method. Results of a preliminary evaluation of the
algorithm on two networks (ALARM and LED) are presented. We also discuss some
algorithm performance issues and open problems.
Comment: Appears in Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence (UAI 1993).
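The ordering-based recovery phase can be sketched in the spirit of K2-style
greedy search: given the CI-derived node ordering, each node greedily acquires
the predecessor that most improves a decomposable score. The score callable
and the parent cap are placeholders, so this is an illustration of the general
approach rather than the paper's algorithm.

    def k2_style(order, score, max_parents=3):
        parents = {v: set() for v in order}
        for i, v in enumerate(order):
            candidates = set(order[:i])   # only predecessors in the ordering
            while len(parents[v]) < max_parents:
                best, gain = None, 0.0
                for u in candidates - parents[v]:
                    g = score(v, parents[v] | {u}) - score(v, parents[v])
                    if g > gain:
                        best, gain = u, g
                if best is None:          # no parent improves the score
                    break
                parents[v].add(best)
        return parents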
High-dimensional consistency in score-based and hybrid structure learning
Main approaches for learning Bayesian networks can be classified as
constraint-based, score-based or hybrid methods. Although high-dimensional
consistency results are available for constraint-based methods like the PC
algorithm, such results have not been proved for score-based or hybrid methods,
and most hybrid methods have not even been shown to be consistent in the
classical setting where the number of variables remains fixed and the sample
size tends to infinity. In this paper, we show that consistency of hybrid
methods based on greedy equivalence search (GES) can be achieved in the
classical setting with adaptive restrictions on the search space that depend on
the current state of the algorithm. Moreover, we prove consistency of GES and
adaptively restricted GES (ARGES) in several sparse high-dimensional settings.
ARGES scales well to sparse graphs with thousands of variables and our
simulation study indicates that both GES and ARGES generally outperform the PC
algorithm.
Comment: 37 pages, 5 figures, 41-page supplement (available as an ancillary file).
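For orientation, greedy equivalence search alternates a forward (edge
insertion) and a backward (edge deletion) phase, and ARGES adaptively
restricts the forward moves using an estimated skeleton plus the current
graph. The schematic sketch below captures only this two-phase shape; the
move generators and score-gain function are placeholders, not the authors'
implementation.

    def ges(score_gain, forward_moves, backward_moves, state):
        # Forward phase, then backward phase; in ARGES, forward_moves
        # would yield only adaptively admissible insertions.
        for moves in (forward_moves, backward_moves):
            while True:
                best = max(moves(state), key=lambda m: score_gain(state, m),
                           default=None)
                if best is None or score_gain(state, best) <= 0:
                    break
                state = best(state)   # apply the chosen move
        return state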