294 research outputs found
Using Noisy Extractions to Discover Causal Knowledge
Knowledge bases (KB) constructed through information extraction from text
play an important role in query answering and reasoning. In this work, we study
a particular reasoning task, the problem of discovering causal relationships
between entities, known as causal discovery. There are two contrasting types of
approaches to discovering causal knowledge. One approach attempts to identify
causal relationships from text using automatic extraction techniques, while the
other approach infers causation from observational data. However, extractions
alone are often insufficient to capture complex patterns and full observational
data is expensive to obtain. We introduce a probabilistic method for fusing
noisy extractions with observational data to discover causal knowledge. We
propose a principled approach that uses the probabilistic soft logic (PSL)
framework to encode well-studied constraints to recover long-range patterns and
consistent predictions, while cheaply acquired extractions provide a proxy for
unseen observations. We apply our method to gene regulatory networks and show the
promise of exploiting KB signals in causal discovery, suggesting a critical,
new area of research.
HNet: Graphical Hypergeometric Networks
Motivation: Real-world data often contain measurements with both continuous
and discrete values. Despite the availability of many libraries, data sets with
mixed data types require intensive pre-processing steps, and it remains a
challenge to describe the relationships between variables. The data
understanding phase is an important step in the data mining process; however,
without making any assumptions on the data, the search space is
super-exponential in the number of variables. Methods: We propose graphical
hypergeometric networks (HNet), a method to test associations across variables
for significance using statistical inference. The aim is to determine a network
using only the significant associations in order to shed light on the complex
relationships across variables. HNet processes raw unstructured data sets and
outputs a network that consists of (partially) directed or undirected edges
between the nodes (i.e., variables). To evaluate the accuracy of HNet, we used
well known data sets and in addition generated data sets with known ground
truth. The performance of HNet is compared to Bayesian structure learning.
Results: We demonstrate that HNet showed high accuracy and performance in the
detection of node links. In the case of the Alarm data set we can demonstrate
on average an MCC score of 0.33 ± 0.0002 (P<1x10^-6), whereas Bayesian structure
learning resulted in an average MCC score of 0.52 ± 0.006 (P<1x10^-11), and
randomly assigning edges resulted in an MCC score of 0.004 ± 0.0003 (P=0.49).
Conclusions: HNet can process raw unstructured data sets, allows analysis of
mixed data types, scales easily with the number of variables, and allows
detailed examination of the detected associations. Availability:
https://erdogant.github.io/hnet/
Comment: 6 pages, 4 figures
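The kind of association test the HNet abstract describes can be illustrated with a hypergeometric tail probability. The sketch below is not taken from the hnet library: the function name `hypergeom_pvalue` and the toy counts are assumptions made purely for illustration, and the multiple-testing correction a real pipeline would need is left out.

```python
from math import comb

def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) when drawing n items without replacement from a
    population of N items, K of which are 'successes'."""
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass of every overlap at least as large as k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts: 20 samples, 8 carry value A of one variable,
# 6 carry value B of another variable, and 5 carry both.
p = hypergeom_pvalue(N=20, K=8, n=6, k=5)
print(f"p-value for the A/B overlap: {p:.4f}")
```

A small p-value suggests the two values co-occur more often than chance would explain, which is the kind of evidence that would justify an edge between the two variables.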
Learning networks determined by the ratio of prior and data
Recent reports have described that the equivalent sample size (ESS) in a
Dirichlet prior plays an important role in learning Bayesian networks. This
paper provides an asymptotic analysis of the marginal likelihood score for a
Bayesian network. Results show that the ratio of the ESS and sample size
determines the penalty of adding arcs in learning Bayesian networks. The number
of arcs increases monotonically as the ESS increases; the number of arcs
monotonically decreases as the ESS decreases. Furthermore, the marginal
likelihood score provides a unified expression of various score metrics by
changing prior knowledge.
Comment: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty
in Artificial Intelligence (UAI2010)
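To see how the ESS enters a marginal likelihood score, the following minimal sketch computes a BDeu-style local score for one node, assuming a uniform Dirichlet prior whose total weight is the ESS. The function name `bdeu_local_score` and the count tables are hypothetical and not taken from the paper. The loop only prints the score difference for a few ESS values so a reader can inspect the trend; no particular output is claimed here.

```python
from math import lgamma

def bdeu_local_score(counts, ess):
    """Log marginal likelihood of one node under a BDeu prior.
    counts[j][k] = number of samples with parent configuration j
    and child state k; ess = equivalent sample size."""
    q = len(counts)          # number of parent configurations
    r = len(counts[0])       # number of child states
    a_j = ess / q            # prior weight per parent configuration
    a_jk = ess / (q * r)     # prior weight per (configuration, state) cell
    score = 0.0
    for row in counts:
        score += lgamma(a_j) - lgamma(a_j + sum(row))
        for n_jk in row:
            score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score

# Made-up counts for a binary child X and a binary candidate parent Y.
with_parent = [[30, 20], [22, 28]]
no_parent = [[52, 48]]       # the same data with Y marginalized out

for ess in (0.1, 1.0, 10.0, 100.0):
    gain = bdeu_local_score(with_parent, ess) - bdeu_local_score(no_parent, ess)
    print(f"ESS={ess:>5}: score gain from adding the arc = {gain:.3f}")
```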
A Branch-and-Bound Algorithm for MDL Learning Bayesian Networks
This paper extends the work in [Suzuki, 1996] and presents an efficient
depth-first branch-and-bound algorithm for learning Bayesian network
structures, based on the minimum description length (MDL) principle, for a
given (consistent) variable ordering. The algorithm exhaustively searches
through all network structures and guarantees to find the network with the best
MDL score. Preliminary experiments show that the algorithm is efficient, and
that the time complexity grows slowly with the sample size. The algorithm is
useful for empirically studying both the performance of suboptimal heuristic
search algorithms and the adequacy of the MDL principle in learning Bayesian
networks.
Comment: Appears in Proceedings of the Sixteenth Conference on Uncertainty in
Artificial Intelligence (UAI2000)
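For readers unfamiliar with the MDL principle in this setting, a common form of the per-node MDL score is the negative log-likelihood of the data plus a complexity penalty of (log N)/2 per free parameter. The sketch below shows only that score, not the branch-and-bound search from the paper. The function name `mdl_local_score` and the use of natural logarithms are assumptions.

```python
from math import log

def mdl_local_score(counts):
    """Description length of one node (lower is better).
    counts[j][k] = number of samples with parent configuration j
    and child state k."""
    q = len(counts)                      # parent configurations
    r = len(counts[0])                   # child states
    n = sum(sum(row) for row in counts)  # total sample size
    nll = 0.0
    for row in counts:
        n_j = sum(row)
        for n_jk in row:
            if n_jk > 0:                 # empty cells contribute nothing
                nll -= n_jk * log(n_jk / n_j)
    penalty = 0.5 * log(n) * q * (r - 1)  # (log N)/2 per free parameter
    return nll + penalty

# Hypothetical counts: compare a node with and without a candidate parent.
print(mdl_local_score([[30, 20], [22, 28]]))  # with a binary parent
print(mdl_local_score([[52, 48]]))            # without a parent
```

For a fixed variable ordering, each node's parent set can be scored independently in this way, which is what makes an exhaustive search over structures decomposable.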
PAC-learning bounded tree-width Graphical Models
We show that the class of strongly connected graphical models with treewidth
at most k can be properly efficiently PAC-learnt with respect to the
Kullback-Leibler Divergence. Previous approaches to this problem, such as those
of Chow ([1]) and Höffgen ([7]), have shown that this class is PAC-learnable by
reducing it to a combinatorial optimization problem. However, for k > 1, this
problem is NP-complete ([15]), and so unless P=NP, these approaches will take
exponential amounts of time. Our approach differs significantly from these, in
that it first attempts to find approximate conditional independencies by
solving (polynomially many) submodular optimization problems, and then uses a
dynamic programming formulation to combine the approximate conditional
independence information to derive a graphical model with underlying graph of
the tree-width specified. This gives us an efficient (polynomial time in the
number of random variables) PAC-learning algorithm which requires only a
polynomial number of samples of the true distribution, and only polynomial
running time.
Comment: Appears in Proceedings of the Twentieth Conference on Uncertainty in
Artificial Intelligence (UAI2004)
Learning Polytrees
We consider the task of learning the maximum-likelihood polytree from data.
Our first result is a performance guarantee establishing that the optimal
branching (or Chow-Liu tree), which can be computed very easily, constitutes a
good approximation to the best polytree. We then show that it is not possible
to do very much better, since the learning problem is NP-hard even to
approximately solve within some constant factor.
Comment: Appears in Proceedings of the Fifteenth Conference on Uncertainty in
Artificial Intelligence (UAI1999)
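The Chow-Liu tree mentioned in the abstract is a maximum-weight spanning tree over pairwise empirical mutual information. The following is a minimal sketch of that construction using Kruskal's algorithm. It returns undirected edges only, and the function names and the toy data are assumptions for illustration rather than anything from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    pij = Counter((row[i], row[j]) for row in data)
    return sum(c / n * log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def chow_liu_edges(data):
    """Edges of a maximum-weight spanning tree over pairwise mutual
    information, built with Kruskal's algorithm and union-find."""
    d = len(data[0])
    pairs = sorted(combinations(range(d), 2),
                   key=lambda p: mutual_information(data, *p),
                   reverse=True)
    parent = list(range(d))

    def find(x):
        # Follow parent links to the component root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:          # adding the edge keeps the graph a forest
            parent[ri] = rj
            edges.append((i, j))
    return edges

# Toy data: columns 0 and 1 are copies of each other, column 2 is unrelated.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
edges = chow_liu_edges(data)
print(edges)
```

Orienting the resulting tree away from any chosen root gives a branching. Whether that branching matches the paper's notion of an optimal branching in every detail is not something this sketch establishes.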
Smoothness and Structure Learning by Proxy
As data sets grow in size, the ability of learning methods to find structure
in them is increasingly hampered by the time needed to search the large spaces
of possibilities and generate a score for each that takes all of the observed
data into account. For instance, Bayesian networks, the model chosen in this
paper, have a super-exponentially large search space for a fixed number of
variables. One possible method to alleviate this problem is to use a proxy,
such as a Gaussian Process regressor, in place of the true scoring function,
training it on a selection of sampled networks. We prove here that the use of
such a proxy is well-founded, as we can bound the smoothness of a commonly-used
scoring function for Bayesian network structure learning. We show here that,
compared to an identical search strategy using the network's exact scores, our
proxy-based search is able to get equivalent or better scores on a number of
data sets in a fraction of the time.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012)
On the Use of Skeletons when Learning in Bayesian Networks
In this paper, we present a heuristic operator which aims at simultaneously
optimizing the orientations of all the edges in an intermediate Bayesian
network structure during the search process. This is done by alternating
between the space of directed acyclic graphs (DAGs) and the space of skeletons.
The found orientations of the edges are based on a scoring function rather than
on induced conditional independences. This operator can be used as an extension
to commonly employed search strategies. It is evaluated in experiments with
artificial and real-world data.
Comment: Appears in Proceedings of the Sixteenth Conference on Uncertainty in
Artificial Intelligence (UAI2000)
Learning the Bayesian Network Structure: Dirichlet Prior versus Data
In the Bayesian approach to structure learning of graphical models, the
equivalent sample size (ESS) in the Dirichlet prior over the model parameters
was recently shown to have an important effect on the maximum-a-posteriori
estimate of the Bayesian network structure. In our first contribution, we
theoretically analyze the case of large ESS-values, which complements previous
work: among other results, we find that the presence of an edge in a Bayesian
network is favoured over its absence even if both the Dirichlet prior and the
data imply independence, as long as the conditional empirical distribution is
notably different from uniform. In our second contribution, we focus on
realistic ESS-values, and provide an analytical approximation to the "optimal"
ESS-value in a predictive sense (its accuracy is also validated
experimentally): this approximation provides an understanding as to which
properties of the data have the main effect determining the "optimal"
ESS-value.
Comment: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty
in Artificial Intelligence (UAI2008)
Exact Maximum Margin Structure Learning of Bayesian Networks
Recently, there has been much interest in finding globally optimal Bayesian
network structures. These techniques were developed for generative scores and
cannot be directly extended to discriminative scores, as desired for
classification. In this paper, we propose an exact method for finding network
structures maximizing the probabilistic soft margin, a successfully applied
discriminative score. Our method is based on branch-and-bound techniques within
a linear programming framework and maintains an any-time solution, together
with worst-case sub-optimality bounds. We apply a set of order constraints for
enforcing the network structure to be acyclic, which allows a compact problem
representation and the use of general-purpose optimization techniques. In
classification experiments, our methods clearly outperform generatively trained
network structures and compete with support vector machines.
Comment: ICM
- …