On the Relationship between Sum-Product Networks and Bayesian Networks
In this paper, we establish some theoretical connections between Sum-Product
Networks (SPNs) and Bayesian Networks (BNs). We prove that every SPN can be
converted into a BN in linear time and space in terms of the network size. The
key insight is to use Algebraic Decision Diagrams (ADDs) to compactly represent
the local conditional probability distributions at each node in the resulting
BN by exploiting context-specific independence (CSI). The generated BN has a
simple directed bipartite graphical structure. We show that by applying the
Variable Elimination algorithm (VE) to the generated BN with ADD
representations, we can recover the original SPN where the SPN can be viewed as
a history record or caching of the VE inference process. To help state the
proof clearly, we introduce the notion of {\em normal} SPN and present a
theoretical analysis of the consistency and decomposability properties. We
conclude the paper with some discussion of the implications of the proof and
establish a connection between the depth of an SPN and a lower bound of the
tree-width of its corresponding BN.
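To make the tractability property concrete, here is a minimal, self-contained sketch (not taken from the paper; the class names and the toy network are illustrative assumptions) of how an SPN answers joint and marginal queries in a single bottom-up pass, i.e., in time linear in the network size for a tree-structured network; a DAG-structured SPN additionally needs memoization of shared children.

    class Leaf:
        """Indicator leaf for a binary variable taking a particular value."""
        def __init__(self, var, value):
            self.var, self.value = var, value
        def eval(self, evidence):
            if self.var not in evidence:      # marginalized variable: its indicators sum to 1
                return 1.0
            return 1.0 if evidence[self.var] == self.value else 0.0

    class Sum:
        """Weighted sum node (mixture); weights are assumed to sum to 1."""
        def __init__(self, children, weights):
            self.children, self.weights = children, weights
        def eval(self, evidence):
            return sum(w * c.eval(evidence) for w, c in zip(self.weights, self.children))

    class Product:
        """Product node over children with disjoint scopes."""
        def __init__(self, children):
            self.children = children
        def eval(self, evidence):
            value = 1.0
            for c in self.children:
                value *= c.eval(evidence)
            return value

    # A small SPN over two binary variables: a mixture of two fully factorized terms.
    x1t, x1f = Leaf("X1", 1), Leaf("X1", 0)
    x2t, x2f = Leaf("X2", 1), Leaf("X2", 0)
    spn = Sum(
        [Product([Sum([x1t, x1f], [0.9, 0.1]), Sum([x2t, x2f], [0.8, 0.2])]),
         Product([Sum([x1t, x1f], [0.2, 0.8]), Sum([x2t, x2f], [0.3, 0.7])])],
        [0.6, 0.4],
    )
    print(spn.eval({"X1": 1, "X2": 1}))   # joint P(X1=1, X2=1) = 0.456
    print(spn.eval({"X1": 1}))            # marginal P(X1=1) = 0.62, X2 summed out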
Parameter and Structure Learning Techniques for Sum Product Networks
Probabilistic graphical models (PGMs) provide a general and flexible framework for reasoning about complex dependencies in noisy domains with many variables. Among the various types of PGMs, sum-product networks (SPNs) have recently generated some interest because exact inference can always be done in linear time with respect to the size of the network. This is particularly attractive since it means that learning an SPN from data always yields a tractable model for inference. Parameter and structure learning for SPNs is being explored by various researchers, and algorithms that scale are essential in the era of big data. In this thesis, I present tractable parameter and structure learning techniques for SPNs. First, I propose a new Bayesian moment matching (BMM) algorithm to learn the parameters of SPNs generatively. BMM operates naturally in an online fashion and can be easily distributed. I demonstrate the effectiveness and scalability of BMM in comparison to other online algorithms in the literature.
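The following is a hypothetical, simplified sketch of the moment-matching idea for a single sum node; the thesis's algorithm propagates such updates through the entire SPN, and the function name and the exact componentwise projection used here are illustrative assumptions. After observing one data point, the exact posterior over the node's weights is a mixture of Dirichlets, which is projected back to a single Dirichlet by matching the first and second moments of each weight.

    import numpy as np

    def bmm_update(alpha, component_likelihoods):
        alpha = np.asarray(alpha, dtype=float)
        f = np.asarray(component_likelihoods, dtype=float)   # f[k] = P(x | child k)
        a0 = alpha.sum()

        # Exact posterior after one observation: sum_k c[k] * Dirichlet(alpha + e_k)
        c = f * alpha / a0
        c /= c.sum()

        post = alpha[None, :] + np.eye(len(alpha))            # row k is alpha + e_k
        m1 = (c[:, None] * post / (a0 + 1.0)).sum(axis=0)                                 # E[w_j]
        m2 = (c[:, None] * post * (post + 1.0) / ((a0 + 1.0) * (a0 + 2.0))).sum(axis=0)   # E[w_j^2]

        # Single Dirichlet whose first two moments match, component by component
        return m1 * (m1 - m2) / (m2 - m1 ** 2)

    # Toy usage: one sum node with two children, fed a stream of per-child likelihoods
    alpha = np.array([1.0, 1.0])
    for f in ([0.9, 0.1], [0.8, 0.3], [0.2, 0.7]):
        alpha = bmm_update(alpha, f)
    print(alpha / alpha.sum())   # current estimate of the sum node's weights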
Second, I present a discriminative learning algorithm for SPNs based on the Extended Baum-Welch (EBW) algorithm. The experiments show that this algorithm performs better than both generative Expectation-Maximization and discriminative gradient descent on a wide variety of applications. I also demonstrate the robustness of the algorithm in the case of missing features by comparing its performance to Support Vector Machines and Neural Networks.
Finally, I present the first online structure learning algorithm for recurrent SPNs. Recurrent SPNs were proposed by Mazen et al. to model sequential data; their structure learning algorithm is slow and operates only in batch mode. I present the first online algorithm to learn the structure of recurrent SPNs, and I also show how the parameters can be learned simultaneously using a modified version of the hard-EM algorithm (sketched below). I compare the performance of the algorithm against different models on sequential data problems.
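As a rough illustration of the hard-EM flavour of the parameter updates (a hypothetical sketch, not the thesis's exact algorithm; it reuses the Leaf/Sum/Product classes and the example spn from the first sketch in this listing): each data point selects a maximizing child at every sum node it reaches, a count for that child is incremented, and weights are re-estimated as normalized counts.

    def hard_em_step(node, evidence, counts):
        if isinstance(node, Leaf):
            return node.eval(evidence)
        if isinstance(node, Product):
            value = 1.0
            for c in node.children:
                value *= hard_em_step(c, evidence, counts)
            return value
        # Sum node: the child with the largest weighted value "wins" this data point.
        vals = [c.eval(evidence) for c in node.children]
        best = max(range(len(vals)), key=lambda k: node.weights[k] * vals[k])
        counts.setdefault(id(node), [1.0] * len(node.children))   # Laplace-style smoothing
        counts[id(node)][best] += 1.0
        hard_em_step(node.children[best], evidence, counts)        # descend only into the winner
        return node.weights[best] * vals[best]

    def apply_counts(node, counts, seen=None):
        """Replace each sum node's weights by its normalized counts."""
        seen = set() if seen is None else seen
        if isinstance(node, Leaf) or id(node) in seen:
            return
        seen.add(id(node))
        if isinstance(node, Sum) and id(node) in counts:
            total = sum(counts[id(node)])
            node.weights = [c / total for c in counts[id(node)]]
        for c in node.children:
            apply_counts(c, counts, seen)

    # Usage on the example SPN from the earlier sketch:
    counts = {}
    for point in ({"X1": 1, "X2": 1}, {"X1": 0, "X2": 1}, {"X1": 1, "X2": 0}):
        hard_em_step(spn, point, counts)
    apply_counts(spn, counts)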
Learning Deep Mixtures of Gaussian Process Experts Using Sum-Product Networks
While Gaussian processes (GPs) are the method of choice for regression tasks,
they also come with practical difficulties, as inference cost scales cubically
in time and quadratically in memory. In this paper, we introduce a natural and
expressive way to tackle these problems, by incorporating GPs in sum-product
networks (SPNs), a recently proposed tractable probabilistic model allowing
exact and efficient inference. In particular, by using GPs as leaves of an SPN
we obtain a novel flexible prior over functions, which implicitly represents an
exponentially large mixture of local GPs. Exact and efficient posterior
inference in this model can be done in a natural interplay of the inference
mechanisms in GPs and SPNs. Thereby, each GP is -- similarly as in a mixture of
experts approach -- responsible only for a subset of data points, which
effectively reduces inference cost in a divide and conquer fashion. We show
that integrating GPs into the SPN framework leads to a promising probabilistic
regression model which (1) is computationally and memory efficient, (2) allows
efficient and exact posterior inference, (3) is flexible enough to mix
different kernel functions, and (4) naturally accounts for non-stationarities
in time series. In a variety of experiments, we show that the SPN-GP model can
learn input-dependent parameters and hyper-parameters and is on par with or
outperforms traditional GPs as well as state-of-the-art approximations on
real-world data.
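A highly simplified, hypothetical sketch of the core idea (not the paper's actual architecture or inference code): product nodes split the input space, leaves are local GPs over the resulting regions, and a sum node mixes alternative hyper-parameter choices, so the model's log marginal likelihood is obtained by sum-product combination of local GP marginal likelihoods.

    import numpy as np

    def rbf_kernel(x1, x2, lengthscale, variance):
        d = x1[:, None] - x2[None, :]
        return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

    def gp_log_marginal_likelihood(x, y, lengthscale, variance=1.0, noise=0.1):
        """Standard GP log marginal likelihood with an RBF kernel (1-D inputs)."""
        K = rbf_kernel(x, x, lengthscale, variance) + noise * np.eye(len(x))
        L = np.linalg.cholesky(K)
        a = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return -0.5 * y @ a - np.sum(np.log(np.diag(L))) - 0.5 * len(x) * np.log(2 * np.pi)

    # Toy 1-D data; a product node splits the input space at 0.5 into two local GPs.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, 40)
    y = np.sin(6.0 * x) + 0.1 * rng.standard_normal(40)
    left, right = x < 0.5, x >= 0.5

    def product_over_regions(lengthscale):
        # Decomposability: the two local GPs cover disjoint parts of the input space.
        return (gp_log_marginal_likelihood(x[left], y[left], lengthscale)
                + gp_log_marginal_likelihood(x[right], y[right], lengthscale))

    # A sum node mixes two hyper-parameter choices; nesting such splits and mixtures
    # is what yields an exponentially large mixture of local GPs.
    log_weights = np.log([0.5, 0.5])
    components = np.array([product_over_regions(0.1), product_over_regions(0.5)])
    print(np.logaddexp(*(log_weights + components)))   # log marginal likelihood of the toy SPN-GP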
On the Tractability of Neural Causal Inference
Roth (1996) proved that any form of marginal inference with probabilistic
graphical models (e.g. Bayesian Networks) is at least NP-hard. Introduced
and extensively investigated in the past decade, the neural probabilistic
circuits known as sum-product networks (SPNs) offer linear time complexity. On
another note, research around neural causal models (NCM) recently gained
traction, demanding a tighter integration of causality for machine learning. To
this end, we present a theoretical investigation of if, when, how, and at
what cost tractability occurs for different NCMs. We prove that SPN-based causal
inference is generally tractable, as opposed to standard MLP-based NCMs. We further
introduce a new tractable NCM-class that is efficient in inference and fully
expressive in terms of Pearl's Causal Hierarchy. Our comparative empirical
illustration on simulations and standard benchmarks validates our theoretical
proofs.
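For reference, the tractability gap the paper builds on can be stated compactly (this is standard SPN material rather than a result of this paper): for a complete and decomposable SPN $S$ with indicator leaves $\lambda$, any marginal query is a single bottom-up evaluation,
\[
P(\mathbf{e}) \;=\; S(\boldsymbol{\lambda}^{\mathbf{e}}), \qquad
\lambda^{\mathbf{e}}_{X_i = x} =
\begin{cases}
1 & \text{if } X_i \notin \mathbf{E} \text{ or } x \text{ is consistent with } \mathbf{e},\\
0 & \text{otherwise},
\end{cases}
\]
which costs $O(|S|)$ time, in contrast to the NP-hardness of marginal inference in general Bayesian networks (Roth, 1996).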
On the Relationship between Sum-Product Networks and Bayesian Networks
Sum-Product Networks (SPNs), which are probabilistic inference machines, have attracted a lot of interest in recent years. They have a wide range of applications, including but not limited to activity modeling, language modeling and speech modeling. Despite their practical applications and popularity, little research has been done to understand the connections and differences between Sum-Product Networks and traditional graphical models, including Bayesian Networks (BNs) and Markov Networks (MNs). In this thesis, I establish some theoretical connections between Sum-Product Networks and Bayesian Networks. First, I prove that every SPN can be converted into a BN in linear time and space in terms of the network size. Second, I show that by applying the Variable Elimination algorithm (VE) to the generated BN, I can recover the original SPN.
In the first direction, I use Algebraic Decision Diagrams (ADDs) to compactly represent the local conditional probability distributions at each node in the resulting BN by exploiting context-specific independence (CSI). The generated BN has a simple directed bipartite graphical structure. I establish the first connection between the depth of SPNs and the tree-width of the generated BNs, showing that the depth of SPNs is proportional to a lower bound of the tree-width of the BN.
In the other direction, I show that by applying the Variable Elimination algorithm (VE) to the generated BN with ADD representations, I can recover the original SPN, where the SPN can be viewed as a history record or caching of the VE inference process. To help state the proof clearly, I introduce the notion of {\em normal} SPN and present a theoretical analysis of the consistency and decomposability properties. I provide constructive algorithms to transform any given SPN into its normal form in time and space quadratic in the size of the SPN. Combining these two directions gives us a deep understanding of the modeling power of SPNs and their inner working mechanism.
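The structural conditions mentioned above can be checked in a single pass over the network; consistency is the weaker variant of decomposability, and completeness is the companion condition on sum nodes. A minimal, hypothetical sketch of the decomposability and completeness checks, reusing the Leaf/Sum/Product classes and the example spn from the first sketch in this listing:

    def scope(node):
        """Set of variables the sub-SPN rooted at `node` is defined over."""
        if isinstance(node, Leaf):
            return {node.var}
        result = set()
        for c in node.children:
            result |= scope(c)
        return result

    def is_complete(node):
        """Every child of every sum node has the same scope."""
        if isinstance(node, Leaf):
            return True
        ok = all(is_complete(c) for c in node.children)
        if isinstance(node, Sum):
            scopes = [scope(c) for c in node.children]
            ok = ok and all(sc == scopes[0] for sc in scopes)
        return ok

    def is_decomposable(node):
        """The children of every product node have pairwise disjoint scopes."""
        if isinstance(node, Leaf):
            return True
        ok = all(is_decomposable(c) for c in node.children)
        if isinstance(node, Product):
            seen = set()
            for c in node.children:
                sc = scope(c)
                if seen & sc:
                    return False
                seen |= sc
        return ok

    print(is_complete(spn), is_decomposable(spn))   # True True for the example SPN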