The Libra Toolkit for Probabilistic Models
The Libra Toolkit is a collection of algorithms for learning and inference
with discrete probabilistic models, including Bayesian networks, Markov
networks, dependency networks, and sum-product networks. Compared to other
toolkits, Libra places a greater emphasis on learning the structure of
tractable models in which exact inference is efficient. It also includes a
variety of algorithms for learning graphical models in which inference is
potentially intractable, and for performing exact and approximate inference.
Libra is released under a 2-clause BSD license to encourage broad use in
academia and industry.
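The tractable models Libra emphasizes permit exact inference in a single bottom-up pass over the network. The following is an illustrative Python sketch of that idea for a tiny sum-product network, not Libra's own implementation; the class names and the toy two-variable mixture are invented for this example.

```python
# Illustrative sketch (not Libra itself): exact inference in a small
# sum-product network over two binary variables X1, X2.
# Sum nodes mix their children with weights; product nodes multiply
# children over disjoint variable scopes.

class Leaf:
    def __init__(self, var, p_true):
        self.var, self.p_true = var, p_true
    def value(self, evidence):
        # A marginalized-out variable contributes 1 (sum over both states).
        if self.var not in evidence:
            return 1.0
        return self.p_true if evidence[self.var] else 1.0 - self.p_true

class Product:
    def __init__(self, children):
        self.children = children
    def value(self, evidence):
        out = 1.0
        for c in self.children:
            out *= c.value(evidence)
        return out

class Sum:
    def __init__(self, weighted_children):
        self.weighted_children = weighted_children  # list of (weight, node)
    def value(self, evidence):
        return sum(w * c.value(evidence) for w, c in self.weighted_children)

# A mixture of two fully factorized components.
spn = Sum([
    (0.6, Product([Leaf("X1", 0.9), Leaf("X2", 0.2)])),
    (0.4, Product([Leaf("X1", 0.1), Leaf("X2", 0.7)])),
])

print(spn.value({"X1": True, "X2": True}))  # joint P(X1=1, X2=1)
print(spn.value({"X1": True}))              # marginal P(X1=1), X2 summed out
```

Both queries cost one traversal of the network, which is what makes the model tractable: marginalization is done locally at the leaves rather than by enumerating states.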
Conditional Sum-Product Networks: Imposing Structure on Deep Probabilistic Architectures
Probabilistic graphical models are a central tool in AI; however, they are
generally not as expressive as deep neural models, and inference is notoriously
hard and slow. In contrast, deep probabilistic models such as sum-product
networks (SPNs) capture joint distributions in a tractable fashion, but still
lack the expressive power of intractable models based on deep neural networks.
Therefore, we introduce conditional SPNs (CSPNs), conditional density
estimators for multivariate and potentially hybrid domains which allow
harnessing the expressive power of neural networks while still maintaining
tractability guarantees. One way to implement CSPNs is to use an existing SPN
structure and condition its parameters on the input, e.g., via a deep neural
network. This approach, however, might misrepresent the conditional
independence structure present in data. Consequently, we also develop a
structure-learning approach that derives both the structure and parameters of
CSPNs from data. Our experimental evidence demonstrates that CSPNs are
competitive with other probabilistic models and yield superior performance on
multilabel image classification compared to mean field and mixture density
networks. Furthermore, they can successfully be employed as building blocks for
structured probabilistic models, such as autoregressive image models.
Comment: 13 pages, 6 figures
Approximation Complexity of Maximum A Posteriori Inference in Sum-Product Networks
We discuss the computational complexity of approximating maximum a posteriori
inference in sum-product networks. We first show NP-hardness in trees of height
two by a reduction from maximum independent set; this implies
non-approximability within a sublinear factor. We show that this is a tight
bound, as we can find an approximation within a linear factor in networks of
height two. We then show that, in trees of height three, it is NP-hard to
approximate the problem within a factor 2^{f(n)} for any sublinear function
f of the size of the input n. Again, this bound is tight, as we prove that
the usual max-product algorithm finds (in any network) approximations within
factor 2^{cn} for some constant c < 1. Last, we present a simple
algorithm, and show that it provably produces solutions at least as good as,
and potentially much better than, the max-product algorithm. We empirically
analyze the proposed algorithm against max-product using synthetic and
realistic networks.
Comment: 18 pages
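The max-product baseline analyzed in this abstract evaluates the network bottom-up with sum nodes replaced by max, then backtracks the maximizing child at each sum node to read off an assignment. A minimal sketch on a toy network (the tuple encoding and the example SPN are invented for illustration):

```python
# Hedged sketch of max-product MAP approximation in an SPN: max replaces
# sum at sum nodes, and only the winning child's partial assignment is kept.

def max_product(node, assignment):
    kind = node[0]
    if kind == "leaf":
        _, var, p_true = node
        # A leaf contributes its most likely state.
        assignment[var] = p_true >= 1.0 - p_true
        return max(p_true, 1.0 - p_true)
    if kind == "prod":
        out = 1.0
        for child in node[1]:
            out *= max_product(child, assignment)
        return out
    # Sum node: take the best weighted child (max instead of sum),
    # keeping only that child's partial assignment.
    best_val, best_assign = -1.0, None
    for w, child in node[1]:
        sub = {}
        v = w * max_product(child, sub)
        if v > best_val:
            best_val, best_assign = v, sub
    assignment.update(best_assign)
    return best_val

spn = ("sum", [
    (0.6, ("prod", [("leaf", "X1", 0.9), ("leaf", "X2", 0.2)])),
    (0.4, ("prod", [("leaf", "X1", 0.1), ("leaf", "X2", 0.7)])),
])

assignment = {}
value = max_product(spn, assignment)
print(assignment, value)
```

As the abstract's bounds indicate, the value returned this way is only an approximation of the true MAP probability; it can undershoot by a factor exponential in the input size in the worst case.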
Bayesian Structure and Parameter Learning of Sum-Product Networks
Sum-product networks (SPNs) are graphical models capable of handling large amounts of
multi-dimensional data. Unlike many other graphical models, SPNs are tractable if certain structural
requirements are fulfilled; a model is called tractable if probabilistic inference can be performed in
polynomial time with respect to the size of the model. The learning of SPNs can be separated
into two modes, parameter and structure learning. Many earlier approaches to SPN learning have
treated the two modes as separate, but it has been found that by alternating between these two
modes, good results can be achieved. One example of this kind of algorithm was presented by
Trapp et al. in the article Bayesian Learning of Sum-Product Networks (NeurIPS, 2019).
This thesis discusses SPNs and a Bayesian learning algorithm developed based on the aforementioned
algorithm, differing in some of the methods used. The algorithm by Trapp et al. uses Gibbs
sampling in the parameter-learning phase, whereas here Metropolis-Hastings MCMC is used. The
algorithm developed for this thesis was used in two experiments, one with a small and simple SPN and
one with a larger and more complex SPN. The effect of the data set size and the complexity of
the data was also explored. The results were compared to those obtained from running the original
algorithm developed by Trapp et al.
The results show that having more data in the learning phase makes the results more accurate, as
it is easier for the model to spot patterns in a larger data set. It was also shown that the
model was able to learn the parameters in the experiments if the data were simple enough, in other
words, if each dimension of the data contained only one distribution. In the case
of more complex data, with multiple distributions per dimension, the results showed that the
computation struggled.
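The Metropolis-Hastings step used here can be illustrated on the smallest possible case: sampling the single weight of a two-component mixture (a one-sum-node SPN with fixed leaves). The data, the Bernoulli leaf parameters, and the proposal scale below are all invented for the sketch; the thesis's actual model is larger.

```python
# Hedged sketch of random-walk Metropolis-Hastings for one SPN parameter:
# the weight w of a mixture of Bernoulli(0.9) and Bernoulli(0.1) leaves,
# with a flat prior on w and a symmetric Gaussian proposal.
import math, random

random.seed(1)
data = [1] * 70 + [0] * 30   # toy binary observations

def log_posterior(w):
    if not 0.0 < w < 1.0:
        return -math.inf
    ll = 0.0
    for y in data:
        p = w * (0.9 if y else 0.1) + (1 - w) * (0.1 if y else 0.9)
        ll += math.log(p)
    return ll

w, samples = 0.5, []
for step in range(5000):
    prop = w + random.gauss(0, 0.1)   # symmetric proposal, so the
    # acceptance ratio reduces to the posterior ratio
    if math.log(random.random()) < log_posterior(prop) - log_posterior(w):
        w = prop                       # accept the move
    if step >= 1000:                   # discard burn-in
        samples.append(w)

print(sum(samples) / len(samples))     # posterior mean of w
```

With 70% ones and p(1) = 0.1 + 0.8w, the posterior concentrates around w = 0.75, so the sampled mean should land nearby. The symmetric proposal is what distinguishes this from the Gibbs step of Trapp et al., which samples from exact conditionals instead.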
Exchangeable Variable Models
A sequence of random variables is exchangeable if its joint distribution is
invariant under variable permutations. We introduce exchangeable variable
models (EVMs) as a novel class of probabilistic models whose basic building
blocks are partially exchangeable sequences, a generalization of exchangeable
sequences. We prove that a family of tractable EVMs is optimal under zero-one
loss for a large class of functions, including parity and threshold functions,
and strictly subsumes existing tractable independence-based model families.
Extensive experiments show that EVMs outperform state-of-the-art classifiers
such as SVMs, as well as probabilistic models that are based solely on independence
assumptions.
Comment: ICML 2014
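The tractability of exchangeable building blocks comes from a simple fact: for a finitely exchangeable binary sequence, the joint probability depends only on the number of ones, so a table of n+1 values determines the full joint over 2^n sequences. A toy illustration (the distribution over counts is arbitrary, chosen only for the example):

```python
# Hedged sketch of the sufficient-statistic property behind exchangeable
# sequences: P(x) depends only on k = number of ones in x, and the
# C(n, k) sequences with k ones share that mass equally.
from itertools import product
from math import comb

n = 4
# Arbitrary distribution over the sufficient statistic k = 0..n.
p_count = [0.1, 0.2, 0.4, 0.2, 0.1]

def p_joint(x):
    k = sum(x)
    return p_count[k] / comb(n, k)

# Exchangeability: permuting a sequence never changes its probability.
assert p_joint((1, 1, 0, 0)) == p_joint((0, 1, 0, 1))

total = sum(p_joint(x) for x in product([0, 1], repeat=n))
print(total)  # sums to (approximately) 1 over all 2^n sequences
```

Queries against such a block reduce to sums over n+1 counts instead of 2^n states, which is the kind of tractability the abstract claims for EVMs (whose blocks are the more general partially exchangeable sequences).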