Conditional Sum-Product Networks: Imposing Structure on Deep Probabilistic Architectures
Probabilistic graphical models are a central tool in AI; however, they are
generally not as expressive as deep neural models, and inference is notoriously
hard and slow. In contrast, deep probabilistic models such as sum-product
networks (SPNs) capture joint distributions in a tractable fashion, but still
lack the expressive power of intractable models based on deep neural networks.
Therefore, we introduce conditional SPNs (CSPNs), conditional density
estimators for multivariate and potentially hybrid domains which allow
harnessing the expressive power of neural networks while still maintaining
tractability guarantees. One way to implement CSPNs is to use an existing SPN
structure and condition its parameters on the input, e.g., via a deep neural
network. This approach, however, might misrepresent the conditional
independence structure present in data. Consequently, we also develop a
structure-learning approach that derives both the structure and parameters of
CSPNs from data. Our experimental evidence demonstrates that CSPNs are
competitive with other probabilistic models and yield superior performance on
multilabel image classification compared to mean field and mixture density
networks. Furthermore, they can successfully be employed as building blocks for
structured probabilistic models, such as autoregressive image models.
Comment: 13 pages, 6 figures
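The first construction mentioned in this abstract, a fixed SPN structure whose parameters are conditioned on the input, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy structure (one sum node over product nodes of Bernoulli leaves), the linear maps followed by softmax and sigmoid that stand in for a deep neural network, and every name below are assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli(p_one, y):
    """Probability of binary outcome y under a Bernoulli with P(y=1) = p_one."""
    return p_one if y == 1 else 1.0 - p_one


def cspn_density(y1, y2, x, gate_w, leaf_w):
    """p(y1, y2 | x) for a toy conditional SPN (hypothetical structure).

    The structure is fixed: a sum node over K product nodes, each the
    product of one Bernoulli leaf over y1 and one over y2.
    Only the parameters depend on the scalar input x:
      - sum weights    = softmax(gate_w[k] * x)
      - leaf parameter = sigmoid(a * x) for each a in leaf_w[k]
    """
    weights = softmax([w * x for w in gate_w])
    total = 0.0
    for weight, (a1, a2) in zip(weights, leaf_w):
        total += (weight
                  * bernoulli(sigmoid(a1 * x), y1)
                  * bernoulli(sigmoid(a2 * x), y2))
    return total


# Hypothetical parameters for K = 2 mixture components.
GATE_W = [0.5, -1.0]
LEAF_W = [(2.0, -1.0), (-0.5, 1.5)]
```

Because the sum weights are normalised for every input and each leaf is a proper distribution, the output is a normalised conditional distribution over (y1, y2) for any x, which is the tractability property the abstract refers to.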
Visualizing and Understanding Sum-Product Networks
Sum-Product Networks (SPNs) are recently introduced deep tractable
probabilistic models by which several kinds of inference queries can be
answered exactly and in tractable time. Up to now, they have been largely
used as black-box density estimators, assessed only by comparing their
likelihood scores. In this paper we explore and exploit the inner
representations learned by SPNs. We do this with a threefold aim: first we want
to get a better understanding of the inner workings of SPNs; secondly, we seek
additional ways to evaluate one SPN model and compare it against other
probabilistic models, providing diagnostic tools to practitioners; lastly, we
want to empirically evaluate how good and meaningful the extracted
representations are, as in a classic Representation Learning framework. In
order to do so we revise their interpretation as deep neural networks and we
propose to exploit several visualization techniques on their node activations
and network outputs under different types of inference queries. To investigate
these models as feature extractors, we plug some SPNs, learned in a greedy
unsupervised fashion on image datasets, into supervised classification learning
tasks. We extract several embedding types from node activations by filtering
nodes by their type, by their associated feature abstraction level and by their
scope. In a thorough empirical comparison we show them to be competitive
with those generated by popular feature extractors such as Restricted Boltzmann
Machines. Finally, we investigate embeddings generated from random
probabilistic marginal queries as a means to compare other tractable
probabilistic models on a common ground, extending our experiments to Mixtures
of Trees.
Comment: Machine Learning Journal paper (First Online), 24 pages
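The exact marginal queries this abstract relies on can be illustrated with a small sketch. It assumes a toy SPN over two binary variables, encoded as nested tuples; the encoding, the variable names, and the parameter values are all hypothetical. A variable is marginalised by letting its leaves evaluate to 1, so a single bottom-up pass answers the query.

```python
def evaluate(node, evidence):
    """Evaluate a toy SPN bottom-up.

    node is one of (hypothetical encoding):
      ("leaf", var, p_one)           Bernoulli leaf with P(var=1) = p_one
      ("product", [child, ...])      product node
      ("sum", [(weight, child), ...]) sum node with normalised weights
    evidence maps variable names to 0 or 1; variables that are absent
    are marginalised out, so their leaves evaluate to 1.
    """
    kind = node[0]
    if kind == "leaf":
        _, var, p_one = node
        if var not in evidence:
            return 1.0
        return p_one if evidence[var] == 1 else 1.0 - p_one
    if kind == "product":
        result = 1.0
        for child in node[1]:
            result *= evaluate(child, evidence)
        return result
    if kind == "sum":
        return sum(weight * evaluate(child, evidence)
                   for weight, child in node[1])
    raise ValueError(f"unknown node kind: {kind!r}")


# Hypothetical two-component SPN over binary variables A and B.
SPN = ("sum", [
    (0.3, ("product", [("leaf", "A", 0.9), ("leaf", "B", 0.2)])),
    (0.7, ("product", [("leaf", "A", 0.1), ("leaf", "B", 0.6)])),
])

p_a = evaluate(SPN, {"A": 1})            # marginal P(A=1), B summed out
p_ab = evaluate(SPN, {"A": 1, "B": 0})   # joint P(A=1, B=0)
```

The intermediate values returned at each node during such a pass are the kind of node activations from which embeddings could be collected.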
Bayesian Learning of Sum-Product Networks
Sum-product networks (SPNs) are flexible density estimators and have received
significant attention due to their attractive inference properties. While
parameter learning in SPNs is well developed, structure learning leaves
something to be desired: Even though there is a plethora of SPN structure
learners, most of them are somewhat ad-hoc and based on intuition rather than a
clear learning principle. In this paper, we introduce a well-principled
Bayesian framework for SPN structure learning. First, we decompose the problem
into i) laying out a computational graph, and ii) learning the so-called scope
function over the graph. The first is rather unproblematic and akin to neural
network architecture validation. The second represents the effective structure
of the SPN and needs to respect the usual structural constraints in SPNs, i.e.,
completeness and decomposability. While representing and learning the scope
function is somewhat involved in general, in this paper, we propose a natural
parametrisation for an important and widely used special case of SPNs. These
structural parameters are incorporated into a Bayesian model, such that
simultaneous structure and parameter learning is cast into monolithic Bayesian
posterior inference. In various experiments, our Bayesian SPNs often improve
test likelihoods over greedy SPN learners. Further, since the Bayesian
framework protects against overfitting, we can evaluate hyper-parameters
directly on the Bayesian model score, waiving the need for a separate
validation set, which is especially beneficial in low data regimes. Bayesian
SPNs can be applied to heterogeneous domains and can easily be extended to
nonparametric formulations. Moreover, our Bayesian approach is the first that
consistently and robustly learns SPN structures under missing data.
Comment: NeurIPS 2019; See conference page for supplement
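The two structural constraints named in this abstract can be checked mechanically. The sketch below uses the standard definitions (completeness: all children of a sum node share the same scope; decomposability: the children of a product node have pairwise disjoint scopes) on a hypothetical tuple encoding that omits weights. It is an illustration only and does not reproduce the paper's scope-function parametrisation.

```python
def scope(node):
    """Set of variables a node depends on.

    Hypothetical encoding: ("leaf", var), ("sum", [children]),
    ("product", [children]). Weights are omitted because validity
    depends only on scopes.
    """
    if node[0] == "leaf":
        return frozenset([node[1]])
    return frozenset().union(*(scope(child) for child in node[1]))


def is_complete_and_decomposable(node):
    """True if every sum node is complete and every product node decomposable."""
    kind = node[0]
    if kind == "leaf":
        return True
    children = node[1]
    if not all(is_complete_and_decomposable(child) for child in children):
        return False
    scopes = [scope(child) for child in children]
    if kind == "sum":
        # Completeness: every child covers exactly the same variables.
        return all(s == scopes[0] for s in scopes)
    if kind == "product":
        # Decomposability: no variable appears in two children.
        return sum(len(s) for s in scopes) == len(frozenset().union(*scopes))
    raise ValueError(f"unknown node kind: {kind!r}")


valid = ("sum", [
    ("product", [("leaf", "A"), ("leaf", "B")]),
    ("product", [("leaf", "A"), ("leaf", "B")]),
])
incomplete = ("sum", [("leaf", "A"), ("leaf", "B")])
not_decomposable = ("product", [("leaf", "A"), ("leaf", "A")])
```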
Automatic Bayesian Density Analysis
Making sense of a dataset in an automatic and unsupervised fashion is a
challenging problem in statistics and AI. Classical approaches for exploratory
data analysis are usually not flexible enough to deal with the uncertainty
inherent to real-world data: they are often restricted to fixed latent
interaction models and homogeneous likelihoods; they are sensitive to missing,
corrupt and anomalous data; moreover, their expressiveness generally comes at
the price of intractable inference. As a result, supervision from statisticians
is usually needed to find the right model for the data. However, since domain
experts are not necessarily also experts in statistics, we propose Automatic
Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible
at large. Specifically, ABDA allows for automatic and efficient missing value
estimation, statistical data type and likelihood discovery, anomaly detection
and dependency structure mining, on top of providing accurate density
estimation. Extensive empirical evidence shows that ABDA is a suitable tool for
automatic exploratory analysis of mixed continuous and discrete tabular data.
Comment: In proceedings of the Thirty-Third AAAI Conference on Artificial
Intelligence (AAAI-19)
Why neural networks find simple solutions: the many regularizers of geometric complexity
In many contexts, simpler models are preferable to more complex models and
the control of this model complexity is the goal for many methods in machine
learning such as regularization, hyperparameter tuning and architecture design.
In deep learning, it has been difficult to understand the underlying mechanisms
of complexity control, since many traditional measures are not naturally
suitable for deep neural networks. Here we develop the notion of geometric
complexity, which is a measure of the variability of the model function,
computed using a discrete Dirichlet energy. Using a combination of theoretical
arguments and empirical results, we show that many common training heuristics
such as parameter norm regularization, spectral norm regularization, flatness
regularization, implicit gradient regularization, noise regularization and the
choice of parameter initialization all act to control geometric complexity,
providing a unifying framework in which to characterize the behavior of deep
learning models.
Comment: Accepted as a NeurIPS 2022 paper
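As a rough illustration of a discrete Dirichlet energy, the sketch below averages the squared Frobenius norm of the input Jacobian of a model function over a set of data points. Treating this average as the geometric complexity is an assumption about the definition, as are the finite-difference approximation of the Jacobian and all names; the abstract itself does not spell out the formula.

```python
def geometric_complexity(f, points, eps=1e-5):
    """Mean over points of the squared Frobenius norm of df/dx.

    f maps a list of floats to a list of floats. The Jacobian is
    approximated column by column with central finite differences,
    a stand-in chosen for this sketch in place of exact gradients.
    """
    total = 0.0
    for x in points:
        for i in range(len(x)):
            upper = list(x)
            lower = list(x)
            upper[i] += eps
            lower[i] -= eps
            for out_hi, out_lo in zip(f(upper), f(lower)):
                partial = (out_hi - out_lo) / (2.0 * eps)
                total += partial * partial
    return total / len(points)


def linear_model(x):
    """Hypothetical model f(x) = 2*x0 + 3*x1, whose Jacobian is [2, 3]."""
    return [2.0 * x[0] + 3.0 * x[1]]


DATA = [[0.0, 0.0], [1.0, -1.0], [0.5, 2.0]]
gc_value = geometric_complexity(linear_model, DATA)
```

For a linear model the Jacobian is constant, so the value reduces to the squared Frobenius norm of the weight matrix; a function that varies more sharply around the data points would score higher.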