Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods
We formulate the problem of neural network optimization as Bayesian
filtering, where the observations are the backpropagated gradients. While
neural network optimization has previously been studied using natural-gradient
methods, which are closely related to Bayesian inference, those methods were
unable to recover standard optimizers such as Adam and RMSprop with a
root-mean-square gradient normalizer, instead yielding a mean-square
normalizer. To recover the
root-mean-square normalizer, we find it necessary to account for the temporal
dynamics of all the other parameters as they are being optimized. The resulting
optimizer, AdaBayes, adaptively transitions between SGD-like and Adam-like
behaviour, automatically recovers AdamW, a state-of-the-art variant of Adam
with decoupled weight decay, and has generalisation performance competitive
with SGD.
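The mean-square vs. root-mean-square distinction the abstract refers to is visible in a plain Adam step, where the second-moment estimate is a running mean of squared gradients and the update divides by its square root. This is a sketch of standard Adam for a single parameter, not the AdaBayes update itself; the function name and scalar form are illustrative choices.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a scalar parameter.

    v is a mean-square gradient estimate; the denominator uses sqrt(v),
    i.e. the ROOT-mean-square normalizer the abstract discusses.
    """
    m = beta1 * m + (1 - beta1) * grad       # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)             # bias correction for zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

Replacing `math.sqrt(v_hat)` with `v_hat` would give the mean-square normalizer that earlier natural-gradient analyses produced instead.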
Modeling sequences and temporal networks with dynamic community structures
In evolving complex systems such as air traffic and social organizations,
collective effects emerge from their many components' dynamic interactions.
While the dynamic interactions can be represented by temporal networks with
nodes and links that change over time, they remain highly complex. It is
therefore often necessary to use methods that extract the temporal networks'
large-scale dynamic community structure. However, such methods are subject to
overfitting or suffer from effects of arbitrary, a priori imposed timescales,
which should instead be extracted from data. Here we simultaneously address
both problems and develop a principled data-driven method that determines
relevant timescales and identifies patterns of dynamics that take place on
networks as well as shape the networks themselves. We base our method on an
arbitrary-order Markov chain model with community structure, and develop a
nonparametric Bayesian inference framework that identifies the simplest such
model that can explain temporal interaction data.
Comment: 15 pages, 6 figures, 2 tables
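The building block of the method above is an arbitrary-order Markov chain over interaction events, whose sufficient statistics are counts of (history, next state) transitions. A minimal sketch of gathering those counts for a chosen order; the function name is a hypothetical illustration, not part of the paper's code.

```python
from collections import Counter

def markov_transition_counts(sequence, order=2):
    """Count (history -> next state) transitions for an order-n Markov chain.

    sequence: list of hashable states (e.g. interaction events).
    Returns a Counter keyed by (tuple_of_history_states, next_state).
    """
    counts = Counter()
    for i in range(len(sequence) - order):
        history = tuple(sequence[i:i + order])
        counts[(history, sequence[i + order])] += 1
    return counts
```

The nonparametric inference in the paper then selects the simplest order and community structure consistent with such counts, rather than fixing the order a priori.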
A network approach to topic models
One of the main computational and scientific challenges in the modern age is
to extract useful information from unstructured texts. Topic models are one
popular machine-learning approach which infers the latent topical structure of
a collection of documents. Despite their success, in particular that of their
most widely used variant, Latent Dirichlet Allocation (LDA), and numerous
applications in sociology, history, and linguistics, topic models are known to
suffer from severe conceptual and practical problems, e.g. a lack of
justification for the Bayesian priors, discrepancies with statistical
properties of real texts, and the inability to properly choose the number of
topics. Here we obtain a fresh view on the problem of identifying topical
structures by relating it to the problem of finding communities in complex
networks. This is achieved by representing text corpora as bipartite networks
of documents and words. By adapting existing community-detection methods --
using a stochastic block model (SBM) with non-parametric priors -- we obtain a
more versatile and principled framework for topic modeling (e.g., it
automatically detects the number of topics and hierarchically clusters both the
words and documents). The analysis of artificial and real corpora demonstrates
that our SBM approach leads to better topic models than LDA in terms of
statistical model selection. More importantly, our work shows how to formally
relate methods from community detection and topic modeling, opening the
possibility of cross-fertilization between these two fields.
Comment: 22 pages, 10 figures, code available at https://topsbm.github.io
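The starting point of the approach above is representing a corpus as a bipartite network of documents and words with edge weights given by occurrence counts. A minimal sketch of that representation only; the SBM inference itself lives in the linked code, and the function name here is an illustrative choice.

```python
from collections import Counter

def bipartite_edges(corpus):
    """Represent a corpus as a weighted bipartite network.

    corpus: list of document strings.
    Returns a Counter of edges (doc_index, word) -> occurrence count,
    the adjacency structure a community-detection method would consume.
    """
    edges = Counter()
    for d, text in enumerate(corpus):
        for w in text.split():
            edges[(d, w)] += 1
    return edges
```

Running community detection on this bipartite graph groups words and documents jointly, which is how topics and document clusters emerge without fixing their number in advance.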
Non-parametric Bayesian modeling of complex networks
Modeling structure in complex networks using Bayesian non-parametrics makes
it possible to specify flexible model structures and infer the adequate model
complexity from the observed data. This paper provides a gentle introduction to
non-parametric Bayesian modeling of complex networks: Using an infinite mixture
model as running example we go through the steps of deriving the model as an
infinite limit of a finite parametric model, inferring the model parameters by
Markov chain Monte Carlo, and checking the model's fit and predictive
performance. We explain how advanced non-parametric models for complex networks
can be derived and point out relevant literature.
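Taking the infinite limit of a finite mixture model, as the tutorial above describes, yields a Chinese restaurant process prior over partitions, which is what the MCMC sampler explores. A sketch of sampling a partition from that prior, under the standard CRP seating rule; this illustrates the prior only, not the paper's full network model, and the function name is hypothetical.

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sample a partition of n items from a Chinese restaurant process prior.

    Item i joins existing cluster k with probability size_k / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    Returns a list of cluster assignments.
    """
    rng = random.Random(seed)
    sizes = []   # current cluster sizes
    assign = []
    for i in range(n):
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, size in enumerate(sizes):
            acc += size
            if r < acc:
                sizes[k] += 1
                assign.append(k)
                break
        else:                      # no existing cluster chosen: open a new one
            sizes.append(1)
            assign.append(len(sizes) - 1)
    return assign
```

Because the number of clusters is not fixed, the posterior over such partitions lets the data determine model complexity, which is the point of the non-parametric treatment.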
Efficient inference of overlapping communities in complex networks
We discuss two views on extending existing methods for complex network
modeling which we dub the communities first and the networks first view,
respectively. Inspired by the networks first view that we attribute to White,
Boorman, and Breiger (1976)[1], we formulate the multiple-networks stochastic
blockmodel (MNSBM), which seeks to separate the observed network into
subnetworks of different types and where the problem of inferring structure in
each subnetwork becomes easier. We show how this model is specified in a
generative Bayesian framework where parameters can be inferred efficiently
using Gibbs sampling. The result is an effective multiple-membership model
without the drawbacks of introducing complex definitions of "groups" and how
they interact. We demonstrate results on the recovery of planted structure in
synthetic networks and show very encouraging results on link prediction
performances using multiple-networks models on a number of real-world network
data sets.
Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints
Unsupervised estimation of latent variable models is a fundamental problem
central to numerous applications of machine learning and statistics. This work
presents a principled approach for estimating broad classes of such models,
including probabilistic topic models and latent linear Bayesian networks, using
only second-order observed moments. The sufficient conditions for
identifiability of these models are primarily based on weak expansion
constraints on the topic-word matrix, for topic models, and on the directed
acyclic graph, for Bayesian networks. Because no assumptions are made on the
distribution among the latent variables, the approach can handle arbitrary
correlations among the topics or latent factors. In addition, a tractable
learning method via optimization is proposed and studied in numerical
experiments.
Comment: 38 pages, 6 figures, 2 tables; applications in topic models and
Bayesian networks are studied. Simulation section is added.
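The second-order observed moments the method above relies on are, for topic models, pairwise word co-occurrence statistics averaged over documents. A minimal sketch of estimating such a moment matrix from tokenized documents; the normalization and function name are illustrative simplifications, not the paper's exact estimator.

```python
def second_order_moments(docs, vocab):
    """Empirical second-order moment matrix from word co-occurrences.

    docs: list of token lists; vocab: list of distinct words.
    M2[i][j] counts ordered co-occurrences of vocab[i] and vocab[j]
    within a document, averaged over documents.
    """
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    M2 = [[0.0] * V for _ in range(V)]
    for words in docs:
        for a in words:
            for b in words:
                if a != b:          # cross-moments of distinct words only
                    M2[idx[a]][idx[b]] += 1.0
    n = len(docs)
    return [[x / n for x in row] for row in M2]
```

Identifiability from moments like these is what the expansion conditions on the topic-word matrix guarantee, so no distributional assumption on the latent topics is needed.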