Implementing Bayesian Inference with Neural Networks
Embodied agents, be they animals or robots, acquire information about the world through their senses. They do not, however, simply lose this information once it has passed, but rather process and store it for future use. The most general theory of how an agent can combine stored knowledge with new observations is Bayesian inference. In this dissertation I present a theory of how embodied agents can learn to implement Bayesian inference with neural networks.
By neural network I mean both artificial and biological neural networks, and in my dissertation I address both kinds. On one hand, I develop theory for implementing Bayesian inference in deep generative models, and I show how to train multilayer perceptrons to compute approximate predictions for Bayesian filtering. On the other hand, I show that several models in computational neuroscience are special cases of the general theory that I develop in this dissertation, and I use this theory to model and explain several phenomena in neuroscience. The key contributions of this dissertation can be summarized as follows:
- I develop a class of graphical models called nth-order harmoniums. An nth-order harmonium is an n-tuple of random variables in which the conditional distribution of each variable given all the others is always an element of the same exponential family. I show that harmoniums have a recursive structure that allows them to be analyzed at coarser and finer levels of detail.
- I define a class of harmoniums called rectified harmoniums, which are constrained to have priors which are conjugate to their posteriors. As a consequence of this, rectified harmoniums afford efficient sampling and learning.
- I develop deep harmoniums: harmoniums that can be represented by hierarchical, undirected graphs. I extend the theory of rectification to deep harmoniums and derive a novel algorithm for training deep generative models.
- I show how to implement a variety of optimal and near-optimal Bayes filters by combining the solution to Bayes' rule provided by rectified harmoniums with predictions computed by a recurrent neural network. I then show how to train a neural network to implement Bayesian filtering when the transition and emission distributions are unknown.
- I show how some well-established models of neural activity are special cases of the theory I present in this dissertation, and how these models can be generalized with the theory of rectification.
- I show how the theory that I present can model several neural phenomena including proprioception and gain-field modulation of tuning curves.
- I introduce a library for the programming language Haskell, within which I have implemented all the simulations presented in this dissertation. This library uses concepts from Riemannian geometry to provide a rigorous and efficient environment for implementing complex numerical simulations.
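The simplest member of this family is familiar: a second-order harmonium whose two conditionals are both products of Bernoullis is exactly the classic restricted Boltzmann machine. A minimal block-Gibbs sketch of that special case (dimensions and parameters illustrative, and in Python rather than the dissertation's Haskell):

```python
import numpy as np

rng = np.random.default_rng(0)

# Second-order harmonium with Bernoulli conditionals, i.e. a classic RBM:
# p(x | h) and p(h | x) are both products of Bernoullis (the same
# exponential family), parametrised by biases a, b and coupling W.
nx, nh = 6, 4
W = rng.normal(0, 0.1, size=(nx, nh))
a = np.zeros(nx)   # visible biases
b = np.zeros(nh)   # hidden biases

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def sample_h_given_x(x):
    p = sigmoid(b + x @ W)
    return (rng.random(nh) < p).astype(float)

def sample_x_given_h(h):
    p = sigmoid(a + W @ h)
    return (rng.random(nx) < p).astype(float)

# Block Gibbs sampling alternates between the two exponential-family
# conditionals; each full sweep resamples one whole layer at a time.
x = rng.integers(0, 2, nx).astype(float)
for _ in range(100):
    h = sample_h_given_x(x)
    x = sample_x_given_h(h)

print(x.shape, h.shape)  # (6,) (4,)
```

Higher-order harmoniums generalize this to n layers, with the same alternating-conditional sampling structure.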
I also use the results presented in this dissertation to argue for the fundamental role of neural computation in embodied cognition. I argue, in other words, that before we will be able to build truly intelligent robots, we will need to truly understand biological brains.
A neurally plausible model for online recognition and postdiction
Humans and other animals are frequently near-optimal in their ability to integrate noisy and ambiguous sensory data to form robust percepts, which are informed both by sensory evidence and by prior expectations about the structure of the environment. The brain is thought to do so using the statistical structure provided by an internal model of how latent, causal factors produce the observed patterns. In dynamic environments, such integration often takes the form of \emph{postdiction}, wherein later sensory evidence affects inferences about earlier percepts. As the brain must operate in current time, without the luxury of acausal propagation of information, how does such postdictive inference come about? Here, we propose a general framework for neural probabilistic inference in dynamic models based on the distributed distributional code (DDC) representation of uncertainty, naturally extending the underlying encoding to incorporate implicit probabilistic beliefs about both present and past. We show that, as in other uses of the DDC, an inferential model can be learnt efficiently using samples from an internal model of the world. Applied to stimuli used in psychophysics experiments, the framework provides an online and plausible mechanism for inference, including postdictive effects.
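As background for the representation underlying this framework: a DDC encodes a belief over a latent variable by the expectations of a fixed bank of nonlinear basis functions, and expectations of interest are read out linearly, with readout weights fit by least squares on samples from an internal model. A toy sketch (the basis functions, dimensions, and distributions are all illustrative choices, not those of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# A distributed distributional code (DDC) represents a belief over a latent
# z by the expectations of a fixed bank of nonlinear encoding functions,
# r_k = E[phi_k(z)] -- here random cosine features, an illustrative choice.
K = 50
omega = rng.normal(size=K)
phase = rng.uniform(0, 2 * np.pi, size=K)

def encode(z_samples):
    # Mean activation of each basis function over samples from the belief.
    return np.cos(np.outer(z_samples, omega) + phase).mean(axis=0)

# Two different beliefs yield two different codes.
r_prior = encode(rng.normal(0.0, 1.0, 2000))
r_post  = encode(rng.normal(1.5, 0.3, 2000))

# Any expectation E[g(z)] that is approximately linear in the phi_k can be
# read out linearly from the code; fit the readout weights alpha by least
# squares on samples drawn from an internal model of the world.
z_train = rng.normal(0.0, 1.5, 5000)
Phi = np.cos(np.outer(z_train, omega) + phase)
alpha, *_ = np.linalg.lstsq(Phi, z_train, rcond=None)  # readout for g(z) = z

print(float(r_post @ alpha))  # close to the posterior mean, 1.5
```

The same linear-readout trick is what lets inferential computations on DDCs be learnt by regression on internally simulated data.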
Learning the Ising Model with Generative Neural Networks
Recent advances in deep learning and neural networks have led to an increased
interest in the application of generative models in statistical and condensed
matter physics. In particular, restricted Boltzmann machines (RBMs) and
variational autoencoders (VAEs) as specific classes of neural networks have
been successfully applied in the context of physical feature extraction and
representation learning. Despite these successes, however, there is only
limited understanding of their representational properties and limitations. To
better understand the representational characteristics of RBMs and VAEs, we
study their ability to capture physical features of the Ising model at
different temperatures. This approach allows us to quantitatively assess
learned representations by comparing sample features with corresponding
theoretical predictions. Our results suggest that the considered RBMs and
convolutional VAEs are able to capture the temperature dependence of
magnetization, energy, and spin-spin correlations. The samples generated by
RBMs are more evenly distributed across temperature than those generated by
VAEs. We also find that convolutional layers in VAEs are important to model
spin correlations whereas RBMs achieve similar or even better performances
without convolutional filters.
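For context, the training data in such studies are spin configurations sampled at known temperatures. A minimal Metropolis sampler for the 2D Ising model (J = 1, zero field; grid size and sweep counts chosen small for speed, not taken from the paper) reproduces the magnetization behaviour the generative models must capture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Single-spin Metropolis sampling of the 2D Ising model with periodic
# boundaries: the kind of temperature-labelled configurations that RBMs
# and VAEs are trained on in this setting.
L = 16

def sweep(spins, beta):
    for _ in range(L * L):
        i, j = rng.integers(L), rng.integers(L)
        # Energy change of flipping spin (i, j).
        nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nb
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1

def magnetization(T, n_sweeps=200):
    spins = np.ones((L, L))          # start from the ordered state
    for _ in range(n_sweeps):
        sweep(spins, 1.0 / T)
    return abs(spins.mean())

# Below the critical temperature (T_c ~ 2.27) the system stays magnetized;
# well above it the mean magnetization decays toward zero.
m_low, m_high = magnetization(1.5), magnetization(4.0)
print(m_low, m_high)
```

Comparing such sample statistics (magnetization, energy, spin-spin correlations) against theory is exactly the quantitative assessment the abstract describes.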
Unsupervised representation learning with recognition-parametrised probabilistic models
We introduce a new approach to probabilistic unsupervised learning based on
the recognition-parametrised model (RPM): a normalised semi-parametric
hypothesis class for joint distributions over observed and latent variables.
Under the key assumption that observations are conditionally independent given
latents, the RPM combines parametric prior and observation-conditioned latent
distributions with non-parametric observation marginals. This approach leads to
a flexible learnt recognition model capturing latent dependence between
observations, without the need for an explicit, parametric generative model.
The RPM admits exact maximum-likelihood learning for discrete latents, even for
powerful neural-network-based recognition. We develop effective approximations
applicable in the continuous-latent case. Experiments demonstrate the
effectiveness of the RPM on high-dimensional data, learning image
classification from weak indirect supervision; direct image-level latent
Dirichlet allocation; and recognition-parametrised Gaussian process factor
analysis (RP-GPFA) applied to multi-factorial spatiotemporal datasets. The RPM
provides a powerful framework to discover meaningful latent structure
underlying observational data, a function critical to both animal and
artificial intelligence.
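The discrete-latent case admits a direct likelihood computation. A toy numerical sketch of the RPM joint described above, with random tables standing in for the neural-network recognition factors (all sizes and numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy recognition-parametrised model (RPM) with a discrete latent z.
# Joint: p(z, x_1, x_2) = p(z) * prod_j [ f_j(z | x_j) / F_j(z) ] * p0(x_j),
# where f_j is a recognition factor (here a random softmax table, normally
# a neural network), F_j(z) averages f_j over the observed data, and p0 is
# the non-parametric empirical marginal.
N, K = 8, 3                        # data points, latent states

prior = np.full(K, 1.0 / K)        # p(z), uniform for the sketch
# Recognition factors f_j(z | x_n): one normalised row per observation.
f = [np.exp(rng.normal(size=(N, K))) for _ in range(2)]
f = [fj / fj.sum(axis=1, keepdims=True) for fj in f]

# F_j(z): each recognition factor averaged over the empirical marginal.
F = [fj.mean(axis=0) for fj in f]

# Per-datapoint likelihood of (x_1^n, x_2^n), up to the constant factor
# p0(x_1) p0(x_2) shared by all models in this class; summing the same
# product over z before normalising also gives the exact posterior p(z | x).
lik = (prior * (f[0] / F[0]) * (f[1] / F[1])).sum(axis=1)
print(lik.shape, np.log(lik).sum())
```

Because every term is available in closed form for discrete z, the log-likelihood can be maximised exactly with respect to the prior and the recognition parameters, which is the "exact maximum-likelihood learning" the abstract refers to.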
Nonparametric enrichment in computational and biological representations of distributions
This thesis proposes nonparametric techniques to enhance unsupervised learning methods in computational and biological contexts. Representations of intractable distributions and their relevant statistics are enhanced by nonparametric components trained to handle challenging estimation problems.

The first part introduces a generic algorithm for learning generative latent-variable models. In contrast to traditional variational learning, no representation of the intractable posterior distributions is computed, making the method agnostic to the model structure and the support of the latent variables. Kernel ridge regression is used to consistently estimate the gradient for learning. In many unsupervised tasks, this approach outperforms advanced alternatives based on the expectation-maximisation algorithm and variational approximate inference.

In the second part, I train a model of data known as the kernel exponential family density. The kernel, used to describe smooth functions, is augmented by a parametric component trained using an efficient meta-learning procedure; meta-learning prevents the overfitting that would occur with conventional routines. After training, the contours of the kernel adapt to the local geometry of the underlying density. Compared to maximum-likelihood learning, our method better captures the shape of the density, which is the desired quantity in many downstream applications.

The final part shows how nonparametric ideas contribute to understanding uncertainty computation in the brain. First, I show that neural networks can learn to represent uncertainty using the distributed distributional code (DDC), a representation similar to the nonparametric kernel mean embedding. I then derive several DDC-based message-passing algorithms, including computations for filtering and real-time smoothing; the latter is a common neural computation embodied in many postdictive phenomena of perception across multiple modalities. The main idea behind these algorithms is least-squares regression, where the training data are simulated from an internal model. The internal model can be concurrently updated to follow the statistics of sensory stimuli, enabling adaptive inference.
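The least-squares idea can be made concrete in a linear-Gaussian toy problem where the true posterior mean is known in closed form, so the regression fit on internally simulated data can be checked (all quantities illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Learning an inference step by least-squares regression on data simulated
# from an internal model.  Internal model: z ~ N(0, 1), x = z + noise with
# noise ~ N(0, sigma^2); the true posterior mean E[z | x] is known exactly,
# so the learnt regression can be validated.
sigma = 0.5
z = rng.normal(size=20000)           # latent samples from the internal model
x = z + sigma * rng.normal(size=20000)  # simulated observations

# Regress z on x (with intercept): the fitted map approximates E[z | x].
A = np.stack([x, np.ones_like(x)], axis=1)
w, *_ = np.linalg.lstsq(A, z, rcond=None)

# Closed form: E[z | x] = x / (1 + sigma^2) = 0.8 * x for sigma = 0.5.
print(float(w[0]))  # slope close to 0.8
```

Because the training pairs come from the agent's own generative model rather than labelled data, the same recipe applies when the true posterior has no closed form: the regression target is simply the simulated latent.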
Optimizing Connectivity through Network Gradients for the Restricted Boltzmann Machine
Leveraging sparse networks to connect successive layers in deep neural
networks has recently been shown to provide benefits to large scale
state-of-the-art models. However, network connectivity also plays a significant
role on the learning performance of shallow networks, such as the classic
Restricted Boltzmann Machines (RBM). Efficiently finding sparse connectivity
patterns that improve the learning performance of shallow networks is a
fundamental problem. While recent principled approaches explicitly include
network connections as model parameters that must be optimized, they often rely
on explicit penalization or have network sparsity as a hyperparameter. This
work presents a method to find optimal connectivity patterns for RBMs based on
the idea of network gradients (NCG): computing the gradient of every possible
connection, given a specific connection pattern, and using the gradient to
drive a continuous connection strength parameter that in turn is used to
determine the connection pattern. Thus, learning RBM parameters and learning
network connections is truly jointly performed, albeit with different learning
rates, and without changes to the objective function. The method is applied to
the MNIST and other datasets showing that better RBM models are found for the
benchmark tasks of sample generation and input classification. Results also
show that NCG is robust to network initialization, both adding and removing
network connections while learning.
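One plausible reading of the NCG idea, sketched on a toy linear layer rather than an RBM (the paper's exact update rule may differ): every candidate connection keeps both a weight and a continuous strength; the active pattern is determined by the strength's sign, and the strength is driven by the gradient each connection would receive.

```python
import numpy as np

rng = np.random.default_rng(5)

# Joint learning of weights and connectivity on a toy regression layer.
# Every possible connection has a weight W[i, j] and a continuous strength
# S[i, j]; the active pattern is mask = (S > 0).  The task (matching a
# random teacher) and the learning rates are illustrative.
n_in, n_out = 5, 3
W = rng.normal(0, 0.5, size=(n_out, n_in))
S = rng.normal(0, 0.1, size=(n_out, n_in))   # connection strengths
target = rng.normal(size=(n_out, n_in))      # teacher weights

lr_w, lr_s = 0.1, 0.05
for _ in range(500):
    x = rng.normal(size=n_in)
    mask = (S > 0).astype(float)
    y = (W * mask) @ x
    err = y - target @ x
    # Gradient of every possible connection, active or not.
    G = np.outer(err, x)
    W -= lr_w * G * mask           # only active weights are updated
    # Strength grows when the gradient favours a larger weight magnitude
    # (one plausible strength-update rule, not necessarily the paper's).
    S -= lr_s * G * np.sign(W)

print(int((S > 0).sum()), "active connections")
```

Both updates use the same task gradient at different learning rates, so connectivity and weights are learnt jointly without a sparsity penalty in the objective, which mirrors the abstract's description.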