Flexible and accurate inference and learning for deep generative models
We introduce a new approach to learning in hierarchical latent-variable
generative models called the "distributed distributional code Helmholtz
machine", which emphasises flexibility and accuracy in the inferential process.
In common with the original Helmholtz machine and later variational autoencoder
algorithms (but unlike adversarial methods), our approach learns an explicit
inference or "recognition" model to approximate the posterior distribution over
the latent variables. Unlike in these earlier methods, the posterior
representation is not limited to a narrow tractable parameterised form (nor is
it represented by samples). To train the generative and recognition models we
develop an extended wake-sleep algorithm inspired by the original Helmholtz
machine. This makes it possible to learn hierarchical latent models with both
discrete and continuous variables, where an accurate posterior representation
is essential. We demonstrate that the new algorithm outperforms current
state-of-the-art methods on synthetic data, natural image patches, and the MNIST data set.
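For readers unfamiliar with the wake-sleep idea the abstract builds on, here is a minimal sketch of the classic single-layer wake-sleep updates for a model with binary latents and binary observations; the data, model sizes, and learning rate are illustrative, and the paper's DDC Helmholtz machine replaces this simple factorised recognition model with a distributed distributional code.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

D, K, lr = 16, 4, 0.05                      # observation size, latent size, step size
X = (rng.random((1000, D)) < 0.3).astype(float)   # toy binary "data"

b = np.zeros(K)                             # generative prior:      p(z_k = 1) = sigmoid(b_k)
W, c = np.zeros((D, K)), np.zeros(D)        # generative likelihood: p(x_d = 1 | z) = sigmoid(W z + c)_d
R, r = np.zeros((K, D)), np.zeros(K)        # recognition model:     q(z_k = 1 | x) = sigmoid(R x + r)_k

for x in X:
    # Wake phase: sample latents from the recognition model, fit the generative model.
    z = (rng.random(K) < sigmoid(R @ x + r)).astype(float)
    b += lr * (z - sigmoid(b))                          # gradient of log p(z) w.r.t. b
    err_x = x - sigmoid(W @ z + c)                      # gradient of log p(x|z) w.r.t. (W z + c)
    W += lr * np.outer(err_x, z); c += lr * err_x

    # Sleep phase: dream (z, x) from the generative model, fit the recognition model.
    z_s = (rng.random(K) < sigmoid(b)).astype(float)
    x_s = (rng.random(D) < sigmoid(W @ z_s + c)).astype(float)
    err_z = z_s - sigmoid(R @ x_s + r)                  # gradient of log q(z|x) w.r.t. (R x + r)
    R += lr * np.outer(err_z, x_s); r += lr * err_z
```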
A neurally plausible model learns successor representations in partially observable environments
Animals need to devise strategies to maximize returns while interacting with
their environment based on incoming noisy sensory observations. Task-relevant
states, such as the agent's location within an environment or the presence of a
predator, are often not directly observable but must be inferred using
available sensory information. Successor representations (SR) have been
proposed as a middle-ground between model-based and model-free reinforcement
learning strategies, allowing for fast value computation and rapid adaptation
to changes in the reward function or goal locations. Indeed, recent studies
suggest that features of neural responses are consistent with the SR framework.
However, it is not clear how such representations might be learned and computed
in partially observed, noisy environments. Here, we introduce a neurally
plausible model using distributional successor features, which builds on the
distributed distributional code for the representation and computation of
uncertainty, and which allows for efficient value function computation in
partially observed environments via the successor representation. We show that
distributional successor features can support reinforcement learning in noisy
environments in which direct learning of successful policies is infeasible.
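As background for the value computation described above, the following is a minimal sketch of tabular successor-representation learning by temporal-difference updates in a small, fully observed toy environment; the ring world, random policy, and reward vector are illustrative, and the paper's distributional successor features address the partially observed, noisy setting.

```python
import numpy as np

n_states, gamma, alpha = 5, 0.95, 0.1
M = np.zeros((n_states, n_states))          # successor matrix M[s, s']

rng = np.random.default_rng(0)
s = 0
for _ in range(5000):
    s_next = (s + rng.choice([-1, 1])) % n_states       # random walk on a 5-state ring
    onehot = np.eye(n_states)[s]
    # TD update: M(s,:) <- M(s,:) + alpha * (1_s + gamma * M(s',:) - M(s,:))
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    s = s_next

# Once M is learned, the value function for any reward vector is a single matrix product.
reward = np.zeros(n_states); reward[3] = 1.0
V = M @ reward
print(np.round(V, 2))
```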
A solution for the mean parametrization of the von Mises-Fisher distribution
The von Mises-Fisher distribution as an exponential family can be expressed
in terms of either its natural or its mean parameters. Unfortunately, however,
the normalization function for the distribution in terms of its mean parameters
is not available in closed form, limiting the practicality of the mean
parametrization and complicating maximum-likelihood estimation more generally.
We derive a second-order ordinary differential equation, the solution to which
yields the mean-parameter normalizer along with its first two derivatives, as
well as the variance function of the family. We also provide closed-form
approximations to the solution of the differential equation. This allows rapid
evaluation of both densities and natural parameters in terms of mean
parameters. We show applications to topic modeling with mixtures of von
Mises-Fisher distributions using Bregman clustering.
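To make the parametrization issue concrete, the sketch below numerically inverts the mean-resultant-length function of the von Mises-Fisher family and compares it with the closed-form approximation of Banerjee et al. (2005); this stands in for, and is not, the paper's ODE-based solution, and the dimension and target value are illustrative.

```python
import numpy as np
from scipy.special import ive               # exponentially scaled modified Bessel function

def rho(kappa, d):
    """Mean resultant length of a d-dimensional vMF: I_{d/2}(kappa) / I_{d/2-1}(kappa)."""
    return ive(d / 2, kappa) / ive(d / 2 - 1, kappa)

def kappa_from_rho(r, d, lo=1e-6, hi=1e4, iters=100):
    """Invert rho(kappa) by bisection (rho is monotone increasing in kappa)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if rho(mid, d) < r else (lo, mid)
    return 0.5 * (lo + hi)

def kappa_banerjee(r, d):
    """Closed-form approximation kappa ~ r (d - r^2) / (1 - r^2), Banerjee et al. (2005)."""
    return r * (d - r ** 2) / (1.0 - r ** 2)

d, r = 10, 0.7
print(kappa_from_rho(r, d), kappa_banerjee(r, d))   # the two estimates should be close
```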
The equivalence of information-theoretic and likelihood-based methods for neural dimensionality reduction
Stimulus dimensionality-reduction methods in neuroscience seek to identify a
low-dimensional space of stimulus features that affect a neuron's probability
of spiking. One popular method, known as maximally informative dimensions
(MID), uses an information-theoretic quantity known as "single-spike
information" to identify this space. Here we examine MID from a model-based
perspective. We show that MID is a maximum-likelihood estimator for the
parameters of a linear-nonlinear-Poisson (LNP) model, and that the empirical
single-spike information corresponds to the normalized log-likelihood under a
Poisson model. This equivalence implies that MID does not necessarily find
maximally informative stimulus dimensions when spiking is not well described as
Poisson. We provide several examples to illustrate this shortcoming, and derive
a lower bound on the information lost when spiking is Bernoulli in discrete
time bins. To overcome this limitation, we introduce model-based dimensionality
reduction methods for neurons with non-Poisson firing statistics, and show that
they can be framed equivalently in likelihood-based or information-theoretic
terms. Finally, we show how to overcome practical limitations on the number of
stimulus dimensions that MID can estimate by constraining the form of the
non-parametric nonlinearity in an LNP model. We illustrate these methods with
simulations and data from primate visual cortex.
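To illustrate the claimed equivalence, here is a minimal sketch of maximum-likelihood fitting of an LNP model with a fixed exponential nonlinearity (i.e. a Poisson GLM); MID corresponds to the same likelihood with a nonparametric nonlinearity. The stimulus ensemble, filter dimension, and optimiser are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 5000, 20
X = rng.standard_normal((T, D))                  # white-noise stimuli
w_true = rng.standard_normal(D) / np.sqrt(D)     # "true" stimulus filter
y = rng.poisson(np.exp(X @ w_true))              # spike counts per time bin

# Gradient ascent on the Poisson log-likelihood  sum_t [ y_t (x_t . w) - exp(x_t . w) ].
w = np.zeros(D)
for _ in range(500):
    rate = np.exp(X @ w)
    w += 0.1 * X.T @ (y - rate) / T              # average log-likelihood gradient
print(np.corrcoef(w, w_true)[0, 1])              # estimated filter should align with w_true
```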
Time-Frequency Analysis as Probabilistic Inference
This paper proposes a new view of time-frequency analysis framed in terms of probabilistic inference. Natural signals are assumed to be formed by the superposition of distinct time-frequency components, with the analytic goal being to infer these components by application of Bayes' rule. The framework serves to unify various existing models for natural time-series; it relates to both the Wiener and Kalman filters, and with suitable assumptions yields inferential interpretations of the short-time Fourier transform, spectrogram, filter bank, and wavelet representations. Value is gained by placing time-frequency analysis on the same probabilistic basis as is often employed in applications such as denoising, source separation, or recognition. Uncertainty in the time-frequency representation can be propagated correctly to application-specific stages, improving the handling of noise and missing data. Probabilistic learning allows modules to be co-adapted; thus, the time-frequency representation can be adapted to both the demands of the application and the time-varying statistics of the signal at hand. Similarly, the application module can be adapted to fine properties of the signal propagated by the initial time-frequency processing. We demonstrate these benefits by combining probabilistic time-frequency representations with non-negative matrix factorization, finding benefits in audio denoising and inpainting tasks, albeit with higher computational cost than incurred by the standard approach.
Funding was provided by EPSRC (grant numbers EP/G050821/1 and EP/L000776/1) and Google (R.E.T.), and by the Gatsby Charitable Foundation (M.S.).
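As a toy illustration of the inferential view described above (not the paper's exact model), the sketch below treats a short signal as a noisy superposition of sinusoids with Gaussian amplitude priors and infers the amplitudes by Bayes' rule; the posterior mean acts as a probabilistic spectrogram slice and the posterior covariance carries the uncertainty that can be propagated downstream. The sampling rate, frequency grid, and variances are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, T = 100.0, 128
t = np.arange(T) / fs
signal = np.sin(2 * np.pi * 10.0 * t) + 0.5 * rng.standard_normal(T)   # 10 Hz tone in noise

freqs = np.arange(1, 30)                         # candidate frequencies (Hz)
nf = len(freqs)
Phi = np.concatenate([np.cos(2 * np.pi * np.outer(t, freqs)),
                      np.sin(2 * np.pi * np.outer(t, freqs))], axis=1)

prior_var, noise_var = 1.0, 0.25
# Conjugate Gaussian linear model: posterior over amplitudes given the signal.
A = Phi.T @ Phi / noise_var + np.eye(2 * nf) / prior_var
post_cov = np.linalg.inv(A)
post_mean = post_cov @ (Phi.T @ signal) / noise_var

power = post_mean[:nf] ** 2 + post_mean[nf:] ** 2   # inferred spectral power per frequency
print(freqs[np.argmax(power)])                      # expected to peak near 10 Hz
```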
Kernel Instrumental Variable Regression
Instrumental variable (IV) regression is a strategy for learning causal
relationships in observational data. If measurements of input X and output Y
are confounded, the causal relationship can nonetheless be identified if an
instrumental variable Z is available that influences X directly, but is
conditionally independent of Y given X and the unmeasured confounder. The
classic two-stage least squares algorithm (2SLS) simplifies the estimation
problem by modeling all relationships as linear functions. We propose kernel
instrumental variable regression (KIV), a nonparametric generalization of 2SLS,
modeling relations among X, Y, and Z as nonlinear functions in reproducing
kernel Hilbert spaces (RKHSs). We prove the consistency of KIV under mild
assumptions, and derive conditions under which convergence occurs at the
minimax optimal rate for unconfounded, single-stage RKHS regression. In doing
so, we obtain an efficient ratio between training sample sizes used in the
algorithm's first and second stages. In experiments, KIV outperforms state-of-the-art alternatives for nonparametric IV regression.
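The sketch below shows a simplified two-stage kernel ridge construction in the spirit of KIV, with illustrative regularisers and no sample splitting between stages; the paper's algorithm derives principled settings for both and for the stage-size ratio.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Gaussian (RBF) kernel matrix between row-stacked point sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

rng = np.random.default_rng(0)
n = 500
u = rng.standard_normal(n)                        # unobserved confounder
z = rng.standard_normal(n)                        # instrument
x = z + u + 0.1 * rng.standard_normal(n)          # confounded treatment
y = np.sin(x) + u + 0.1 * rng.standard_normal(n)  # structural function is sin(x)

Z, X = z[:, None], x[:, None]
Kzz, Kxx = rbf(Z, Z), rbf(X, X)
lam1, lam2 = 1e-2, 1e-4                           # illustrative regularisers

# Stage 1: kernel ridge regression of phi(X) on Z (conditional mean embedding).
C = np.linalg.solve(Kzz + n * lam1 * np.eye(n), Kzz)   # column j weights the x samples
M = C.T @ Kxx @ C                                      # Gram matrix of the embeddings

# Stage 2: ridge regression of Y on the embedded features.
beta = np.linalg.solve(M + n * lam2 * np.eye(n), y)

# Evaluate the estimated structural function on a grid and compare with sin.
xs = np.linspace(-3, 3, 7)[:, None]
f_hat = rbf(xs, X) @ (C @ beta)
print(np.round(np.c_[np.sin(xs), f_hat], 2))
```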
Prediction under Latent Subgroup Shifts with High-Dimensional Observations
We introduce a new approach to prediction in graphical models with
latent-shift adaptation, i.e., where source and target environments differ in
the distribution of an unobserved confounding latent variable. Previous work
has shown that as long as "concept" and "proxy" variables with appropriate
dependence are observed in the source environment, the latent-associated
distributional changes can be identified, and target predictions adapted
accurately. However, practical estimation methods do not scale well when the
observations are complex and high-dimensional, even if the confounding latent
is categorical. Here we build upon a recently proposed probabilistic
unsupervised learning framework, the recognition-parametrised model (RPM), to
recover low-dimensional, discrete latents from image observations. Applied to
the problem of latent shifts, our novel form of RPM identifies causal latent
structure in the source environment, and adapts properly to predict in the
target. We demonstrate results in settings where predictor and proxy are
high-dimensional images, where previous methods fail to scale.
Inferring context-dependent computations through linear approximations of prefrontal cortex dynamics
The complex neural population activity of prefrontal cortex (PFC) is a hallmark of cognitive processes. How these rich dynamics emerge and support neural computations is largely unknown. Here, we infer mechanisms underlying the context-dependent selection and integration of sensory inputs by fitting dynamical models to PFC population responses of behaving monkeys. A class of models implementing linear dynamics driven by external inputs accurately captured the PFC responses within each context, achieving performance comparable to models without linear constraints. Two distinct mechanisms of input selection and integration were equally consistent with the data. One implemented context-dependent recurrent dynamics, as previously proposed, and relied on transient input amplification. The other relied on the subtle contextual modulation of the inputs, providing quantitative constraints on the attentional effects in sensory areas required to explain flexible PFC responses and behavior. Both mechanisms consistently revealed properties of inputs and recurrent dynamics missing in more simplified, incomplete descriptions of PFC responses. By revealing mechanisms consistent with rich cortical dynamics, our modeling approach provides a principled and general framework to link neural population activity and computation.
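As a toy illustration of the model class described above (not the paper's fitting procedure or data), the sketch below recovers input-driven linear dynamics x_{t+1} = A x_t + B u_t from simulated population trajectories by least squares; all sizes and noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 10, 200                                   # number of units, time steps
A_true = 0.95 * np.linalg.qr(rng.standard_normal((N, N)))[0]   # stable dynamics matrix
B_true = rng.standard_normal((N, 2))
u = rng.standard_normal((T, 2))                  # external inputs (e.g. sensory evidence)

x = np.zeros((T, N))
for t in range(T - 1):
    x[t + 1] = A_true @ x[t] + B_true @ u[t] + 0.05 * rng.standard_normal(N)

# Least-squares estimate of [A, B] from the one-step transitions.
regressors = np.concatenate([x[:-1], u[:-1]], axis=1)   # (T-1, N + 2)
AB, *_ = np.linalg.lstsq(regressors, x[1:], rcond=None)
A_hat, B_hat = AB[:N].T, AB[N:].T
print(np.abs(A_hat - A_true).max(), np.abs(B_hat - B_true).max())   # errors shrink with more data
```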