In search of dispersed memories: Generative diffusion models are associative memory networks
Uncovering the mechanisms behind long-term memory is one of the most
fascinating open problems in neuroscience and artificial intelligence.
Artificial associative memory networks have been used to formalize important
aspects of biological memory. Generative diffusion models are a class of
generative machine learning techniques that have shown strong performance in
many tasks. Like associative memory systems, these networks define a dynamical
system that converges to a set of target states. In this work we show that
generative diffusion models can be interpreted as energy-based models and that,
when trained on discrete patterns, their energy function is (asymptotically)
identical to that of modern Hopfield networks. This equivalence allows us to
interpret the supervised training of diffusion models as a synaptic learning
process that encodes the associative dynamics of a modern Hopfield network in
the weight structure of a deep neural network. Leveraging this connection, we
formulate a generalized framework for understanding the formation of long-term
memory, where creative generation and memory recall can be seen as parts of a
unified continuum.
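To make the stated equivalence concrete, here is a minimal derivation in illustrative notation (not taken verbatim from the paper): if the data distribution is uniform over M stored patterns ξ_μ, the noise-smoothed marginal at noise level σ_t is a Gaussian mixture, and the corresponding energy is a log-sum-exp of pattern overlaps with inverse temperature β = 1/σ_t².

```latex
p_t(x) \propto \sum_{\mu=1}^{M} \exp\!\left(-\frac{\lVert x-\xi_\mu\rVert^2}{2\sigma_t^2}\right)
\quad\Longrightarrow\quad
E_t(x) = -\sigma_t^2 \log p_t(x)
= \frac{\lVert x\rVert^2}{2}
- \sigma_t^2 \log \sum_{\mu=1}^{M} \exp\!\left(\frac{x^\top \xi_\mu - \tfrac{1}{2}\lVert \xi_\mu\rVert^2}{\sigma_t^2}\right)
+ \text{const.}
```

For patterns of equal norm, this is precisely the log-sum-exp energy of modern Hopfield networks, and the small-σ_t (asymptotic) regime corresponds to sharp recall of individual stored patterns.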
The statistical thermodynamics of generative diffusion models
Generative diffusion models have achieved spectacular performance in many
areas of generative modeling. While the fundamental ideas behind these models
come from non-equilibrium physics, in this paper we show that many aspects of
these models can be understood using the tools of equilibrium statistical
mechanics. Using this reformulation, we show that generative diffusion models
undergo second-order phase transitions corresponding to symmetry breaking
phenomena. We argue that this leads to a form of instability that lies at the
heart of their generative capabilities and that can be described by a set of
mean-field critical exponents. We conclude by analyzing recent work connecting
diffusion models and associative memory networks in light of these
thermodynamic formulations.
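As a toy illustration of such a transition (a one-dimensional example of ours, not the paper's): for data concentrated at ±m, the noise-smoothed marginal is a symmetric Gaussian mixture whose log-density has curvature at the origin

```latex
\log p_t(x) = -\frac{x^2 + m^2}{2\sigma_t^2} + \log\cosh\!\left(\frac{m x}{\sigma_t^2}\right) + \text{const},
\qquad
\left.\partial_x^2 \log p_t\right|_{x=0} = \frac{m^2 - \sigma_t^2}{\sigma_t^4}.
```

The central fixed point x = 0 is therefore a mode for σ_t > m and destabilizes for σ_t < m, splitting continuously into two symmetry-broken modes whose positions grow as (m - σ_t)^{1/2}, i.e., with the mean-field critical exponent 1/2.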
Spontaneous symmetry breaking in generative diffusion models
Generative diffusion models have recently emerged as a leading approach for
generating high-dimensional data. In this paper, we show that the dynamics of
these models exhibit a spontaneous symmetry breaking that divides the
generative dynamics into two distinct phases: 1) a linear steady-state dynamics
around a central fixed point, and 2) an attractor dynamics directed towards the
data manifold. These two "phases" are separated by the change in stability of
the central fixed point, with the resulting window of instability being
responsible for the diversity of the generated samples. Using both theoretical
and empirical evidence, we show that an accurate simulation of the early
dynamics does not significantly contribute to the final generation, since early
fluctuations are reverted to the central fixed point. To leverage this insight,
we propose a Gaussian late initialization scheme, which significantly improves
model performance, achieving up to 3x FID improvements on fast samplers, while
also increasing sample diversity (e.g., racial composition of generated CelebA
images). Our work offers a new way to understand the generative dynamics of
diffusion models that has the potential to bring about higher performance and
less biased fast samplers.
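A schematic sketch of what such a late-initialization sampler could look like (hypothetical names; the paper's exact scheme for matching the intermediate marginal may differ). The reverse dynamics start at an intermediate noise level, with the state drawn from a Gaussian of matching scale:

```python
import numpy as np

def late_init_sample(score_fn, sigmas, start, rng, dim=2):
    """Sketch: skip the near-stationary early phase and start the
    reverse dynamics at an intermediate noise level, drawing the
    initial state from a Gaussian whose scale matches that level."""
    x = sigmas[start] * rng.standard_normal(dim)   # Gaussian late init
    for i in range(start, len(sigmas) - 1):
        # Euler step of the variance-exploding probability-flow ODE
        x += 0.5 * (sigmas[i]**2 - sigmas[i + 1]**2) * score_fn(x, sigmas[i])
    return x

# Toy usage: data concentrated at mu, so the smoothed score is analytic.
mu = np.array([3.0, -1.0])
score = lambda x, s: (mu - x) / s**2               # score of N(mu, s^2 I)
sigmas = np.geomspace(10.0, 0.01, 100)             # decreasing noise levels
x0 = late_init_sample(score, sigmas, start=40, rng=np.random.default_rng(0))
```

Starting at index 40 rather than 0 discards the early window in which, per the analysis above, fluctuations are reverted to the central fixed point anyway.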
Dynamic Decomposition of Spatiotemporal Neural Signals
Neural signals are characterized by rich temporal and spatiotemporal dynamics
that reflect the organization of cortical networks. Theoretical research has
shown how neural networks can operate in different dynamical regimes that
correspond to specific types of information processing. Here we present a data
analysis framework that uses a linearized model of these dynamic states in
order to decompose the measured neural signal into a series of components that
capture both rhythmic and non-rhythmic neural activity. The method is based on
stochastic differential equations and Gaussian process regression. Through
computer simulations and analysis of magnetoencephalographic data, we
demonstrate the efficacy of the method in identifying meaningful modulations of
oscillatory signals corrupted by structured temporal and spatiotemporal noise.
These results suggest that the method is particularly suitable for the analysis
and interpretation of complex temporal and spatiotemporal neural signals.
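The abstract does not spell out the model class, but the general recipe it describes can be sketched as follows: represent a rhythmic component as a linearized, stochastically driven damped oscillator (an SDE whose Kalman-filter treatment is the state-space counterpart of GP regression) and extract it from the measured signal. All names below are illustrative:

```python
import numpy as np

def oscillator_transition(freq_hz, damping, dt):
    """Transition matrix of a discretized damped oscillatory SDE:
    a rotation at the component's frequency with exponential decay."""
    th = 2 * np.pi * freq_hz * dt
    return np.exp(-damping * dt) * np.array(
        [[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])

def kalman_component(y, A, obs_var, proc_var):
    """Extract one oscillatory component of y by Kalman filtering."""
    H = np.array([[1.0, 0.0]])          # observe the real part only
    x, P = np.zeros(2), np.eye(2)
    est = np.empty(len(y))
    for t, yt in enumerate(y):
        x, P = A @ x, A @ P @ A.T + proc_var * np.eye(2)  # predict
        S = float(H @ P @ H.T) + obs_var
        K = (P @ H.T) / S                                 # Kalman gain
        x = x + (K * (yt - float(H @ x))).ravel()         # update state
        P = P - K @ H @ P
        est[t] = x[0]
    return est

# Toy usage: recover a 10 Hz rhythm buried in white noise.
dt, rng = 1e-3, np.random.default_rng(0)
t = np.arange(0, 2, dt)
y = np.sin(2 * np.pi * 10 * t) + rng.standard_normal(len(t))
alpha = kalman_component(y, oscillator_transition(10, 5.0, dt), 1.0, 0.05)
```

Summing several such components at different frequencies, plus a non-rhythmic (purely damped) one, yields the kind of decomposition into rhythmic and non-rhythmic activity the abstract describes.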
Closing the gap: Exact maximum likelihood training of generative autoencoders using invertible layers
In this work, we provide an exact likelihood alternative to the variational
training of generative autoencoders. We show that VAE-style autoencoders can be
constructed using invertible layers, which offer a tractable exact likelihood
without the need for any regularization terms. This is achieved while leaving
complete freedom in the choice of encoder, decoder and prior architectures,
making our approach a drop-in replacement for the training of existing VAEs and
VAE-style models. We refer to the resulting models as Autoencoders within Flows
(AEF), since the encoder, decoder and prior are defined as individual layers of
an overall invertible architecture. We show that the approach results in
strikingly higher performance than architecturally equivalent VAEs in terms of
log-likelihood, sample quality and denoising performance. In a broad sense, the
main ambition of this work is to close the gap between the normalizing flow and
autoencoder literature under the common framework of invertibility and exact
maximum likelihood.
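The objective change can be stated in one line (standard normalizing-flow notation, not quoted from the paper): because encoder, decoder, and prior compose into a single invertible map f_θ with base density p_Z, training can maximize the exact change-of-variables log-likelihood

```latex
\log p_\theta(x) = \log p_Z\!\big(f_\theta(x)\big) + \log\left|\det \frac{\partial f_\theta}{\partial x}(x)\right|
```

rather than the ELBO lower bound used for VAEs; no KL regularization term is needed, since invertibility alone makes the likelihood tractable.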
Scaling up learning with GAIT-prop
Backpropagation of error (BP) is a widely used and highly successful learning
algorithm. However, its reliance on non-local information in propagating error
gradients makes it seem an unlikely candidate for learning in the brain. In the
last decade, a number of investigations have sought to determine whether
alternative, more biologically plausible computations can be used to
approximate BP. This work builds on one such local learning algorithm,
Gradient Adjusted Incremental Target Propagation (GAIT-prop), which has
recently been shown to approximate BP in a manner which appears biologically
plausible. This method constructs local, layer-wise weight update targets in
order to enable plausible credit assignment. However, in deep networks, the
local weight updates computed by GAIT-prop can deviate from BP for a number of
reasons. Here, we provide and test methods to overcome such sources of error.
In particular, we adaptively rescale the locally-computed errors and show that
this significantly increases the performance and stability of the GAIT-prop
algorithm when applied to the CIFAR-10 dataset.
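As a heavily simplified caricature of the rescaling idea (illustrative only; GAIT-prop's actual gradient-adjusted target construction differs), consider layer-local delta-rule updates driven by propagated errors, with an adaptive per-layer rescaling that keeps local error magnitudes comparable across depth:

```python
import numpy as np

def locally_rescaled_updates(acts, weights, top_error, gamma=0.1):
    """Caricature of target-based local learning with adaptive error
    rescaling. acts[l] is the activity vector of layer l; weights[l]
    maps layer l to layer l + 1; all names are illustrative."""
    updates, err = [None] * len(weights), top_error
    for l in reversed(range(len(weights))):
        # Adaptive rescaling: normalize the propagated error so the
        # local target stays near the layer's operating point at any depth.
        scale = gamma / (np.linalg.norm(err) + 1e-12)
        target = acts[l + 1] - scale * err          # local, layer-wise target
        local_err = acts[l + 1] - target            # = scale * err
        updates[l] = -np.outer(local_err, acts[l])  # delta-rule weight change
        err = weights[l].T @ err                    # pass error downward
    return updates
```

The rescaling step is the point of contact with the abstract: without it, the magnitude of the locally computed errors can drift across layers in deep networks, which is one source of deviation from BP that the paper addresses.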