88 research outputs found
Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring
In this paper we address the following question: can we approximately sample
from a Bayesian posterior distribution if we are only allowed to touch a small
mini-batch of data items for every sample we generate? An algorithm based on
the Langevin equation with stochastic gradients (SGLD) was previously proposed
to solve this, but its mixing rate was slow. By leveraging the Bayesian Central
Limit Theorem, we extend the SGLD algorithm so that at high mixing rates it
will sample from a normal approximation of the posterior, while for slow mixing
rates it will mimic the behavior of SGLD with a pre-conditioner matrix. As a
bonus, the proposed algorithm is reminiscent of Fisher scoring (with stochastic
gradients) and is, as such, an efficient optimizer during burn-in.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012).
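
To make the baseline concrete, here is a minimal, self-contained sketch of the SGLD update that the paper extends, on a toy Gaussian-mean model. The model, step size, and batch size are illustrative assumptions; the paper's Fisher-scoring variant additionally preconditions this update.

```python
# Minimal SGLD sketch: approximate posterior sampling while touching only
# a mini-batch per step. Toy model and hyperparameters are assumptions,
# not the paper's Fisher-scoring algorithm.
import numpy as np

rng = np.random.default_rng(0)
N, batch = 10_000, 100
data = rng.normal(loc=2.0, scale=1.0, size=N)   # toy data, known unit variance

def grad_log_posterior(theta, minibatch):
    # N(0, 10^2) prior on theta; mini-batch likelihood rescaled by N / batch
    grad_prior = -theta / 10.0**2
    grad_lik = (N / len(minibatch)) * np.sum(minibatch - theta)
    return grad_prior + grad_lik

eps = 1e-5                                       # fixed small step size (assumption)
theta, samples = 0.0, []
for t in range(5_000):
    mb = rng.choice(data, size=batch, replace=False)
    theta += 0.5 * eps * grad_log_posterior(theta, mb) \
             + np.sqrt(eps) * rng.normal()       # Langevin injected noise
    samples.append(theta)

print("posterior mean ~", np.mean(samples[1000:]))  # close to the data mean
```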
Denoising Criterion for Variational Auto-Encoding Framework
Denoising autoencoders (DAE) are trained to reconstruct their clean inputs
with noise injected at the input level, while variational autoencoders (VAE)
are trained with noise injected in their stochastic hidden layer, with a
regularizer that encourages this noise injection. In this paper, we show that
injecting noise both in input and in the stochastic hidden layer can be
advantageous and we propose a modified variational lower bound as an improved
objective function in this setup. When the input is corrupted, the standard
VAE lower bound involves marginalizing the encoder conditional distribution
over the input noise, which makes the training criterion intractable. Instead,
we propose a modified training criterion that corresponds to a tractable bound
when the input is corrupted. Experimentally, we find that the proposed denoising
variational autoencoder (DVAE) yields better average log-likelihood than the
VAE and the importance weighted autoencoder on the MNIST and Frey Face
datasets.
Comment: ICLR conference submission.
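
For illustration, a minimal PyTorch sketch of the kind of denoising criterion the abstract describes: noise is injected at the input and in the stochastic hidden layer, and the reconstruction target is the clean input. The architecture, noise model, and sizes are assumptions, not the paper's exact setup.

```python
# Denoising-VAE training criterion sketch: encode a corrupted input,
# sample the stochastic hidden layer, reconstruct the *clean* input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20, h_dim=400):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.Tanh())
        self.mu, self.logvar = nn.Linear(h_dim, z_dim), nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.Tanh(),
                                 nn.Linear(h_dim, x_dim))

    def loss(self, x, noise_std=0.25):
        x_tilde = x + noise_std * torch.randn_like(x)            # input corruption
        h = self.enc(x_tilde)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # hidden-layer noise
        logits = self.dec(z)
        # reconstruct the clean x, as in a denoising criterion
        rec = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return (rec + kl) / x.size(0)

model = DVAE()
x = torch.rand(64, 784)        # stand-in batch; use binarized MNIST in practice
print(model.loss(x).item())
```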
Spatially-Aware Transformer for Embodied Agents
Episodic memory plays a crucial role in various cognitive processes, such as
the ability to mentally recall past events. While cognitive science emphasizes
the significance of spatial context in the formation and retrieval of episodic
memory, the current primary approach to implementing episodic memory in AI
systems is through transformers that store temporally ordered experiences,
which overlooks the spatial dimension. As a result, it remains unclear how the
underlying structure can be extended to incorporate the spatial axis beyond
temporal order alone, and what benefits doing so would bring. To address
this, we explore the use of Spatially-Aware Transformer models that
incorporate spatial information. These models enable the creation of
place-centric episodic memory that considers both temporal and spatial
dimensions. Adopting this approach, we demonstrate that memory utilization
efficiency can be improved, leading to enhanced accuracy in various
place-centric downstream tasks. Additionally, we propose the Adaptive Memory
Allocator, a reinforcement learning-based memory management method that aims
to optimize memory utilization efficiency. Our experiments demonstrate the
advantages of our proposed model in various environments and across multiple
downstream tasks, including prediction, generation, reasoning, and
reinforcement learning. The source code for our models and experiments will be
available at https://github.com/junmokane/spatially-aware-transformer.
Comment: ICLR 2024 Spotlight. First two authors contributed equally.
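
As a rough illustration of the place-centric memory idea (not the paper's implementation), the sketch below stores experience embeddings keyed by a discretized place cell as well as time, so retrieval returns spatially grouped memories for a transformer to attend over. The grid discretization and all names are hypothetical.

```python
# Hypothetical place-centric episodic memory: write (time, embedding)
# entries under a discretized place cell; read back a spatially grouped,
# temporally ordered stack of embeddings for attention.
from collections import defaultdict
import torch

class PlaceCentricMemory:
    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.store = defaultdict(list)           # place cell -> [(t, embedding)]

    def _cell(self, position):
        pos = torch.as_tensor(position) / self.cell_size
        return tuple(pos.floor().int().tolist())

    def write(self, t, position, embedding):
        self.store[self._cell(position)].append((t, embedding))

    def read(self, position):
        # memories from the queried place cell, in temporal order, ready to
        # serve as keys/values for a transformer attention layer
        entries = sorted(self.store[self._cell(position)], key=lambda e: e[0])
        if not entries:
            return torch.empty(0, 0)
        return torch.stack([emb for _, emb in entries])

mem = PlaceCentricMemory()
for t in range(5):
    mem.write(t, position=(0.2, 0.7), embedding=torch.randn(16))
print(mem.read((0.4, 0.9)).shape)  # torch.Size([5, 16]) -- same place cell
```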
Neural Block-Slot Representations
In this paper, we propose a novel object-centric representation, called
Block-Slot Representation. Unlike the conventional slot representation, the
Block-Slot Representation provides concept-level disentanglement within a slot.
A block-slot is constructed by composing a set of modular concept
representations, called blocks, generated from a learned memory of abstract
concept prototypes. We call this block-slot construction process Block-Slot
Attention. Block-Slot Attention facilitates the emergence of abstract concept
blocks within a slot such as color, position, and texture, without any
supervision. This brings the benefits of disentanglement into slots and the
representation becomes more interpretable. Similar to Slot Attention, this
mechanism can be used as a drop-in module in any arbitrary neural architecture.
In experiments, we show that our model disentangles object properties
significantly better than previous methods, including in complex textured
scenes. We also demonstrate the ability to compose novel scenes by combining
slots at the block level.
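
To illustrate the retrieval step described above (a sketch under assumptions, not the authors' code): each slot is split into blocks, and each block is rebuilt as an attention-weighted mixture over a learned per-block memory of concept prototypes. Dimensions and the single-head dot-product retrieval are illustrative.

```python
# Hypothetical block-slot retrieval: every block of every slot attends
# over its own learned prototype memory and is replaced by a convex
# combination of prototypes.
import torch
import torch.nn as nn

class BlockSlotRetrieval(nn.Module):
    def __init__(self, num_blocks=8, block_dim=16, protos_per_block=64):
        super().__init__()
        # one learnable prototype memory per block (e.g., color, texture, ...)
        self.protos = nn.Parameter(
            torch.randn(num_blocks, protos_per_block, block_dim))
        self.num_blocks, self.block_dim = num_blocks, block_dim

    def forward(self, slots):                    # slots: (B, K, M*D)
        B, K, _ = slots.shape
        blocks = slots.view(B, K, self.num_blocks, self.block_dim)
        # dot-product attention of each block over its prototype memory
        attn = torch.einsum("bkmd,mpd->bkmp", blocks, self.protos).softmax(-1)
        # replace each block by a prototype mixture, then re-concatenate
        out = torch.einsum("bkmp,mpd->bkmd", attn, self.protos)
        return out.reshape(B, K, -1)

retrieve = BlockSlotRetrieval()
slots = torch.randn(2, 5, 8 * 16)                # 5 slots, 8 blocks of dim 16
print(retrieve(slots).shape)                     # torch.Size([2, 5, 128])
```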
Object-Centric Slot Diffusion
The recent success of transformer-based image generative models in
object-centric learning highlights the importance of powerful image generators
for handling complex scenes. However, despite the high expressiveness of
diffusion models in image generation, their integration into object-centric
learning remains largely unexplored. In this paper, we explore
the feasibility and potential of integrating diffusion models into
object-centric learning and investigate the pros and cons of this approach. We
introduce Latent Slot Diffusion (LSD), a novel model that serves dual purposes:
it is the first object-centric learning model to replace conventional slot
decoders with a latent diffusion model conditioned on object slots, and it is
also the first unsupervised compositional conditional diffusion model that
operates without the need for supervised annotations like text. Through
experiments on various object-centric tasks, including the first application of
the FFHQ dataset in this field, we demonstrate that LSD significantly
outperforms state-of-the-art transformer-based decoders, particularly in more
complex scenes, and exhibits superior unsupervised compositional generation
quality. The project page is available at
https://latentslotdiffusion.github.io.
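
For intuition, a hypothetical sketch of a slot-conditioned denoising objective: the standard epsilon-prediction diffusion loss, with object slots conditioning the denoiser through cross-attention in the role text embeddings usually play in conditional latent diffusion. The tiny denoiser and toy noise schedule are illustrative assumptions, not LSD's actual architecture.

```python
# Slot-conditioned denoising sketch: predict the noise added to latents,
# conditioning on object slots via cross-attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlotConditionedDenoiser(nn.Module):
    def __init__(self, z_dim=64, slot_dim=64, n_heads=4):
        super().__init__()
        self.time = nn.Linear(1, z_dim)
        self.cross = nn.MultiheadAttention(z_dim, n_heads, kdim=slot_dim,
                                           vdim=slot_dim, batch_first=True)
        self.out = nn.Sequential(nn.Linear(z_dim, z_dim), nn.GELU(),
                                 nn.Linear(z_dim, z_dim))

    def forward(self, z_t, t, slots):   # z_t: (B, L, z_dim), slots: (B, K, slot_dim)
        h = z_t + self.time(t[:, None, None].float())
        h, _ = self.cross(h, slots, slots)       # condition on slots
        return self.out(h)                       # predicted noise

def diffusion_loss(model, z0, slots, T=1000):
    # standard epsilon-prediction loss with a toy linear alpha-bar schedule
    B = z0.size(0)
    t = torch.randint(1, T, (B,))
    abar = 1.0 - t.float() / T                   # schedule is an assumption
    noise = torch.randn_like(z0)
    z_t = abar.sqrt()[:, None, None] * z0 + (1 - abar).sqrt()[:, None, None] * noise
    return F.mse_loss(model(z_t, t, slots), noise)

model = SlotConditionedDenoiser()
z0, slots = torch.randn(2, 16, 64), torch.randn(2, 5, 64)  # e.g. Slot Attention slots
print(diffusion_loss(model, z0, slots).item())
```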