95 research outputs found
Learning Programmatically Structured Representations with Perceptor Gradients
We present the perceptor gradients algorithm -- a novel approach to learning
symbolic representations based on the idea of decomposing an agent's policy
into i) a perceptor network extracting symbols from raw observation data and
ii) a task encoding program which maps the input symbols to output actions. We
show that the proposed algorithm is able to learn representations that can be
directly fed into a Linear-Quadratic Regulator (LQR) or a general purpose A*
planner. Our experimental results confirm that the perceptor gradients
algorithm is able to efficiently learn transferable symbolic representations as
well as generate new observations according to a semantically meaningful
specification.
Comment: Published as a conference paper at ICLR 201
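As a rough illustration of this decomposition (not the authors' implementation; the dimensions, controller gain, and training loop are assumptions), the sketch below pairs a learned perceptor with a fixed linear-feedback program standing in for an LQR and updates only the perceptor with a REINFORCE-style gradient:

```python
import torch
import torch.nn as nn

class Perceptor(nn.Module):
    """Maps raw observations (e.g. flattened images) to a small symbolic state."""
    def __init__(self, obs_dim=64, sym_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, sym_dim),
        )

    def forward(self, obs):
        return self.net(obs)

def lqr_program(symbols, K):
    """Task-encoding program: a fixed linear feedback law u = -K s,
    standing in for an LQR controller acting on the extracted symbols."""
    return -symbols @ K.T

obs_dim, sym_dim, act_dim = 64, 4, 2
perceptor = Perceptor(obs_dim, sym_dim)
K = torch.randn(act_dim, sym_dim)            # assumed pre-computed LQR gain
optimizer = torch.optim.Adam(perceptor.parameters(), lr=1e-3)

# One REINFORCE-style update. Only the perceptor has learnable parameters;
# the program is a fixed, differentiable mapping from symbols to actions.
obs = torch.randn(32, obs_dim)               # a batch of raw observations
returns = torch.randn(32)                    # placeholder returns from the environment
mean_action = lqr_program(perceptor(obs), K)
dist = torch.distributions.Normal(mean_action, 0.1)
actions = dist.sample()
log_prob = dist.log_prob(actions).sum(dim=1)
loss = -(log_prob * returns).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The same perceptor output could instead be handed to a symbolic planner such as A*; the key point of the decomposition is that the gradient signal only shapes the perceptor, while the program fixes the semantics of the symbols.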
Does Continual Learning = Catastrophic Forgetting?
Continual learning is known to suffer from catastrophic forgetting, a
phenomenon in which earlier learned concepts are forgotten in favor of more
recently seen samples. In this work, we challenge the assumption that continual
learning is inevitably associated with catastrophic forgetting by presenting a
set of tasks that surprisingly do not suffer from catastrophic forgetting when
learned continually. We provide evidence that these reconstruction-type tasks
exhibit positive forward transfer and that single-view 3D shape reconstruction
improves the performance on learned and novel categories over time. We provide
a novel analysis of knowledge transfer ability by looking at the output
distribution shift across sequential learning tasks. Finally, we show that the
robustness of these tasks points to their potential as a proxy
representation learning task for continual classification. The codebase,
dataset, and pre-trained models released with this article can be found at
https://github.com/rehg-lab/CLRec
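As a loose illustration of this kind of sequential evaluation (a placeholder sketch, not the released CLRec code; the model, data, and loss are assumptions), one can train a single reconstruction model on a sequence of tasks and re-measure error on every earlier task after each stage, which is the measurement that distinguishes forgetting from positive forward transfer:

```python
import torch
import torch.nn as nn

# Placeholder autoencoder standing in for a single-view 3D reconstruction model.
model = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 128))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Each "task" is a category-specific dataset; random tensors stand in here.
tasks = [torch.randn(256, 128) for _ in range(5)]

history = []  # per-task error measured after every training stage
for t, data in enumerate(tasks):
    # Train continually on task t only (no replay of earlier tasks).
    for _ in range(100):
        loss = loss_fn(model(data), data)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Evaluate on all tasks seen so far: rising error on earlier tasks
    # indicates forgetting; flat or falling error indicates transfer.
    with torch.no_grad():
        errors = [loss_fn(model(d), d).item() for d in tasks[: t + 1]]
    history.append(errors)
    print(f"after task {t}: errors on tasks 0..{t} = {errors}")
```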
Holographic Generative Memory: Neurally Inspired One-Shot Learning with Memory Augmented Neural Networks
Humans quickly parse and categorize stimuli by combining perceptual information and previously learned knowledge. We are capable of learning new information quickly with only a few observations, and sometimes even a single observation. This one-shot learning (OSL) capability is still very difficult to realize in machine learning models. Novelty is commonly thought to be the primary driver for OSL. However, neuroscience literature shows that biological OSL mechanisms are guided by uncertainty, rather than novelty, motivating us to explore this idea for machine learning.
In this work, we investigate OSL for neural networks using more robust compositional knowledge representations and a biologically inspired uncertainty mechanism to modulate the rate of learning. We introduce several new neural network models that combine Holographic Reduced Representation (HRR) and Variational Autoencoders. Extending these new models culminates in the Holographic Generative Memory (HGMEM) model.
HGMEM is a novel unsupervised memory augmented neural network. It offers solutions to many of the practical drawbacks associated with HRRs while also providing storage, recall, and generation of latent compositional knowledge representations. Uncertainty is measured as a native part of HGMEM operation by applying trained probabilistic dropout to fully-connected layers. During training, the learning rate is modulated using these uncertainty measurements in a manner inspired by our motivating neuroscience mechanism for OSL. Model performance is demonstrated on several image datasets with experiments that reflect our theoretical approach.
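For context, the HRR operations and the dropout-based uncertainty signal referenced above can be sketched compactly. The following is an illustrative approximation, not the HGMEM implementation; the network, dropout rate, and learning-rate rule are assumptions:

```python
import torch
import torch.nn as nn

def hrr_bind(a, b):
    """Circular convolution: the HRR binding operator, computed via FFT."""
    return torch.fft.irfft(torch.fft.rfft(a) * torch.fft.rfft(b), n=a.shape[-1])

def hrr_unbind(a, trace):
    """Circular correlation: approximately recovers b from hrr_bind(a, b)."""
    return torch.fft.irfft(torch.fft.rfft(a).conj() * torch.fft.rfft(trace), n=a.shape[-1])

d = 512
a = torch.randn(d) / d ** 0.5          # role vector
b = torch.randn(d) / d ** 0.5          # filler vector
trace = hrr_bind(a, b)                 # stored compositional trace
recovered = hrr_unbind(a, trace)       # noisy reconstruction of the filler
print("recovery cosine:", torch.cosine_similarity(recovered, b, dim=0).item())

# Uncertainty-modulated learning (illustrative): keep dropout active and use the
# spread of repeated stochastic forward passes to scale the update size.
net = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Dropout(0.2), nn.Linear(256, d))
net.train()                            # dropout stays stochastic
x = torch.randn(1, d)
samples = torch.stack([net(x) for _ in range(20)])
uncertainty = samples.std(dim=0).mean().item()
base_lr = 1e-3
lr = base_lr * (1.0 + uncertainty)     # higher uncertainty -> larger, one-shot-like update
print(f"uncertainty={uncertainty:.4f}, modulated lr={lr:.5f}")
```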
A Strong Transfer Baseline for RGB-D Fusion in Vision Transformers
The Vision Transformer (ViT) architecture has recently established its place in the computer vision literature, with multiple architectures for recognition of image data or other visual modalities. However, training ViTs for RGB-D object recognition remains an understudied topic, viewed in recent literature only through the lens of multi-task pretraining in multiple modalities. Such approaches are often computationally intensive and have not yet been applied to challenging object-level classification tasks. In this work, we propose a simple yet strong recipe for transferring pretrained ViTs to RGB-D domains for single-view 3D object recognition, focusing on fusing RGB and depth representations encoded jointly by the ViT. Compared to previous work on multimodal Transformers, the key challenge here is to use the attested flexibility of ViTs to capture cross-modal interactions at the downstream stage rather than during pretraining. We explore which depth representation is better in terms of resulting accuracy and compare two methods for injecting RGB-D fusion into the ViT architecture (i.e., early vs. late fusion). Our results on the Washington RGB-D Objects dataset demonstrate that in such RGB → RGB-D scenarios, late fusion techniques work better than the more commonly employed early fusion. With our transfer baseline, adapted ViTs score up to 95.1% top-1 accuracy on Washington, achieving new state-of-the-art results on this benchmark. We additionally evaluate our approach with an open-ended lifelong learning protocol, where we show that our adapted RGB-D encoder leads to features that outperform unimodal encoders, even without explicit fine-tuning. We further integrate our method with a robot framework and demonstrate how it can serve as a perception utility in an interactive robot learning scenario, both in simulation and with a real robot.
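As a rough sketch of the late-fusion variant described above (the backbone name, fusion head, and training details are assumptions, not the paper's exact recipe), one could encode RGB and a three-channel depth rendering with pretrained ViTs and fuse the pooled embeddings before a classifier:

```python
import torch
import torch.nn as nn
import timm

class LateFusionRGBD(nn.Module):
    """Encodes RGB and depth separately with pretrained ViTs and fuses the
    resulting embeddings with a small classifier head (late fusion)."""
    def __init__(self, num_classes=51, backbone="vit_base_patch16_224"):
        super().__init__()
        # Shared or separate backbones are both plausible; separate is shown here.
        self.rgb_encoder = timm.create_model(backbone, pretrained=True, num_classes=0)
        self.depth_encoder = timm.create_model(backbone, pretrained=True, num_classes=0)
        feat_dim = self.rgb_encoder.num_features
        self.head = nn.Sequential(
            nn.LayerNorm(2 * feat_dim),
            nn.Linear(2 * feat_dim, num_classes),
        )

    def forward(self, rgb, depth):
        # depth is assumed pre-rendered to 3 channels (e.g. a colorized depth map)
        fused = torch.cat([self.rgb_encoder(rgb), self.depth_encoder(depth)], dim=-1)
        return self.head(fused)

model = LateFusionRGBD(num_classes=51)   # Washington RGB-D Objects has 51 categories
rgb = torch.randn(2, 3, 224, 224)
depth = torch.randn(2, 3, 224, 224)
logits = model(rgb, depth)
print(logits.shape)                      # torch.Size([2, 51])
```

An early-fusion alternative would instead combine RGB and depth at the patch-embedding input of a single ViT; the late-fusion arrangement shown here keeps the pretrained encoders intact and learns the cross-modal interaction only in the downstream head.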