In-memory Realization of In-situ Few-shot Continual Learning with a Dynamically Evolving Explicit Memory
Continually learning new classes from a few training examples without
forgetting previously learned classes demands a flexible architecture with an
inevitably growing portion of storage, in which new examples and classes can be
incrementally stored and efficiently retrieved. One viable architectural
solution is to tightly couple a stationary deep neural network to a dynamically
evolving explicit memory (EM). As the centerpiece of this architecture, we
propose an EM unit that leverages energy-efficient in-memory compute (IMC)
cores during the course of continual learning operations. We demonstrate for
the first time how the EM unit can physically superpose multiple training
examples, expand to accommodate unseen classes, and perform similarity search
during inference, using operations on an IMC core based on phase-change memory
(PCM). Specifically, the physical superposition of a few encoded training
examples is realized via in-situ progressive crystallization of PCM devices.
The classification accuracy achieved on the IMC core remains within
1.28%--2.5% of that of the state-of-the-art full-precision baseline software
model on both the CIFAR-100 and miniImageNet datasets when continually
learning 40 novel classes (from only five examples per class) on top of 60 old
classes.
Comment: Accepted at the European Solid-state Devices and Circuits Conference
(ESSDERC), September 202
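The explicit-memory mechanics described above can be illustrated with a minimal software sketch: a few encoded examples of a class are superposed into one prototype vector (the software counterpart of in-situ progressive crystallization of PCM devices), unseen classes expand the memory, and inference is a similarity search over prototypes. All names, dimensions, and the random "encodings" below are illustrative assumptions; nothing here models PCM device physics or the paper's actual encoder.

import numpy as np

class ExplicitMemory:
    """Software analog of the dynamically evolving explicit memory (EM).

    Each class is represented by one prototype vector. Superposing a new
    encoded example onto a prototype mimics, in software, the in-situ
    accumulation the paper realizes via progressive PCM crystallization.
    """

    def __init__(self, dim: int):
        self.dim = dim
        self.prototypes: dict[int, np.ndarray] = {}  # class label -> prototype

    def superpose(self, label: int, encoded_example: np.ndarray) -> None:
        """Add one encoded training example; unseen labels expand the memory."""
        if label not in self.prototypes:
            self.prototypes[label] = np.zeros(self.dim)
        self.prototypes[label] += encoded_example  # accumulate, never rewrite

    def similarity_search(self, query: np.ndarray) -> int:
        """Return the label whose prototype is most cosine-similar to the query."""
        def cosine(p: np.ndarray) -> float:
            return p @ query / (np.linalg.norm(p) * np.linalg.norm(query) + 1e-12)
        return max(self.prototypes, key=lambda lbl: cosine(self.prototypes[lbl]))

# Usage: continually learn two novel classes from five examples each
# (labels 60 and 61 echo the paper's "on top of 60 old classes" setting).
rng = np.random.default_rng(0)
em = ExplicitMemory(dim=512)
for label in (60, 61):
    for _ in range(5):  # 5-shot learning with stand-in random encodings
        em.superpose(label, rng.standard_normal(512))
print(em.similarity_search(rng.standard_normal(512)))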
Improving deep learning through loss-function evolution
As the complexity of neural network models has grown, it has become increasingly important to optimize their design automatically through metalearning. Methods for discovering hyperparameters, topologies, and learning rate schedules have led to significant increases in performance. This dissertation tackles a new type of metalearning: loss-function optimization. Loss functions define a model's core training objective and thus present a clear opportunity. Two techniques, GLO and TaylorGLO, were developed to tackle this metalearning problem using genetic programming and evolution strategies. Experiments show that neural networks trained with metalearned loss functions are more accurate, have higher data utilization, train faster, and are more robust against adversarial attacks. A theoretical framework was developed to analyze how and why different loss functions bias training towards different regions of the parameter space. Using this framework, their performance gains are found to result from a regularizing effect that is tailored to each domain. Overall, this dissertation demonstrates that new, metalearned loss functions can result in better trained models, and provides the next stepping stone towards fully automated machine learning.
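As a rough illustration of loss-function metalearning, the sketch below parameterizes a per-sample loss as a low-order polynomial in the prediction and target (in the spirit of TaylorGLO) and tunes its coefficients with a simple (1+4) evolution strategy, using a tiny logistic model on synthetic data as the inner training task. The coefficient layout, the toy task, and the finite-difference inner loop are all assumptions made for brevity; the dissertation's actual methods use genetic programming (GLO) and CMA-ES (TaylorGLO) on real networks.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification task (all data illustrative).
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -30, 30)))

def taylor_loss(theta, p, t):
    """Per-sample loss as a low-order polynomial in prediction p and target t,
    in the spirit of TaylorGLO (this coefficient layout is illustrative)."""
    terms = np.stack([p, t, p * t, p**2, t**2, p**3])
    return float(np.mean(theta @ terms))

def fitness(theta, steps=200, lr=0.5, eps=1e-4):
    """Inner loop: train a logistic model under the candidate loss using
    finite-difference gradients (keeps the sketch dependency-free), then
    return validation accuracy as the outer, metalearning objective."""
    w = np.zeros(2)
    for _ in range(steps):
        base = taylor_loss(theta, sigmoid(X_tr @ w), y_tr)
        grad = np.zeros(2)
        for i in range(2):
            w2 = w.copy()
            w2[i] += eps
            grad[i] = (taylor_loss(theta, sigmoid(X_tr @ w2), y_tr) - base) / eps
        w -= lr * grad
    return np.mean((sigmoid(X_va @ w) > 0.5) == y_va)

# Outer loop: a simple (1+4) evolution strategy over the loss coefficients.
theta = 0.1 * rng.standard_normal(6)
best = fitness(theta)
for _ in range(20):
    for child in theta + 0.1 * rng.standard_normal((4, 6)):
        f = fitness(child)
        if f > best:
            theta, best = child, f
print(f"evolved-loss validation accuracy: {best:.2f}")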
Biological learning in key-value memory networks
In neuroscience, classical Hopfield networks are the standard biologically plausible model of long-term memory, relying on Hebbian plasticity for storage and attractor dynamics for recall. In contrast, memory-augmented neural networks in machine learning commonly use a key-value mechanism to store and read out memories in a single step. Such augmented networks achieve impressive feats of memory compared to traditional variants, yet it remains unclear whether they can be implemented by biological systems. In our work, we bridge this gap by proposing a set of biologically plausible three-factor plasticity rules for a basic feedforward key-value memory network. Keys are stored in the input-to-hidden synaptic weights by a "non-Hebbian" rule, controlled only by pre-synaptic activity, and modulated by local third factors which represent dendritic spikes. Values are stored in the hidden-to-output weights by a Hebbian rule, with the pre-synaptic neuron selected through softmax attention which represents recurrent inhibition. The same rules are recovered when network parameters are meta-learned. Our network performs on par with classical Hopfield networks on autoassociative memory tasks and can be naturally extended to correlated inputs, continual recall, heteroassociative memory, and sequence learning. Importantly, since memories are stored in slots indexed by hidden layer neurons, unlike the fully distributed representation in the classical Hopfield network, they can be individually selected for extended storage or rapid decay. Finally, our memory network can easily be incorporated into a larger neural system, either as a memory bank for an external controller, or as a fast learning system used in conjunction with a slow one. Overall, our results suggest a compelling alternative to the classical Hopfield network as a model of biological long-term memory.
Keywords: learning, memory, synaptic plasticity, Hebbian, key-value memory, neural network, three-factor plasticity
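The write and read rules described in the abstract can be sketched in a few lines: keys land in the input-to-hidden weights via a purely presynaptic (non-Hebbian) update gated per slot, values land in the hidden-to-output weights via a Hebbian outer-product update, and recall is one-step softmax attention. The round-robin slot pointer below stands in for the local third factor (dendritic spike) that selects a slot, and all dimensions and names are illustrative; this is a sketch of the general mechanism under those assumptions, not the authors' implementation.

import numpy as np

def softmax(z, beta=8.0):
    """Sharp softmax; beta plays the role of recurrent inhibition strength."""
    e = np.exp(beta * (z - z.max()))
    return e / e.sum()

class KeyValueMemory:
    """Minimal feedforward key-value memory with one hidden neuron per slot.

    Writing uses a non-Hebbian rule for keys (the gated slot's input weights
    are set from presynaptic activity alone) and a Hebbian rule for values
    (hidden-to-output weights change in proportion to pre- and post-synaptic
    activity). Reading is one-step softmax attention over stored keys.
    """

    def __init__(self, n_slots, key_dim, value_dim):
        self.W_k = np.zeros((n_slots, key_dim))    # input -> hidden (keys)
        self.W_v = np.zeros((value_dim, n_slots))  # hidden -> output (values)
        self.next_slot = 0

    def write(self, key, value):
        s = self.next_slot               # stand-in for third-factor gating
        self.W_k[s] = key                # non-Hebbian: presynaptic term only
        h = np.zeros(self.W_k.shape[0])
        h[s] = 1.0
        self.W_v += np.outer(value, h)   # Hebbian: post x pre outer product
        self.next_slot = (s + 1) % self.W_k.shape[0]

    def read(self, key):
        h = softmax(self.W_k @ key)      # attention over stored keys
        return self.W_v @ h

# Usage: heteroassociative recall from a noisy cue (dimensions illustrative).
rng = np.random.default_rng(1)
mem = KeyValueMemory(n_slots=16, key_dim=32, value_dim=8)
keys = rng.standard_normal((5, 32))
vals = rng.standard_normal((5, 8))
for k, v in zip(keys, vals):
    mem.write(k, v)
noisy_cue = keys[2] + 0.3 * rng.standard_normal(32)
print(np.allclose(mem.read(noisy_cue), vals[2], atol=0.5))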