Gated Linear Networks
This paper presents a new family of backpropagation-free neural
architectures, Gated Linear Networks (GLNs). What distinguishes GLNs from
contemporary neural networks is the distributed and local nature of their
credit assignment mechanism; each neuron directly predicts the target, forgoing
the ability to learn feature representations in favor of rapid online learning.
Individual neurons can model nonlinear functions via the use of data-dependent
gating in conjunction with online convex optimization. We show that this
architecture gives rise to universal learning capabilities in the limit, with
effective model capacity increasing as a function of network size in a manner
comparable with deep ReLU networks. Furthermore, we demonstrate that the GLN
learning mechanism possesses extraordinary resilience to catastrophic
forgetting, performing comparably to an MLP with dropout and Elastic Weight
Consolidation on standard benchmarks. These desirable theoretical and empirical
properties position GLNs as a complementary technique to contemporary offline
deep learning methods.
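To make the mechanism concrete, here is a minimal sketch of a single GLN neuron with halfspace gating and an online log-loss gradient update. It follows the description above (each neuron directly predicts the target; the gate selects which weight vector is active), but all names and hyperparameters are illustrative rather than the paper's reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

class GLNNeuron:
    """One GLN neuron: k random halfspaces over the side information
    select one of 2**k weight vectors; training is a local online
    gradient step on the log loss, with no backpropagation."""

    def __init__(self, n_inputs, side_dim, n_halfspaces=4, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.hyperplanes = rng.standard_normal((n_halfspaces, side_dim))
        self.weights = np.full((2 ** n_halfspaces, n_inputs), 1.0 / n_inputs)
        self.lr = lr

    def context(self, z):
        bits = (self.hyperplanes @ z > 0).astype(int)
        return int(bits @ (2 ** np.arange(bits.size)))

    def predict(self, p, z):
        # p: probabilities from the previous layer, z: side information
        return sigmoid(self.weights[self.context(z)] @ logit(p))

    def update(self, p, z, y):
        c = self.context(z)
        yhat = sigmoid(self.weights[c] @ logit(p))
        self.weights[c] -= self.lr * (yhat - y) * logit(p)  # local credit assignment
        return yhat
```

Because each update touches only the weight vector of the active context, inputs routed to different contexts interfere very little, which is the intuition behind the resilience to forgetting reported above.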
Extending Gated Linear Networks for Continual Learning
Incrementally learning multiple tasks from an indefinitely long stream of data
is a real challenge for traditional machine learning models. If not carefully
controlled, the learning of new knowledge strongly impacts a model's previously
learned abilities, making it forget how to solve past tasks.
Continual learning addresses this problem, called catastrophic forgetting, by
developing models able to continually learn new tasks and adapt to changes in
the data distribution.
In this dissertation, we consider the recently proposed family of continual
learning models called Gated Linear Networks (GLNs) and study two crucial
aspects that affect the amount of catastrophic forgetting in these networks:
data standardization and the gating mechanism.
Data standardization is particularly challenging in the online/continual
learning setting because data from future tasks is not available beforehand.
The results obtained using an online standardization method show considerably
more forgetting than an offline (static) standardization. Interestingly, with
the latter, we observe that GLNs show almost no forgetting on the considered
benchmark datasets.
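One natural way to realize online standardization is with running per-feature statistics (Welford's algorithm), as sketched below; this is an assumption about the general shape of such a method, not necessarily the exact procedure evaluated in the dissertation.

```python
import numpy as np

class OnlineStandardizer:
    """Standardize each feature using running mean/variance estimates,
    updated one sample at a time (Welford's algorithm)."""

    def __init__(self, dim, eps=1e-8):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)
        self.eps = eps

    def update_and_transform(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        var = self.m2 / max(self.n - 1, 1)
        return (x - self.mean) / np.sqrt(var + self.eps)
```

Early in the stream the estimates are noisy and drift as new tasks arrive, which is one plausible reason the online variant forgets more than a static standardization fitted offline.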
Secondly, for GLNs to be effective, it is essential to tailor the
hyperparameters of the gating mechanism to the data distribution. In this
dissertation, we propose a gating strategy based on a set of prototypes and the
resulting Voronoi tessellation. The experimental assessment shows that, in an
ideal setting where the data distribution is known, the proposed approach is
more robust to different data standardizations than the original halfspace
gating mechanism and shows improved predictive performance.
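A minimal sketch of such prototype-based gating is shown below: the gating context of an input is the index of its Voronoi cell, i.e., of the nearest prototype. How the prototypes are obtained is left to the caller here; in the ideal setting described above they would be sampled from the known data distribution.

```python
import numpy as np

class PrototypeGating:
    """Gate by nearest prototype: the context of an input is the index
    of the Voronoi cell it falls in."""

    def __init__(self, prototypes):
        self.prototypes = np.asarray(prototypes)   # shape (n_prototypes, dim)

    def context(self, z):
        dists = np.linalg.norm(self.prototypes - z, axis=1)
        return int(np.argmin(dists))
```

A GLN neuron would then keep one weight vector per Voronoi cell instead of one per halfspace sign pattern.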
Finally, we propose an adaptive mechanism for the choice of prototypes,
which expands and shrinks the set of prototypes in an online fashion, making the
model suitable for practical continual learning applications. The experimental
results show that the adaptive model's performance is close to the ideal scenario
where prototypes are directly sampled from the data distribution.
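The adaptive mechanism could plausibly look like the following sketch: a new prototype is added when an input falls far from every existing one, and rarely used prototypes are pruned periodically. The thresholds and the pruning rule are illustrative assumptions, not the dissertation's exact mechanism.

```python
import numpy as np

class AdaptivePrototypes:
    """Online prototype set that expands (inputs far from all prototypes
    spawn a new one) and shrinks (rarely selected prototypes are pruned)."""

    def __init__(self, add_dist=2.0, min_usage=5, prune_every=1000):
        self.prototypes, self.usage, self.t = [], [], 0
        self.add_dist, self.min_usage, self.prune_every = add_dist, min_usage, prune_every

    def context(self, z):
        z = np.asarray(z, dtype=float)
        self.t += 1
        if not self.prototypes:
            self.prototypes.append(z.copy())
            self.usage.append(0)
        dists = [np.linalg.norm(p - z) for p in self.prototypes]
        i = int(np.argmin(dists))
        if dists[i] > self.add_dist:            # expand: open a new Voronoi cell
            self.prototypes.append(z.copy())
            self.usage.append(0)
            i = len(self.prototypes) - 1
        self.usage[i] += 1
        if self.t % self.prune_every == 0:      # shrink: drop unused cells
            keep = [j for j, u in enumerate(self.usage) if u >= self.min_usage]
            if keep:
                self.prototypes = [self.prototypes[j] for j in keep]
                self.usage = [0] * len(self.prototypes)
        return i
```

Note that pruning renumbers the cells, so a full model would also have to remap or discard the per-context weights attached to removed prototypes.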
Gated Linear Networks for Continual Learning in a Class-Incremental with Repetition Scenario
Continual learning, which involves the incremental acquisition of knowledge over time, is a challenging problem in complex environments where the distribution of data may change over time. Despite the great results obtained by neural networks in solving a wide variety of tasks, they still struggle to show the same strong performance in a continual learning environment, suffering from a problem known as catastrophic forgetting. This problem, which consists in a model's tendency to overwrite old knowledge when new knowledge is presented, has been dealt with through a variety of strategies that adapt the model on different levels. Among these, in this work we focus on Gated Linear Networks (GLNs), a family of models that rely on a gating mechanism to improve the storage and retrieval of information over time. This class of models has already been applied to continual learning with promising results, but always in extremely simplified frameworks. In this work we define a more complex continual learning environment and adapt GLNs to the increased challenges that this environment presents, evaluating their strengths and limitations. In particular, we found that performing an encoding step can help make a complex dataset more spatially separable and therefore make GLNs more effective, and that switching to a Class-Incremental with Repetition scenario is useful both to increase the realism of the framework and to ease the learning difficulty.
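The encoding step mentioned above can be sketched as a frozen feature extractor placed in front of the GLN; the specific encoder below (an ImageNet-pretrained ResNet-18 from torchvision) is an illustrative assumption, as the text here does not pin down the architecture.

```python
import torch
import torchvision.models as models

# frozen pretrained encoder; its features replace the raw pixels as GLN input
encoder = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()   # keep the 512-d penultimate features
encoder.eval()

@torch.no_grad()
def encode(images):                # images: (B, 3, H, W), normalized
    feats = encoder(images)        # (B, 512)
    return torch.sigmoid(feats)    # squash into (0, 1) for GLN-style inputs
```

Features of this kind tend to be far more spatially separable than raw pixels, which matches the observation that an encoding step makes the gating (and hence the GLN) more effective.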
Globally Gated Deep Linear Networks
Recently proposed Gated Linear Networks present a tractable nonlinear network
architecture, and exhibit interesting capabilities such as learning with local
error signals and reduced forgetting in sequential learning. In this work, we
introduce a novel gating architecture, named Globally Gated Deep Linear
Networks (GGDLNs) where gating units are shared among all processing units in
each layer, thereby decoupling the architectures of the nonlinear but unlearned
gatings and the learned linear processing motifs. We derive exact equations for
the generalization properties in these networks in the finite-width
thermodynamic limit, defined by $P, N \to \infty$ with $P/N = \mathcal{O}(1)$,
where P and N are the training sample size and the network width,
respectively. We find
that the statistics of the network predictor can be expressed in terms of
kernels that undergo shape renormalization through a data-dependent matrix
compared to the GP kernels. Our theory accurately captures the behavior of
finite width GGDLNs trained with gradient descent dynamics. We show that kernel
shape renormalization gives rise to rich generalization properties w.r.t.
network width, depth and L2 regularization amplitude. Interestingly, networks
with sufficient gating units behave similarly to standard ReLU networks.
Although gatings in the model do not participate in supervised learning, we
show the utility of unsupervised learning of the gating parameters.
Additionally, our theory allows the evaluation of the network's ability for
learning multiple tasks by incorporating task-relevant information into the
gating units. In summary, our work is the first exact theoretical solution of
learning in a family of nonlinear networks with finite width. The rich and
diverse behavior of the GGDLNs suggests that they are useful, analytically
tractable models for learning single and multiple tasks in finite-width
nonlinear deep networks.
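One plausible reading of the architecture, sufficient to convey the decoupling described above, is that each layer computes a gate-weighted sum of linear maps, with the (unlearned) gate values shared by every unit in the layer. The sketch below follows that reading and is not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, width, n_gates = 8, 16, 4

def make_gate(u):
    return lambda x: 1.0 / (1.0 + np.exp(-u @ x))   # fixed, unlearned sigmoidal gate

# two layers, each a gate-weighted sum of learned linear maps
layers = [[rng.standard_normal((width, dim)) for _ in range(n_gates)],
          [rng.standard_normal((1, width)) for _ in range(n_gates)]]
gate_fns = [[make_gate(rng.standard_normal(dim)) for _ in range(n_gates)]
            for _ in range(2)]

def ggdln_forward(x, layers, gate_fns):
    h = x
    for Ws, gates in zip(layers, gate_fns):
        g = [gate(x) for gate in gates]   # gates see the raw input, shared layer-wide
        h = sum(gi * (Wi @ h) for gi, Wi in zip(g, Ws))
    return h

y = ggdln_forward(rng.standard_normal(dim), layers, gate_fns)
```

With the gates frozen, the learned part of the model is linear in the weights, which is what makes an exact kernel-level analysis of generalization tractable.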
PyTorch-Hebbian: facilitating local learning in a deep learning framework
Recently, unsupervised local learning, based on Hebb's idea that change in
synaptic efficacy depends on the activity of the pre- and postsynaptic neuron
only, has shown potential as an alternative training mechanism to
backpropagation. Unfortunately, Hebbian learning remains experimental and
rarely makes its way into standard deep learning frameworks. In this work, we
investigate the potential of Hebbian learning in the context of standard deep
learning workflows. To this end, a framework for thorough and systematic
evaluation of local learning rules in existing deep learning pipelines is
proposed. Using this framework, the potential of Hebbian learned feature
extractors for image classification is illustrated. In particular, the
framework is used to expand the Krotov-Hopfield learning rule to standard
convolutional neural networks without sacrificing accuracy compared to
end-to-end backpropagation. The source code is available at
https://github.com/Joxis/pytorch-hebbian.
Comment: Presented as a poster at the NeurIPS 2020 Beyond Backpropagation workshop.
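For concreteness, here is a simplified numpy version of the competitive Hebbian update from Krotov and Hopfield (2019) that the framework extends to convolutional networks (shown for the p = 2 case); it illustrates the rule itself, not the PyTorch-Hebbian API.

```python
import numpy as np

def krotov_hopfield_step(W, v, lr=0.02, k=2, delta=0.4):
    """Competitive Hebbian step: the most active hidden unit moves toward
    the input, the k-th most active is pushed away, all others are frozen.
    W: (n_hidden, dim) weight matrix, v: (dim,) input sample."""
    currents = W @ v                       # activation of each hidden unit
    order = np.argsort(currents)[::-1]     # rank units by activation
    g = np.zeros(W.shape[0])
    g[order[0]] = 1.0                      # winner: Hebbian update
    g[order[k - 1]] = -delta               # k-th ranked: anti-Hebbian update
    # purely local rule: it depends only on pre-/post-synaptic activity,
    # with no error signal from any downstream layer
    W += lr * g[:, None] * (v[None, :] - currents[:, None] * W)
    return W
```

The decay term keeps the weight vectors bounded, so the learned rows converge toward normalized prototypes of the input distribution.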