27 research outputs found

    Gated Linear Networks

    This paper presents a new family of backpropagation-free neural architectures, Gated Linear Networks (GLNs). What distinguishes GLNs from contemporary neural networks is the distributed and local nature of their credit assignment mechanism; each neuron directly predicts the target, forgoing the ability to learn feature representations in favor of rapid online learning. Individual neurons can model nonlinear functions via the use of data-dependent gating in conjunction with online convex optimization. We show that this architecture gives rise to universal learning capabilities in the limit, with effective model capacity increasing as a function of network size in a manner comparable with deep ReLU networks. Furthermore, we demonstrate that the GLN learning mechanism possesses extraordinary resilience to catastrophic forgetting, performing comparably to an MLP with dropout and Elastic Weight Consolidation on standard benchmarks. These desirable theoretical and empirical properties position GLNs as a complementary technique to contemporary offline deep learning methods. (Comment: arXiv admin note: substantial text overlap with arXiv:1712.0189)
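    A minimal sketch of the per-neuron mechanism described above, assuming halfspace gating on the side information and an online gradient step on the log loss; class and parameter names (GLNNeuron, n_hyperplanes, and so on) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    p = np.clip(p, 1e-6, 1.0 - 1e-6)  # keep inputs strictly inside (0, 1)
    return np.log(p / (1.0 - p))

class GLNNeuron:
    def __init__(self, n_inputs, side_dim, n_hyperplanes=4, lr=0.01, rng=None):
        rng = rng or np.random.default_rng(0)
        # Random halfspace gating: the sign pattern of these projections
        # selects one of 2**n_hyperplanes weight vectors (contexts).
        self.hyperplanes = rng.standard_normal((n_hyperplanes, side_dim))
        self.weights = np.full((2 ** n_hyperplanes, n_inputs), 1.0 / n_inputs)
        self.lr = lr

    def context(self, side_info):
        bits = (self.hyperplanes @ side_info > 0).astype(int)
        return int(bits @ (2 ** np.arange(len(bits))))

    def predict(self, prev_probs, side_info):
        c = self.context(side_info)
        return sigmoid(self.weights[c] @ logit(prev_probs)), c

    def update(self, prev_probs, side_info, target):
        # Online gradient step on the log loss; every neuron predicts the
        # target directly, so credit assignment stays purely local.
        p, c = self.predict(prev_probs, side_info)
        self.weights[c] -= self.lr * (p - target) * logit(prev_probs)
        return p
```

    Because the active weight vector changes with the side information, the overall input-output map is piecewise linear in logit space, which is how an otherwise linear learner can model nonlinear functions.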

    Extending Gated Linear Networks for Continual Learning

    Incrementally learning multiple tasks from an indefinitely long stream of data is a real challenge for traditional machine learning models. If not carefully controlled, the learning of new knowledge strongly impacts a model's previously learned abilities, making it forget how to solve past tasks. Continual learning addresses this problem, called catastrophic forgetting, by developing models able to continually learn new tasks and adapt to changes in the data distribution. In this dissertation, we consider the recently proposed family of continual learning models called Gated Linear Networks (GLNs) and study two crucial aspects affecting the amount of catastrophic forgetting in GLNs, namely data standardization and the gating mechanism. Data standardization is particularly challenging in the online/continual learning setting because data from future tasks is not available beforehand. The results obtained using an online standardization method show a considerably higher amount of forgetting compared to an offline, static standardization. Interestingly, with the latter standardization, we observe that GLNs show almost no forgetting on the considered benchmark datasets. Secondly, for GLNs to be effective, it is essential to tailor the hyperparameters of the gating mechanism to the data distribution. In this dissertation, we propose a gating strategy based on a set of prototypes and the resulting Voronoi tessellation. The experimental assessment shows that, in an ideal setting where the data distribution is known, the proposed approach is more robust to different data standardizations than the original halfspace gating mechanism and shows improved predictive performance. Finally, we propose an adaptive mechanism for the choice of prototypes, which expands and shrinks the set of prototypes in an online fashion, making the model suitable for practical continual learning applications. The experimental results show that the performance of the adaptive model is close to the ideal scenario where prototypes are directly sampled from the data distribution.
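    A minimal sketch of the prototype-based gating described above, where the active context is the Voronoi cell (nearest prototype) containing the side information; the expansion rule shown is an illustrative assumption, not the dissertation's exact adaptive mechanism:

```python
import numpy as np

class PrototypeGate:
    """Selects a context as the index of the nearest prototype (Voronoi cell)."""

    def __init__(self, prototypes, add_threshold=2.0, max_prototypes=64):
        self.prototypes = [np.asarray(p, dtype=float) for p in prototypes]
        self.add_threshold = add_threshold      # assumed expansion criterion
        self.max_prototypes = max_prototypes

    def context(self, side_info):
        # Voronoi gating: the active context is the nearest prototype.
        dists = [np.linalg.norm(p - side_info) for p in self.prototypes]
        return int(np.argmin(dists))

    def maybe_expand(self, side_info):
        # Illustrative online expansion: add a prototype when a sample is far
        # from every existing one (shrinking could drop rarely used ones).
        dists = [np.linalg.norm(p - side_info) for p in self.prototypes]
        if min(dists) > self.add_threshold and len(self.prototypes) < self.max_prototypes:
            self.prototypes.append(np.asarray(side_info, dtype=float))
```

    In a full GLN, each newly added prototype would also require allocating a fresh weight vector per neuron, for example initialized to uniform weights.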

    Gated Linear Networks for Continual Learning in a Class-Incremental with Repetition Scenario

    Continual learning, which involves the incremental acquisition of knowledge over time, is a challenging problem in complex environments where the distribution of data may change over time. Despite the great results obtained by neural networks in solving a wide variety of tasks, they still struggle to show the same strong performance in a continual learning environment, suffering from a problem known as catastrophic forgetting. This problem, which consists in a model's tendency to overwrite old knowledge when new knowledge is presented, has been addressed through a variety of strategies that adapt the model in different ways. Among these, in this work we focus on Gated Linear Networks (GLNs), a class of models that rely on a gating mechanism to improve the storage and retrieval of information over time. This class of models has already been applied to continual learning with promising results, but always in extremely simplified frameworks. In this work we define a more complex continual learning environment and adapt GLNs to the increased challenges that this environment presents, evaluating their strengths and limitations. In particular, we found that performing an encoding step can help make a complex dataset more spatially separable and therefore make GLNs more effective, and that switching to a Class-Incremental with Repetition scenario is useful both to increase the realism of the framework and to ease the learning difficulty.
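    A minimal sketch of how a Class-Incremental with Repetition stream differs from the strict class-incremental setting: classes are re-drawn for every experience, so previously seen classes can reappear later in the stream. The generator below is an illustrative assumption, not the thesis's benchmark protocol:

```python
import numpy as np

def cir_stream(X, y, n_experiences=10, classes_per_exp=2, seed=0):
    """Yield (X_exp, y_exp) chunks in which classes may repeat across experiences."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    for _ in range(n_experiences):
        # Re-drawing classes at every experience (instead of partitioning them
        # once) is what introduces repetition into the stream.
        chosen = rng.choice(classes, size=classes_per_exp, replace=False)
        mask = np.isin(y, chosen)
        yield X[mask], y[mask]
```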

    Globally Gated Deep Linear Networks

    Recently proposed Gated Linear Networks present a tractable nonlinear network architecture, and exhibit interesting capabilities such as learning with local error signals and reduced forgetting in sequential learning. In this work, we introduce a novel gating architecture, named Globally Gated Deep Linear Networks (GGDLNs), where gating units are shared among all processing units in each layer, thereby decoupling the architectures of the nonlinear but unlearned gatings and the learned linear processing motifs. We derive exact equations for the generalization properties of these networks in the finite-width thermodynamic limit, defined by P, N → ∞ with P/N ∼ O(1), where P and N are the training sample size and the network width, respectively. We find that the statistics of the network predictor can be expressed in terms of kernels that undergo shape renormalization through a data-dependent matrix compared to the GP kernels. Our theory accurately captures the behavior of finite-width GGDLNs trained with gradient descent dynamics. We show that kernel shape renormalization gives rise to rich generalization properties w.r.t. network width, depth and L2 regularization amplitude. Interestingly, networks with sufficient gating units behave similarly to standard ReLU networks. Although gatings in the model do not participate in supervised learning, we show the utility of unsupervised learning of the gating parameters. Additionally, our theory allows the evaluation of the network's ability to learn multiple tasks by incorporating task-relevant information into the gating units. In summary, our work is the first exact theoretical solution of learning in a family of nonlinear networks with finite width. The rich and diverse behavior of the GGDLNs suggests that they are helpful analytically tractable models of learning single and multiple tasks in finite-width nonlinear deep networks.
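    A minimal sketch of a globally gated layer in the spirit described above: the gating units are nonlinear but fixed (unlearned), shared by every processing unit in the layer, and they modulate learned linear maps. The specific parameterization below is an illustrative assumption, not the paper's exact equations:

```python
import numpy as np

class GloballyGatedLinearLayer:
    def __init__(self, in_dim, out_dim, n_gates, rng=None):
        rng = rng or np.random.default_rng(0)
        # One learned linear map per gating unit; the gating itself is shared
        # across all output units and is never trained.
        self.W = rng.standard_normal((n_gates, out_dim, in_dim)) / np.sqrt(in_dim)
        self.V = rng.standard_normal((n_gates, in_dim))  # fixed gating parameters

    def gates(self, x):
        # Nonlinear but unlearned gating, common to every processing unit.
        return (self.V @ x > 0).astype(float)

    def forward(self, x, h):
        # x: raw input used for gating; h: activity of the previous layer.
        g = self.gates(x)
        return np.einsum('k,kij,j->i', g, self.W, h) / np.sqrt(len(g))
```

    Because the gates depend only on the input and are never trained, all learning stays in the linear weights, which is what keeps the model analytically tractable while remaining nonlinear in the input.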

    PyTorch-Hebbian : facilitating local learning in a deep learning framework

    Recently, unsupervised local learning, based on Hebb's idea that change in synaptic efficacy depends on the activity of the pre- and postsynaptic neuron only, has shown potential as an alternative training mechanism to backpropagation. Unfortunately, Hebbian learning remains experimental and rarely makes its way into standard deep learning frameworks. In this work, we investigate the potential of Hebbian learning in the context of standard deep learning workflows. To this end, a framework for thorough and systematic evaluation of local learning rules in existing deep learning pipelines is proposed. Using this framework, the potential of Hebbian learned feature extractors for image classification is illustrated. In particular, the framework is used to expand the Krotov-Hopfield learning rule to standard convolutional neural networks without sacrificing accuracy compared to end-to-end backpropagation. The source code is available at https://github.com/Joxis/pytorch-hebbian. (Comment: Presented as a poster at the NeurIPS 2020 Beyond Backpropagation workshop)
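    A minimal sketch of a local, backpropagation-free update in the spirit of the competitive Hebbian rule mentioned above; this is a simplified stand-in, not the exact Krotov-Hopfield rule or the PyTorch-Hebbian API:

```python
import numpy as np

def hebbian_step(W, v, lr=0.01, k=2, delta=0.4):
    """One local update of a weight matrix W (n_hidden, n_inputs) on input v."""
    currents = W @ v                      # pre-activation of each hidden unit
    order = np.argsort(currents)[::-1]    # rank hidden units by activation
    g = np.zeros(len(currents))
    g[order[0]] = 1.0                     # strongest unit is pulled toward v
    g[order[k - 1]] = -delta              # k-th strongest is pushed away
    # Local update: depends only on the pre- and postsynaptic activity of
    # each unit, with an Oja-like decay term to keep weights bounded.
    W += lr * g[:, None] * (v[None, :] - currents[:, None] * W)
    return W
```

    After such unsupervised training, the hidden representations would typically be frozen and fed to a conventionally trained linear classifier, which matches the feature-extractor-for-image-classification setup the abstract describes.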