PyTorch-Hebbian: facilitating local learning in a deep learning framework
Recently, unsupervised local learning, based on Hebb's idea that change in
synaptic efficacy depends on the activity of the pre- and postsynaptic neuron
only, has shown potential as an alternative training mechanism to
backpropagation. Unfortunately, Hebbian learning remains experimental and
rarely makes its way into standard deep learning frameworks. In this work, we
investigate the potential of Hebbian learning in the context of standard deep
learning workflows. To this end, a framework for thorough and systematic
evaluation of local learning rules in existing deep learning pipelines is
proposed. Using this framework, the potential of Hebbian learned feature
extractors for image classification is illustrated. In particular, the
framework is used to expand the Krotov-Hopfield learning rule to standard
convolutional neural networks without sacrificing accuracy compared to
end-to-end backpropagation. The source code is available at
https://github.com/Joxis/pytorch-hebbian.
Comment: Presented as a poster at the NeurIPS 2020 Beyond Backpropagation workshop.
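To make the notion of a local learning rule concrete, here is a minimal PyTorch sketch of a generic Hebbian update with Oja-style decay. It is not the pytorch-hebbian API and not the Krotov-Hopfield rule evaluated in the paper; the function name, learning rate, and layer sizes are illustrative.

import torch

def hebbian_oja_step(weight, x, lr=1e-3):
    # Local update for a linear layer weight of shape (out, in): the change to
    # each synapse depends only on its presynaptic input x and postsynaptic
    # activity y, with an Oja-style decay term that keeps weight norms bounded.
    with torch.no_grad():
        y = x @ weight.t()                                   # (batch, out)
        hebb = y.t() @ x                                     # sum over batch of y_i * x_j
        decay = (y * y).sum(dim=0).unsqueeze(1) * weight     # Oja decay term
        weight += lr * (hebb - decay) / x.shape[0]

# Illustrative usage: update a feature extractor locally, batch by batch,
# with no backpropagated error signal.
layer = torch.nn.Linear(784, 256, bias=False)
x = torch.rand(32, 784)
hebbian_oja_step(layer.weight.data, x)

In a workflow like the one proposed here, such a rule would train the feature-extracting layers locally, with only a final classifier head trained by gradient descent.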
Improving equilibrium propagation without weight symmetry through Jacobian homeostasis
Equilibrium propagation (EP) is a compelling alternative to the
backpropagation of error algorithm (BP) for computing gradients of neural
networks on biological or analog neuromorphic substrates. Still, the algorithm
requires weight symmetry and infinitesimal equilibrium perturbations, i.e.,
nudges, to estimate unbiased gradients efficiently. Both requirements are
challenging to implement in physical systems. Yet, whether and how weight
asymmetry affects its applicability is unknown because, in practice, it may be
masked by biases introduced through the finite nudge. To address this question,
we study generalized EP, which can be formulated without weight symmetry, and
analytically isolate the two sources of bias. For complex-differentiable
non-symmetric networks, we show that the finite nudge does not pose a problem,
as exact derivatives can still be estimated via a Cauchy integral. In contrast,
weight asymmetry introduces a bias that results in low task performance, due to poor
alignment of EP's neuronal error vectors with those computed by BP. To mitigate this
issue, we present a new homeostatic objective that directly penalizes
functional asymmetries of the Jacobian at the network's fixed point. This
homeostatic objective dramatically improves the network's ability to solve
complex tasks such as ImageNet 32x32. Our results lay the theoretical
groundwork for studying and mitigating the adverse effects of imperfections of
physical networks on learning algorithms that rely on the substrate's
relaxation dynamics.
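As a rough illustration of what a homeostatic objective of this kind could look like, the sketch below penalizes the antisymmetric part of the dynamics' Jacobian at a stand-in fixed point; the exact functional form used in the paper may differ, and all names and sizes are illustrative.

import torch
from torch.autograd.functional import jacobian

def jacobian_asymmetry_penalty(dynamics, s_star):
    # dynamics: callable mapping a state vector s (shape (n,)) to ds/dt.
    # s_star: an (approximate) fixed point of the dynamics, shape (n,).
    J = jacobian(dynamics, s_star, create_graph=True)   # (n, n)
    asym = J - J.t()                                     # vanishes for a symmetric system
    return 0.5 * (asym ** 2).sum()

# Toy example: a small leaky rate network with non-symmetric weights.
n = 8
W = torch.randn(n, n, requires_grad=True)
f = lambda s: -s + torch.tanh(W @ s)
s_star = torch.zeros(n)                # stand-in for the network's relaxed state
penalty = jacobian_asymmetry_penalty(f, s_star)
penalty.backward()                     # gradients flow into W via create_graph=True

Minimizing such a penalty alongside the task loss pushes the network toward the functional symmetry that unbiased EP gradient estimates rely on.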
Backpropagation at the Infinitesimal Inference Limit of Energy-Based Models: Unifying Predictive Coding, Equilibrium Propagation, and Contrastive Hebbian Learning
How the brain performs credit assignment is a fundamental unsolved problem in
neuroscience. Many 'biologically plausible' algorithms have been proposed,
which compute gradients that approximate those computed by backpropagation
(BP), and which operate in ways that more closely satisfy the constraints
imposed by neural circuitry. Many such algorithms utilize the framework of
energy-based models (EBMs), in which all free variables in the model are
optimized to minimize a global energy function. However, in the literature,
these algorithms exist in isolation and no unified theory exists linking them
together. Here, we provide a comprehensive theory of the conditions under which
EBMs can approximate BP, which lets us unify many of the BP approximation
results in the literature (namely, predictive coding, equilibrium propagation,
and contrastive Hebbian learning) and demonstrate that their approximation to
BP arises from a simple and general mathematical property of EBMs at free-phase
equilibrium. This property can then be exploited in different ways with
different energy functions, and these specific choices yield a family of
BP-approximating algorithms, which both includes the known results in the
literature and can be used to derive new ones.
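One well-known instance of this free-phase equilibrium property is the original equilibrium propagation estimator; the LaTeX sketch below states it in standard notation (E the energy, C a cost depending only on the state s, theta the parameters, beta a small nudge), with the understanding that the paper's general theory covers it as a special case.

\[
  F(\theta, \beta, s) = E(\theta, s) + \beta\, C(s),
  \qquad
  s_{*}^{\beta} = \arg\min_{s} F(\theta, \beta, s),
\]
\[
  \frac{d}{d\theta}\, C\!\left(s_{*}^{0}\right)
  = \left.\frac{\partial}{\partial\beta}\right|_{\beta = 0}
    \frac{\partial E}{\partial\theta}\!\left(\theta, s_{*}^{\beta}\right)
  \;\approx\; \frac{1}{\beta}\left[
    \frac{\partial E}{\partial\theta}\!\left(\theta, s_{*}^{\beta}\right)
    - \frac{\partial E}{\partial\theta}\!\left(\theta, s_{*}^{0}\right)\right].
\]

The bias of the finite-beta estimate vanishes as the nudge shrinks, which is the infinitesimal inference limit the title refers to.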
A contrastive rule for meta-learning
Meta-learning algorithms leverage regularities that are present across a set of tasks to speed up and improve the performance of a subsidiary learning process. Recent work on deep neural networks has shown that prior gradient-based learning of meta-parameters can greatly improve the efficiency of subsequent learning. Here, we present a biologically plausible meta-learning algorithm based on equilibrium propagation. Instead of explicitly differentiating the learning process, our contrastive meta-learning rule estimates meta-parameter gradients by executing the subsidiary process more than once. This avoids reversing the learning dynamics in time and computing second-order derivatives. In spite of this, and unlike previous first-order methods, our rule recovers an arbitrarily accurate meta-parameter update given enough compute. We establish theoretical bounds on its performance and present experiments on a set of standard benchmarks and neural network architectures.
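To make the "execute the subsidiary process more than once" idea concrete, here is a toy PyTorch sketch in which the meta-parameter enters the inner objective only through an L2 prior; the losses, nudging constant, and inner solver are placeholders rather than the paper's actual rule, bounds, or benchmarks.

import torch

def inner_objective(phi, theta, task_loss, lam=0.1, beta=0.0, meta_loss=None):
    # Inner ("subsidiary") objective: task loss plus an L2 prior pulling phi
    # toward the meta-parameter theta; beta > 0 adds a weak nudge toward the
    # meta objective (the contrastive, second run).
    obj = task_loss(phi) + 0.5 * lam * ((phi - theta) ** 2).sum()
    if beta > 0.0:
        obj = obj + beta * meta_loss(phi)
    return obj

def solve_inner(theta, task_loss, lam, beta=0.0, meta_loss=None, steps=500, lr=0.05):
    # Run the subsidiary learning process to (approximate) convergence.
    phi = theta.detach().clone().requires_grad_(True)
    opt = torch.optim.SGD([phi], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        inner_objective(phi, theta, task_loss, lam, beta, meta_loss).backward()
        opt.step()
    return phi.detach()

# With the L2 prior above, d(inner objective)/d(theta) = lam * (theta - phi),
# so the contrastive meta-gradient estimate reduces to a scaled difference of
# the two inner solutions.
theta = torch.zeros(5)
task_loss = lambda p: ((p - torch.ones(5)) ** 2).sum()
meta_loss = lambda p: ((p - 2.0 * torch.ones(5)) ** 2).sum()
lam, beta = 0.1, 0.01
phi_free = solve_inner(theta, task_loss, lam)
phi_nudged = solve_inner(theta, task_loss, lam, beta, meta_loss)
meta_grad = (lam / beta) * (phi_free - phi_nudged)

No learning trajectory is stored or differentiated through: the estimate only requires re-running the inner process, and it becomes more accurate as the inner solutions converge and the nudge shrinks.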
Understanding and Improving Optimization in Predictive Coding Networks
Backpropagation (BP), the standard learning algorithm for artificial neural
networks, is often considered biologically implausible. In contrast, the
standard learning algorithm for predictive coding (PC) models in neuroscience,
known as the inference learning algorithm (IL), is a promising, bio-plausible
alternative. However, several challenges and questions hinder IL's application
to real-world problems. For example, IL is computationally demanding, and
without memory-intensive optimizers like Adam, IL may converge to poor local
minima. Moreover, although IL can reduce loss more quickly than BP, the reasons
for these speedups, and their robustness, remain unclear. In this paper, we
tackle these challenges by 1) altering the standard implementation of PC
circuits to substantially reduce computation, 2) developing a novel optimizer
that improves the convergence of IL without increasing memory usage, and 3)
establishing theoretical results that help elucidate the conditions under which
IL is sensitive to second and higher-order information.
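For orientation, here is a minimal PyTorch sketch of the standard inference learning loop that this work analyzes and improves: relax the layer activities on the prediction-error energy, then update the weights at the relaxed state. The paper's modified PC circuit and its new optimizer are not reproduced; dimensions, step counts, and learning rates are placeholders.

import torch

dims = [4, 16, 16, 3]
Ws = [(0.1 * torch.randn(dims[l + 1], dims[l])).requires_grad_(True) for l in range(3)]

def energy(xs, Ws):
    # Sum of squared prediction errors between each layer and its prediction
    # from the layer below.
    e = 0.0
    for l, W in enumerate(Ws):
        pre = xs[l] if l == 0 else torch.tanh(xs[l])
        e = e + 0.5 * ((xs[l + 1] - pre @ W.t()) ** 2).sum()
    return e

def il_step(x_in, y_target, Ws, T=50, lr_x=0.1, lr_w=0.01):
    # Initialize activities with a feedforward pass, then clamp input and output.
    xs = [x_in]
    for l, W in enumerate(Ws):
        pre = xs[-1] if l == 0 else torch.tanh(xs[-1])
        xs.append(pre @ W.t())
    xs[-1] = y_target
    hidden = [h.detach().clone().requires_grad_(True) for h in xs[1:-1]]
    # Inference phase: relax the hidden activities toward a minimum of the energy.
    for _ in range(T):
        F = energy([xs[0]] + hidden + [xs[-1]], Ws)
        grads = torch.autograd.grad(F, hidden)
        with torch.no_grad():
            for h, g in zip(hidden, grads):
                h -= lr_x * g
    # Learning phase: one gradient step on the weights at the relaxed activities.
    F = energy([xs[0]] + [h.detach() for h in hidden] + [xs[-1]], Ws)
    grads_w = torch.autograd.grad(F, Ws)
    with torch.no_grad():
        for W, g in zip(Ws, grads_w):
            W -= lr_w * g

# Illustrative usage on random data.
il_step(torch.rand(8, dims[0]), torch.rand(8, dims[-1]), Ws)

The repeated inner relaxation is what makes IL computationally demanding, which is the cost that points 1) and 2) above are aimed at.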
A deep learning theory for neural networks grounded in physics
In the last decade, deep learning has become a major component of artificial intelligence, leading to a series of breakthroughs across a wide variety of domains. The workhorse of deep learning is the optimization of loss functions by stochastic gradient descent (SGD). Traditionally in deep learning, neural networks are differentiable mathematical functions, and the loss gradients required for SGD are computed with the backpropagation algorithm. However, the computer architectures on which these neural networks are implemented and trained suffer from speed and energy inefficiency issues, due to the separation of memory and processing in these architectures. To solve these problems, the field of neuromorphic computing aims at implementing neural networks on hardware architectures that merge memory and processing, just like brains do. In this thesis, we argue that building large, fast and efficient neural networks on neuromorphic architectures also requires rethinking the algorithms to implement and train them. We present an alternative mathematical framework, also compatible with SGD, which offers the possibility to design neural networks in substrates that directly exploit the laws of physics. Our framework applies to a very broad class of models, namely those whose state or dynamics are described by variational equations. This includes physical systems whose equilibrium state minimizes an energy function, and physical systems whose trajectory minimizes an action functional (principle of least action). We present a simple procedure to compute the loss gradients in such systems, called equilibrium propagation (EqProp), which requires solely locally available information for each trainable parameter. Since many models in physics and engineering can be described by variational principles, our framework has the potential to be applied to a broad variety of physical systems, whose applications extend to various fields of engineering, beyond neuromorphic computing.
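As a numerical toy example of EqProp's two-phase procedure, the sketch below relaxes a small Hopfield-style energy with and without a weak output nudge and forms the contrastive parameter update; the energy function, nudge strength, and gradient-descent relaxation are illustrative stand-ins for a physical system's own dynamics.

import torch

n_in, n_hid, n_out = 4, 8, 2
W1 = (0.1 * torch.randn(n_hid, n_in)).requires_grad_(True)
W2 = (0.1 * torch.randn(n_out, n_hid)).requires_grad_(True)

def energy(x, h, y):
    # Hopfield-style energy: quadratic state cost minus layer-to-layer interactions.
    rho = torch.sigmoid
    return (0.5 * (h ** 2).sum() + 0.5 * (y ** 2).sum()
            - (rho(h) * (rho(x) @ W1.t())).sum()
            - (rho(y) * (rho(h) @ W2.t())).sum())

def relax(x, target, beta, steps=100, lr=0.1):
    # Settle the free variables by gradient descent on the total energy
    # F = E + beta * C; a physical substrate would instead reach this
    # equilibrium through its own relaxation dynamics.
    h = torch.zeros(x.shape[0], n_hid, requires_grad=True)
    y = torch.zeros(x.shape[0], n_out, requires_grad=True)
    for _ in range(steps):
        F = energy(x, h, y) + beta * 0.5 * ((y - target) ** 2).sum()
        gh, gy = torch.autograd.grad(F, (h, y))
        with torch.no_grad():
            h -= lr * gh
            y -= lr * gy
    return h.detach(), y.detach()

def eqprop_update(x, target, beta=0.1, lr_w=0.05):
    h0, y0 = relax(x, target, beta=0.0)     # free phase
    hb, yb = relax(x, target, beta=beta)    # weakly nudged phase
    g_nudged = torch.autograd.grad(energy(x, hb, yb), (W1, W2))
    g_free = torch.autograd.grad(energy(x, h0, y0), (W1, W2))
    with torch.no_grad():
        for W, gb, g0 in zip((W1, W2), g_nudged, g_free):
            W -= lr_w * (gb - g0) / beta    # contrastive two-phase estimate

eqprop_update(torch.rand(16, n_in), torch.rand(16, n_out))

The update for each weight depends only on activity at that weight's two endpoints in the two phases, which is the locality property emphasized above.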