Towards Continual Reinforcement Learning: A Review and Perspectives
In this article, we aim to provide a literature review of different
formulations and approaches to continual reinforcement learning (RL), also
known as lifelong or non-stationary RL. We begin by discussing our perspective
on why RL is a natural fit for studying continual learning. We then provide a
taxonomy of different continual RL formulations and mathematically characterize
the non-stationary dynamics of each setting. We go on to discuss evaluation of
continual RL agents, providing an overview of benchmarks used in the literature
and important metrics for understanding agent performance. Finally, we
highlight open problems and challenges in bridging the gap between the current
state of continual RL and findings in neuroscience. While still in its early
days, the study of continual RL has the promise to develop better incremental
reinforcement learners that can function in increasingly realistic applications
where non-stationarity plays a vital role. These include applications such as
those in the fields of healthcare, education, logistics, and robotics.
Comment: Preprint, 52 pages, 8 figures
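The non-stationarity discussed above can be made concrete with a toy environment. A minimal sketch (illustrative only; the class name and parameters are not from the review) of a piecewise-stationary two-armed bandit whose best arm flips at regular change points:

```python
import random

class NonStationaryBandit:
    """Two-armed Bernoulli bandit whose arm qualities swap every `period` pulls."""

    def __init__(self, p=(0.8, 0.2), period=1000, seed=0):
        self.p = list(p)          # success probability of each arm
        self.period = period      # steps between abrupt change points
        self.t = 0
        self.rng = random.Random(seed)

    def pull(self, arm):
        """Pull an arm, advancing time; returns a 0/1 reward."""
        self.t += 1
        if self.t % self.period == 0:
            self.p.reverse()      # abrupt change point: the best arm flips
        return 1 if self.rng.random() < self.p[arm] else 0
```

An agent that converges greedily on one arm keeps paying a price after every change point, which is exactly the failure mode continual RL methods aim to handle.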
Continual Lifelong Learning with Neural Networks: A Review
Humans and animals have the ability to continually acquire, fine-tune, and
transfer knowledge and skills throughout their lifespan. This ability, referred
to as lifelong learning, is mediated by a rich set of neurocognitive mechanisms
that together contribute to the development and specialization of our
sensorimotor skills as well as to long-term memory consolidation and retrieval.
Consequently, lifelong learning capabilities are crucial for autonomous agents
interacting in the real world and processing continuous streams of information.
However, lifelong learning remains a long-standing challenge for machine
learning and neural network models since the continual acquisition of
incrementally available information from non-stationary data distributions
generally leads to catastrophic forgetting or interference. This limitation
represents a major drawback for state-of-the-art deep neural network models
that typically learn representations from stationary batches of training data,
thus without accounting for situations in which information becomes
incrementally available over time. In this review, we critically summarize the
main challenges linked to lifelong learning for artificial learning systems and
compare existing neural network approaches that alleviate, to different
extents, catastrophic forgetting. We discuss well-established and emerging
research motivated by lifelong learning factors in biological systems such as
structural plasticity, memory replay, curriculum and transfer learning,
intrinsic motivation, and multisensory integration.
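One family of approaches surveyed here alleviates catastrophic forgetting by penalizing changes to parameters that were important for earlier tasks. A minimal sketch of such a quadratic consolidation term (EWC-style; the function name and the per-parameter `importance` vector are illustrative simplifications):

```python
import numpy as np

def penalized_loss(task_loss, params, old_params, importance, lam=1.0):
    """Current-task loss plus a quadratic penalty that anchors parameters
    deemed important for earlier tasks (an EWC-style consolidation term)."""
    penalty = 0.5 * lam * np.sum(importance * (params - old_params) ** 2)
    return task_loss + penalty
```

With `importance` estimated from old-task gradients (e.g., a diagonal Fisher approximation), minimizing this objective trades off new-task fit against drift on consolidated weights.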
Do not bet on the unknown versus try to find out more: estimation uncertainty and “unexpected uncertainty” both modulate exploration
Little is known about how humans solve the exploitation/exploration trade-off. In particular, the evidence for uncertainty-driven exploration is mixed. The current study proposes a novel hypothesis of exploration that helps reconcile prior findings that may seem contradictory at first. According to this hypothesis, uncertainty-driven exploration involves a dilemma between two motives: (i) to speed up learning about the unknown, which may beget novel reward opportunities; (ii) to avoid the unknown because it is potentially dangerous. We provide evidence for our hypothesis using both behavioral and simulated data, and briefly point to recent evidence that the brain differentiates between these two motives.
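The two competing motives can be written as a single option value in which uncertainty carries both a bonus and a penalty. A minimal sketch (hypothetical parameter names; not the study's fitted model):

```python
import math

def choice_value(mean_reward, uncertainty, curiosity=1.0, caution=0.5):
    """Value of an option under the two competing motives: uncertainty both
    attracts (information seeking) and repels (potential danger); the sign
    of (curiosity - caution) decides which motive wins."""
    return mean_reward + (curiosity - caution) * uncertainty

def softmax(values, beta=3.0):
    """Softmax choice probabilities with inverse temperature beta."""
    exps = [math.exp(beta * v) for v in values]
    z = sum(exps)
    return [e / z for e in exps]
```

Depending on whether curiosity or caution dominates, the same uncertain option is either sought out or avoided, which is one way seemingly contradictory exploration findings could coexist.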
Learning with Surprise: Theory and Applications
Everybody knows what it feels like to be surprised. Surprise raises our attention and is crucial for learning. It is a ubiquitous concept whose traces have been found in both neuroscience and machine learning. However, no comprehensive theory has yet been developed that addresses the fundamental problems surrounding surprise: (1) Surprise is difficult to quantify. How should we measure the level of surprise when we encounter an unexpected event? What is the link between surprise and startle responses in behavioral biology? (2) The key role of surprise in learning is somewhat unclear. We believe that surprise drives attention and modifies learning; but how should surprise be incorporated into general paradigms of learning? And (3) can we develop a biologically plausible theory that explains how surprise can be neurally calculated and implemented in the brain? I propose a theoretical framework to address these issues. The framework has three components: (1) a subjective confidence-adjusted measure of surprise that can be used for quantification purposes, (2) a surprise-minimization learning rule that models the role of surprise in learning by balancing the relative contributions of new and old data to inference about the world, and (3) a surprise-modulated Hebbian plasticity rule that can be implemented in both artificial and spiking neural networks. The proposed online rule links surprise to the activity of the neuromodulatory system in the brain and belongs to the class of neo-Hebbian plasticity rules. My work on the foundations of surprise provides a suitable framework for future studies on learning with surprise. Reinforcement learning methods can be enhanced by incorporating the proposed theory of surprise. The theory could ultimately become interesting for the analysis of fMRI and EEG data. It may also inspire new synaptic plasticity rules that are under the simultaneous control of reward and surprise.
Moreover, the proposed theory can be used to make testable predictions about the time course of the neural substrate of surprise (e.g., noradrenaline), and it suggests behavioral experiments that can be performed on real animals to study surprise-related neural activity.
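The idea of balancing new against old data via surprise can be illustrated with a scalar estimator whose learning rate grows with the surprise of each observation. A minimal sketch using plain Shannon surprise (a simplification; not the thesis's confidence-adjusted measure or its neo-Hebbian plasticity rule):

```python
import math

def shannon_surprise(p_obs):
    """Shannon surprise of an observation that was predicted with probability p_obs."""
    return -math.log(max(p_obs, 1e-12))

def surprise_modulated_update(p_hat, x, s0=1.0):
    """Update a Bernoulli estimate p_hat after observing x in {0, 1}.
    The effective learning rate grows with surprise, so unexpected
    observations weight new data more heavily than old data."""
    p_obs = p_hat if x == 1 else 1.0 - p_hat
    s = shannon_surprise(p_obs)
    gamma = s / (s + s0)          # in [0, 1): higher surprise -> faster update
    return p_hat + gamma * (x - p_hat)
```

An expected observation barely moves the estimate, while a surprising one pulls it strongly toward the new data, which is the qualitative behavior the surprise-minimization rule formalizes.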
Comparative evaluation of approaches in T.4.1-4.3 and working definition of adaptive module
The goal of this deliverable is two-fold: (1) to present and compare different approaches towards learning and encoding movements using dynamical systems that have been developed by the AMARSi partners (in the past, during the first 6 months of the project), and (2) to analyze their suitability to be used as adaptive modules, i.e. as building blocks for the complete architecture that will be developed in the project. The document presents a total of eight approaches, in two groups: modules for discrete movements (i.e. with a clear goal where the movement stops) and for rhythmic movements (i.e. which exhibit periodicity). The basic formulation of each approach is presented together with some illustrative simulation results. Key characteristics such as the type of dynamical behavior, learning algorithm, generalization properties, and stability analysis are then discussed for each approach. We then make a comparative analysis of the different approaches by comparing these characteristics and discussing their suitability for the AMARSi project.
The brain as a generative model: information-theoretic surprise in learning and action
Our environment is rich with statistical regularities, such as a sudden cold gust of wind indicating a potential change in weather. A combination of theoretical work and empirical evidence suggests that humans embed this information in an internal representation of the world. This generative model is used to perform probabilistic inference, which may be approximated through surprise minimization. This process rests on current beliefs enabling predictions, with expectation violation amounting to surprise. Through repeated interaction with the world, beliefs become more accurate and grow more certain over time. Perception and learning may be accounted for by minimizing surprise of current observations, while action is proposed to minimize expected surprise of future events. This framework thus shows promise as a common formulation for different brain functions.
The work presented here adopts information-theoretic quantities of surprise to investigate both perceptual learning and action. We recorded electroencephalography (EEG) from participants in a somatosensory roving-stimulus paradigm and performed trial-by-trial modeling of cortical dynamics. Bayesian model selection suggests that early processing in somatosensory cortices encodes confidence-corrected surprise and subsequently Bayesian surprise. This suggests that the somatosensory system signals the surprise of observations and updates a probabilistic model that learns transition probabilities. We also extended this framework to include audition and vision in a multi-modal roving-stimulus study. Next, we studied action by investigating a sensitivity to expected Bayesian surprise. Interestingly, this quantity is also known as information gain and arises as an incentive to reduce uncertainty in the active inference framework, which can correspond to surprise minimization. Comparing active inference to a classical reinforcement learning model on the two-step decision-making task, we provided initial evidence that active inference better accounts for human model-based behaviour. This appeared to relate to participants' sensitivity to expected Bayesian surprise and contributed to explaining exploration behaviour not accounted for by the reinforcement learning model. Overall, our findings provide evidence for information-theoretic surprise as a model for perceptual learning signals while also guiding human action.
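Bayesian surprise, the quantity recurring throughout this abstract, has a compact definition: the KL divergence from posterior to prior beliefs, i.e. the information gained from one observation. A minimal sketch over a discrete hypothesis set (a simplification of the continuous models used in the study):

```python
import math

def bayesian_surprise(prior, likelihoods):
    """KL divergence from posterior to prior over discrete hypotheses:
    the belief update (information gain) caused by one observation.
    `prior[i]` is P(h_i); `likelihoods[i]` is P(observation | h_i)."""
    evidence = sum(p * l for p, l in zip(prior, likelihoods))
    posterior = [p * l / evidence for p, l in zip(prior, likelihoods)]
    return sum(q * math.log(q / p) for q, p in zip(posterior, prior) if q > 0)
```

An observation that leaves beliefs unchanged carries zero Bayesian surprise; the more it shifts the posterior away from the prior, the larger the quantity, regardless of whether the observation itself was improbable.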
Neuromorphic Engineering Editors' Pick 2021
This collection showcases well-received spontaneous articles from the past couple of years, which have been specially handpicked by our Chief Editors, Profs. André van Schaik and Bernabé Linares-Barranco. The work presented here highlights the broad diversity of research performed across the section and aims to put a spotlight on the main areas of interest. All research presented here displays strong advances in theory, experiment, and methodology with applications to compelling problems. This collection aims to further support Frontiers’ strong community by recognizing highly deserving authors.
NPCL: Neural Processes for Uncertainty-Aware Continual Learning
Continual learning (CL) aims to train deep neural networks efficiently on
streaming data while limiting the forgetting caused by new tasks. However,
learning transferable knowledge with less interference between tasks is
difficult, and real-world deployment of CL models is limited by their inability
to measure predictive uncertainties. To address these issues, we propose
handling CL tasks with neural processes (NPs), a class of meta-learners that
encode different tasks into probabilistic distributions over functions all
while providing reliable uncertainty estimates. Specifically, we propose an
NP-based CL approach (NPCL) with task-specific modules arranged in a
hierarchical latent variable model. We tailor regularizers on the learned
latent distributions to alleviate forgetting. The uncertainty estimation
capabilities of the NPCL can also be used to handle the task head/module
inference challenge in CL. Our experiments show that the NPCL outperforms
previous CL approaches. We validate the effectiveness of uncertainty estimation
in the NPCL for identifying novel data and evaluating instance-level model
confidence. Code is available at \url{https://github.com/srvCodes/NPCL}.
Comment: Accepted as a poster at NeurIPS 2023
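The task-inference role of uncertainty mentioned above can be illustrated with a simple rule: route a test input to the head that is most confident about it. A minimal sketch (illustrative only; not NPCL's actual hierarchical mechanism):

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a predictive distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def infer_task_head(head_predictions):
    """Choose the task head whose prediction is most confident, i.e. has
    the lowest predictive entropy -- one simple way uncertainty estimates
    can resolve unknown task identity at test time."""
    return min(range(len(head_predictions)),
               key=lambda i: predictive_entropy(head_predictions[i]))
```

The same entropy score can also flag novel data: if every head is near-uniform on an input, the input likely belongs to no task seen so far.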
An Introduction to Lifelong Supervised Learning
This primer is an attempt to provide a detailed summary of the different
facets of lifelong learning. We start with Chapter 2 which provides a
high-level overview of lifelong learning systems. In this chapter, we discuss
prominent scenarios in lifelong learning (Section 2.4), provide
a high-level organization of different lifelong learning approaches (Section
2.5), enumerate the desiderata for an ideal lifelong learning system (Section
2.6), discuss how lifelong learning is related to other learning paradigms
(Section 2.7), and describe common metrics used to evaluate lifelong learning
systems (Section 2.8). This chapter is more useful for readers who are new to
lifelong learning and want to get introduced to the field without focusing on
specific approaches or benchmarks. The remaining chapters focus on specific
aspects (either learning algorithms or benchmarks) and are more useful for
readers who are looking for specific approaches or benchmarks. Chapter 3
focuses on regularization-based approaches that do not assume access to any
data from previous tasks. Chapter 4 discusses memory-based approaches that
typically use a replay buffer or an episodic memory to save a subset of data
across different tasks. Chapter 5 focuses on different architecture families
(and their instantiations) that have been proposed for training lifelong
learning systems. Following these different classes of learning algorithms, we
discuss the commonly used evaluation benchmarks and metrics for lifelong
learning (Chapter 6) and wrap up with a discussion of future challenges and
important research directions in Chapter 7.
Comment: Lifelong Learning Primer
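The memory-based approaches of Chapter 4 revolve around a small buffer of stored examples. A minimal sketch of such a buffer using reservoir sampling (one common choice in the literature; the names are illustrative and not taken from the primer):

```python
import random

class ReplayBuffer:
    """Fixed-capacity episodic memory with reservoir sampling, so every
    example seen so far has an equal probability of being retained."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        """Store the example, evicting uniformly at random once full."""
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        """Draw up to k stored examples for interleaved replay."""
        return self.rng.sample(self.data, min(k, len(self.data)))
```

During training on a new task, minibatches mix fresh data with `buffer.sample(k)`, so gradients keep reflecting earlier tasks and forgetting is reduced.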