Lifelong Neural Predictive Coding: Learning Cumulatively Online without Forgetting
In lifelong learning systems, especially those based on artificial neural
networks, one of the biggest obstacles is the severe inability to retain old
knowledge as new information is encountered. This phenomenon is known as
catastrophic forgetting. In this article, we propose a new kind of
connectionist architecture, the Sequential Neural Coding Network, that is
robust to forgetting when learning from streams of data points and, unlike
networks of today, does not learn via the immensely popular back-propagation of
errors. Grounded in the neurocognitive theory of predictive processing, our
model adapts its synapses in a biologically-plausible fashion, while another,
complementary neural system rapidly learns to direct and control this
cortex-like structure by mimicking the task-executive control functionality of
the basal ganglia. In our experiments, we demonstrate that our self-organizing
system experiences significantly less forgetting as compared to standard neural
models and outperforms a wide swath of previously proposed methods even though
it is trained across task datasets in a stream-like fashion. The promising
performance of our complementary system on benchmarks, e.g., SplitMNIST, Split
Fashion MNIST, and Split NotMNIST, offers evidence that by incorporating
mechanisms prominent in real neuronal systems, such as competition, sparse
activation patterns, and iterative input processing, a new possibility for
tackling the grand challenge of lifelong machine learning opens up.
Comment: Key updates including results on standard benchmarks, e.g., Split MNIST/Fashion MNIST/NotMNIST. Task selection/basal ganglia model has been integrated.
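As a rough illustration of learning with local, predictive-coding-style updates instead of back-propagation, consider the minimal single-layer sketch below. It is not the paper's Sequential Neural Coding Network, and the layer sizes, step sizes, and iteration count are arbitrary choices; it only shows the general pattern of iterative inference followed by a local weight update.

    import numpy as np

    # Generic predictive-coding sketch (illustrative only, not the paper's
    # exact model): the latent state z is iteratively settled to explain
    # the input, then the weights get a local, Hebbian-like update from
    # the remaining prediction error -- no back-propagated error chain.
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(20, 10))  # generative map: latent -> input

    def infer(x, W, steps=50, lr_z=0.1):
        z = np.zeros(W.shape[1])
        for _ in range(steps):                # iterative input processing
            e = x - W @ z                     # prediction error
            z += lr_z * (W.T @ e)             # local update of the latent state
        return z

    def learn(x, W, lr_w=0.01):
        z = infer(x, W)
        e = x - W @ z
        W += lr_w * np.outer(e, z)            # local (error x activity) update
        return W

    W = learn(rng.normal(size=20), W)         # one online learning step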
Continual Learning via Sequential Function-Space Variational Inference
Sequential Bayesian inference over predictive functions is a natural
framework for continual learning from streams of data. However, applying it to
neural networks has proved challenging in practice. Addressing the drawbacks of
existing techniques, we propose an optimization objective derived by
formulating continual learning as sequential function-space variational
inference. In contrast to existing methods that regularize neural network
parameters directly, this objective allows parameters to vary widely during
training, enabling better adaptation to new tasks. Compared to objectives that
directly regularize neural network predictions, the proposed objective allows
for more flexible variational distributions and more effective regularization.
We demonstrate that, across a range of task sequences, neural networks trained
via sequential function-space variational inference achieve better predictive
accuracy than networks trained with related methods while depending less on
maintaining a set of representative points from previous tasks.
Comment: Published in Proceedings of the 39th International Conference on Machine Learning (ICML 2022).
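As a hedged sketch of the shape such an objective takes (our notation; see the paper for the exact formulation), learning on task t maximizes the expected log-likelihood of the current data D_t while a function-space KL term, evaluated at a set of context inputs X_C, keeps the new distribution over functions close to the previous one:

    % Hedged sketch of a sequential function-space variational objective;
    % q_t is the variational distribution over functions f after task t,
    % D_t the task-t data, and f_{X_C} the function values at context inputs.
    \mathcal{L}(q_t) =
      \mathbb{E}_{f \sim q_t}\big[\log p(\mathcal{D}_t \mid f)\big]
      - \mathrm{KL}\big(q_t(f_{X_C}) \,\|\, q_{t-1}(f_{X_C})\big)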
Online Continual Learning on Sequences
Online continual learning (OCL) refers to the ability of a system to learn
over time from a continuous stream of data without having to revisit previously
encountered training samples. Learning continually in a single data pass is
crucial for agents and robots operating in changing environments and required
to acquire, fine-tune, and transfer increasingly complex representations from
non-i.i.d. input distributions. Machine learning models that address OCL must
alleviate catastrophic forgetting, in which hidden representations are
disrupted or completely overwritten when learning from streams of novel input.
In this chapter, we summarize and discuss recent deep learning models that
address OCL on sequential input through the use (and combination) of synaptic
regularization, structural plasticity, and experience replay. Different
implementations of replay have been proposed that alleviate catastrophic
forgetting in connectionist architectures via the re-occurrence of (latent
representations of) input sequences and that functionally resemble mechanisms
of hippocampal replay in the mammalian brain. Empirical evidence shows that
architectures endowed with experience replay typically outperform those
without it in (online) incremental learning tasks.
Comment: L. Oneto et al. (eds.), Recent Trends in Learning From Data, Studies in Computational Intelligence 89
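As a concrete illustration of the replay mechanism discussed above, here is a minimal sketch of a reservoir-sampled replay buffer interleaved with online updates. The buffer capacity, replay batch size, and the model_update callback are illustrative placeholders, not any specific model from the chapter.

    import random

    # Minimal experience-replay sketch: a bounded memory filled by
    # reservoir sampling, so it holds a uniform sample of the stream,
    # and each new example is trained together with replayed ones.
    class ReservoirBuffer:
        def __init__(self, capacity=500):
            self.capacity, self.data, self.seen = capacity, [], 0

        def add(self, example):
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append(example)
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.data[j] = example    # replace uniformly at random

        def sample(self, k):
            return random.sample(self.data, min(k, len(self.data)))

    def online_step(model_update, example, buffer, replay_k=10):
        # One pass over the stream: train on the new example plus a small
        # replayed batch, then store the example for future replay.
        model_update([example] + buffer.sample(replay_k))
        buffer.add(example)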
Evaluating k-NN in the Classification of Data Streams with Concept Drift
Data streams are often defined as large amounts of data flowing continuously
at high speed. Moreover, these data are likely subject to changes in data
distribution, known as concept drift. For these reasons, learning from
streams is typically online and subject to restrictions on memory
consumption and run-time. Although many classification algorithms exist, most
of the works published in the area use Naive Bayes (NB) and Hoeffding Trees
(HT) as base learners in their experiments. This article presents an in-depth
evaluation of k-Nearest Neighbors (k-NN) as a candidate for classifying data
streams subject to concept drift. It also analyses the time complexity and
the two main parameters of k-NN, i.e., the number of nearest neighbors used for
predictions (k) and the window size (w). We compare different parameter values for
k-NN and contrast it to NB and HT both with and without a drift detector (RDDM)
on many datasets. We formulated and answered 10 research questions which led to
the conclusion that k-NN is a worthy candidate for data stream classification,
especially when the run-time constraint is not too restrictive.
Comment: 25 pages, 10 tables, 7 figures + 30 pages appendix
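To make the role of the two parameters concrete, here is a minimal sliding-window k-NN sketch under the usual test-then-train protocol: k controls the vote, while w bounds memory and lets old concepts fade as the window slides. The values below are illustrative, not the article's recommended settings.

    from collections import deque, Counter
    import numpy as np

    # Sliding-window k-NN for streams: predictions use only the w most
    # recent labeled examples, so drifted concepts age out automatically.
    class WindowedKNN:
        def __init__(self, k=5, w=1000):
            self.k = k
            self.window = deque(maxlen=w)    # oldest examples fall out

        def predict(self, x):
            if not self.window:
                return None                  # nothing seen yet
            X = np.array([xi for xi, _ in self.window])
            labels = [yi for _, yi in self.window]
            dists = np.linalg.norm(X - x, axis=1)
            nearest = np.argsort(dists)[: self.k]
            return Counter(labels[i] for i in nearest).most_common(1)[0][0]

        def partial_fit(self, x, y):
            # Test-then-train: predict first, then store the labeled example.
            self.window.append((x, y))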
A Bi-Criteria Active Learning Algorithm for Dynamic Data Streams
Active learning (AL) is a promising way to efficiently
build up training sets with minimal supervision. A learner
deliberately queries specific instances to tune the classifier’s
model using as few labels as possible. The challenge for streaming
is that the data distribution may evolve over time and therefore
the model must adapt. Another challenge is the sampling bias
where the sampled training set does not reflect the underlying
data distribution. In the presence of concept drift, sampling bias is
more likely to occur as the training set needs to represent the
whole evolving data. To tackle these challenges, we propose a
novel bi-criteria AL approach (BAL) that relies on two selection
criteria, namely a label uncertainty criterion and a density-based
criterion. While the first criterion selects instances that are the most
uncertain in terms of class membership, the latter dynamically
curbs the sampling bias by weighting the samples to reflect the
true underlying distribution. To design and implement these two
criteria for learning from streams, BAL adopts a Bayesian online
learning approach and combines online classification and online
clustering through the use of online logistic regression and online
growing Gaussian mixture models, respectively. Empirical results
obtained on standard synthetic and real-world benchmarks show
the high performance of the proposed BAL method compared to
state-of-the-art AL methods.
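A minimal sketch of the bi-criteria idea follows. The uncertainty and density estimates here are simple stand-ins (a probability margin and a crude kernel density over recent instances), not the paper's Bayesian online logistic regression or growing Gaussian mixture model, and the query threshold is an arbitrary budget knob.

    import numpy as np

    # Bi-criteria query rule sketch: ask for a label when the classifier
    # is uncertain AND the instance lies in a dense region of the stream,
    # so the labeled set keeps reflecting the evolving distribution.
    def uncertainty(p):
        # p = predicted P(y=1|x); 1.0 at p=0.5, 0.0 at p in {0, 1}
        return 1.0 - abs(2.0 * p - 1.0)

    def density(x, recent, bandwidth=1.0):
        # Crude Gaussian-kernel density over recently seen instances.
        if len(recent) == 0:
            return 1.0
        d2 = np.sum((np.asarray(recent) - x) ** 2, axis=1)
        return float(np.mean(np.exp(-d2 / (2.0 * bandwidth ** 2))))

    def should_query(p, x, recent, threshold=0.3):
        return uncertainty(p) * density(x, recent) > threshold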
Boosting Classifiers for Drifting Concepts
This paper proposes a boosting-like method for training a classifier ensemble from data streams. It naturally adapts to concept drift and makes it possible to quantify the drift in terms of its base learners. The algorithm is empirically shown to outperform learning algorithms that ignore concept drift, and it performs no worse than advanced adaptive time-window and example-selection strategies that store all the data and are thus not suited for mining massive streams.
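In the same spirit, the generic block-based sketch below (an illustration under our own assumptions, not the paper's exact algorithm) trains a new base learner per data block and re-weights existing members by their accuracy on the newest block; the decay of older members' weights gives a rough, per-learner view of how much the concept has drifted. The make_learner factory and max_members budget are hypothetical parameters.

    import numpy as np

    # Boosting-like ensemble for drifting streams (generic sketch):
    # one base learner per block, members weighted by recent accuracy.
    class DriftEnsemble:
        def __init__(self, make_learner, max_members=10):
            self.make_learner = make_learner  # e.g. a scikit-learn factory
            self.max_members = max_members
            self.members, self.weights = [], []

        def update(self, X_block, y_block):
            # Re-weight members by accuracy on the newest block; members
            # fit to outdated concepts decay toward (near-)zero weight.
            for i, m in enumerate(self.members):
                acc = float(np.mean(m.predict(X_block) == y_block))
                self.weights[i] = max(acc - 0.5, 1e-3)
            new = self.make_learner()
            new.fit(X_block, y_block)
            self.members.append(new)
            self.weights.append(1.0)
            if len(self.members) > self.max_members:
                worst = int(np.argmin(self.weights))
                del self.members[worst], self.weights[worst]

        def predict(self, X):
            # Weighted vote, assuming binary labels in {0, 1}.
            votes = sum(w * (2 * m.predict(X) - 1)
                        for m, w in zip(self.members, self.weights))
            return (votes > 0).astype(int)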